Solar resource data – time series data vs monthly averages

Bad data in equals bad data out. This well-known phrase is highly relevant to the technical design and energy simulation of photovoltaic (PV) power plants. Most solar companies understand this and carefully evaluate the uncertainty of the solar resource data they use for feasibility purposes.

Together with quality and accuracy, it is also important to look at the original time granularity of the input solar and weather data. In this regard, we have seen two main approaches to running PV simulations:

  • The use of artificial hourly data profiles, generated from monthly averages by synthetic generation.
  • The use of real hourly data time series, as an output of the weather data models.

Even though the use of synthetically generated, artificial hourly values has been common practice, this approach is not recommended.

In this article, we discuss the benefits of using real hourly or sub-hourly time series instead of hourly data generated synthetically from monthly averages.

For this purpose, we ran a PV simulation using real data obtained from the Solargis model in hourly resolution. We then compared the results with the same simulation run on synthetically generated hourly data (derived from monthly averages of the original hourly time series).


Minimum uncertainty, maximum P90

The main solar and weather parameters needed for PV energy output simulation are the incident solar radiation and the PV cell temperature (calculated from air temperature).

For a reliable energy simulation, it is important to have the correct match between solar radiation and temperature for each time step. Otherwise, power plant output and relevant energy losses can be over- or under-estimated.
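Why the pairing matters can be sketched with a few lines of Python. The sketch below is only illustrative: it assumes the simple NOCT relation for cell temperature and a linear power temperature coefficient of -0.4 %/°C, neither of which is stated in this article, and the function names and input values are hypothetical. It is not the thermal model used in the simulations discussed here.

```python
def cell_temperature(air_temp_c, irradiance_w_m2, noct_c=45.0):
    """Estimate PV cell temperature (deg C) with the simple NOCT relation.

    NOCT is defined at 800 W/m2 irradiance and 20 deg C ambient temperature.
    Assumption: NOCT of 45 deg C, a typical datasheet value.
    """
    return air_temp_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2


def dc_power_kw(irradiance_w_m2, air_temp_c, p_stc_kw=15.0, gamma_per_c=-0.004):
    """DC power scaled linearly with irradiance and corrected for cell temperature.

    Assumption: gamma_per_c is a hypothetical power temperature coefficient.
    """
    t_cell = cell_temperature(air_temp_c, irradiance_w_m2)
    return p_stc_kw * irradiance_w_m2 / 1000.0 * (1.0 + gamma_per_c * (t_cell - 25.0))


# The same two irradiance values and the same two air temperatures,
# paired in two different ways (illustrative numbers only).
matched = dc_power_kw(900.0, 30.0) + dc_power_kw(200.0, 10.0)  # sunny hour is the hot one
swapped = dc_power_kw(900.0, 10.0) + dc_power_kw(200.0, 30.0)  # sunny hour is the cold one

print(f"matched pairing: {matched:.2f} kW, swapped pairing: {swapped:.2f} kW")
```

Both pairings have identical average irradiance and average temperature, yet the summed power differs, which is the kind of effect that is lost when hourly values are not physically consistent with each other.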

When using synthetic data, this effect should be accounted for as an additional factor in the uncertainty analysis. In a sample simulation in the software PVsyst for a default 15 kWp system located in Southeast Spain (Plataforma Solar de Almería, sample files available here), we see a difference of 1.2% in annual energy output between real hourly data and synthetically generated hourly data. A similar exercise for other locations showed higher differences, but mostly within ±2%, which can be considered a reasonable estimate of the additional uncertainty introduced by synthetically generated data. For sites with more complex patterns of solar radiation and temperature, the additional uncertainty is expected to be even higher.

Once all uncertainties are accounted for (more information about how to calculate P90 is given in a previous article), we can compare the results of the PV energy simulation for our sample site in Table 1, below.


Table 1: Energy output obtained from the simulation of a simple fixed-mounted system

 (1) Use of pre-calculated long-term averages from pvPlanner could have a slightly higher uncertainty of approx. 1%

(2) Inter-annual variability taken from PVsyst

(3) Uncertainty factor from using synthetically generated profiles is included


In the above example, the P50 value is higher (and looks more “appealing” at first sight) when using synthetic hourly data, but the higher uncertainty results in a lower P90 value (which is less “appealing” in the end). This situation is illustrated in the probability chart below. In any case, for the sake of project success, it is recommended to always use the approach with the lowest uncertainty.


Figure 1: P50 and P90 energy values from two different simulations with different uncertainties
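The relation between P50, uncertainty and P90 can be sketched as follows. This is a minimal example that assumes annual energy is normally distributed and that the uncertainty is given as one standard deviation in percent of P50. The function name and the input numbers are hypothetical and are not the values from Table 1.

```python
from statistics import NormalDist


def exceedance_value(p50, uncertainty_pct, probability=0.90):
    """Value exceeded with the given probability, assuming a normal distribution.

    uncertainty_pct is the combined standard uncertainty (one sigma) in % of P50.
    """
    z = NormalDist().inv_cdf(probability)  # about 1.282 for P90
    return p50 * (1.0 - z * uncertainty_pct / 100.0)


# Hypothetical annual yields in MWh: the second case has the higher P50
# but also the higher uncertainty.
p90_low_uncertainty = exceedance_value(1000.0, 5.0)
p90_high_uncertainty = exceedance_value(1010.0, 7.0)

print(f"P90 with 5% uncertainty: {p90_low_uncertainty:.1f} MWh")
print(f"P90 with 7% uncertainty: {p90_high_uncertainty:.1f} MWh")
```

With these illustrative inputs, the case with the higher P50 ends up with the lower P90, which mirrors the situation described above.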


Accurate estimate of interannual variability

To estimate the total uncertainty of the annual energy estimates, we need to know the annual variability of weather conditions and of the expected energy output. This can be estimated accurately if we know the annual sums for at least a 10-year period from recent history. In the absence of a multi-year time series, we can only guess the expected variability of annual energy production, which is very difficult, as year-by-year variability has very complex geographical patterns.
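As a rough sketch, interannual variability can be computed as the standard deviation of the annual sums, expressed in percent of their mean. The example below also combines it with other uncertainty components by root-sum-of-squares, which assumes the components are independent. That combination rule is a common convention, not necessarily the exact method behind the tables in this article. The annual values are made up for illustration and are not Almería data.

```python
from math import sqrt
from statistics import mean, stdev


def interannual_variability_pct(annual_sums):
    """Sample standard deviation of annual sums, in % of their mean."""
    return 100.0 * stdev(annual_sums) / mean(annual_sums)


def combined_uncertainty_pct(*components_pct):
    """Root-sum-of-squares of independent uncertainty components (all in %)."""
    return sqrt(sum(c * c for c in components_pct))


# Ten hypothetical annual PV output sums (kWh/kWp), one per year.
annual_output = [1850, 1902, 1878, 1921, 1866, 1895, 1840, 1910, 1883, 1872]

variability = interannual_variability_pct(annual_output)
# Hypothetical model and simulation uncertainties of 4% and 3%.
total = combined_uncertainty_pct(4.0, 3.0, variability)

print(f"interannual variability: {variability:.2f}%")
print(f"combined uncertainty:    {total:.2f}%")
```

Running the same calculation on annual sums of PV output rather than GHI gives the variability of the quantity that actually matters for the P90 estimate.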

In the absence of time series data, a substitute approach is to use GHI annual variability values from sites with similar irradiation levels and climate conditions. For example, this approach is used by default when calculating P90 in PVsyst software. There are two issues with this approach. First, the annual variability can differ significantly between locations that have similar climate conditions and solar radiation. Second, as explained in a previous blog article, annual variability of GHI does not equal annual variability of PV output.


Table 2: Interannual variability (standard deviation) for a sample location in Almería, Spain


Understand range and extremes in weather data for optimum PV design

Even though monthly sums of both real and synthetic data are the same, the hour-by-hour analysis will always show critical differences. This means that in a synthetically-generated hourly time series the typical and extreme values are not fully captured and the synthetic data typically show systematic deviations. Selection of optimum design and components without making higher risk assumptions on expected weather conditions becomes an impossible task when using synthetic data.

For the sample site used in this article, such differences can be directly observed when comparing the two datasets: the real hourly TMY and the synthetically generated year. There are differences between the maximum values registered in the data, but also in the statistical occurrence of hourly values above or below a certain threshold. This can be easily observed for all parameters used as inputs in the simulations, in our case Global Horizontal Irradiation (GHI), Diffuse Horizontal Irradiation (DIF), Air temperature (TEMP) and Wind Speed (WS). Related statistics for the sample site are shown in Tables 3 and 4.


Table 3: Average values of main weather parameters for PV simulation for the sample location.


Table 4: Weather statistics showing extreme values, calculated for real and synthetic data for the sample site in Almería, Spain.
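A comparison of this kind can be sketched in a few lines. The helper below is hypothetical and the two short series are invented to show the idea: both have the same total (and therefore the same average), but they differ in their maximum and in the number of hours above a threshold.

```python
def summarize(values, threshold):
    """Return the mean, extremes, and count of values at or above a threshold."""
    return {
        "mean": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
        "hours_above": sum(1 for v in values if v >= threshold),
    }


# Illustrative hourly GHI values in W/m2 (not data from the sample site).
real_ghi = [0, 150, 620, 980, 710, 240, 0]
synthetic_ghi = [0, 250, 600, 850, 700, 300, 0]

real_stats = summarize(real_ghi, threshold=900)
synthetic_stats = summarize(synthetic_ghi, threshold=900)

print("real:     ", real_stats)
print("synthetic:", synthetic_stats)
```

The same helper can be applied to temperature or wind speed, with a threshold chosen to match the design limit of the component being sized.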


For even more complete results, it is advised to use a complete set of historical time series (provided with TMY datasets in Solargis).


Accurate revenue planning and self-consumption analysis

The hourly distribution is typically not well represented in synthetic data, and this can have a significant impact on the analysis of expected revenues. This matters not only because of differences in pricing for the generated energy, which may follow hourly prices set either by the spot markets or by a power purchase agreement (PPA), but also for situations where the PV system is designed for self-consumption. In such cases, an accurate long-term prediction of generation profiles is also needed in order to design a plan that maximizes the energy that is produced by the PV system.
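The effect of the hourly shape on revenue and self-consumption can be illustrated with a small sketch. All profiles, prices and function names below are hypothetical. The two generation profiles have the same daily total, but they produce different revenue under hourly prices and a different amount of self-consumed energy.

```python
def hourly_revenue(generation_kwh, price_per_kwh):
    """Revenue when each hour's generation is sold at that hour's price."""
    return sum(g * p for g, p in zip(generation_kwh, price_per_kwh))


def self_consumed_energy(generation_kwh, load_kwh):
    """Energy used on site: in each hour, the smaller of generation and load."""
    return sum(min(g, l) for g, l in zip(generation_kwh, load_kwh))


# Illustrative six-hour day (kWh per hour, price in currency units per kWh).
real_profile = [0, 2, 6, 8, 4, 0]
synthetic_profile = [0, 3, 5, 6, 5, 1]
prices = [0.05, 0.05, 0.04, 0.03, 0.06, 0.09]
load = [1, 1, 2, 3, 5, 4]

for name, profile in (("real", real_profile), ("synthetic", synthetic_profile)):
    revenue = hourly_revenue(profile, prices)
    used = self_consumed_energy(profile, load)
    print(f"{name}: total={sum(profile)} kWh, revenue={revenue:.2f}, self-consumed={used} kWh")
```

In this made-up case the smoother profile overstates both revenue and self-consumption, even though the daily energy is identical.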

Figure 2 and Tables 5 and 6 below show the differences per hour of the day between simulations based on real and on synthetic hourly data. At the annual level, the highest overestimation in the synthetic data occurs in the afternoon. From the tables of monthly deviations, we can also see that the differences during the first and last hours of the day are significant, especially during certain months.


Figure 2: Aggregated annual value of the difference between synthetic and real hourly data, for each hour


Table 5: Aggregated monthly values of the difference between synthetic and real hourly data, by hour of the day.


Table 6: Aggregated monthly values of the difference between synthetic and real hourly data, by hour of the day, expressed in percentage
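An aggregation of this type can be sketched as below. This is an assumed layout, with each record holding the hour of the day and the simulated energy from the real and the synthetic dataset; the function name and the sample records are hypothetical. The same grouping can be done by (month, hour) to obtain monthly tables.

```python
from collections import defaultdict


def difference_by_hour(records):
    """Aggregate (synthetic - real) energy for each hour of the day.

    records: iterable of (hour_of_day, real_kwh, synthetic_kwh).
    Returns {hour: (absolute_difference_kwh, difference_in_pct_of_real)}.
    The percentage is None when the real total for that hour is zero.
    """
    diff_totals = defaultdict(float)
    real_totals = defaultdict(float)
    for hour, real, synthetic in records:
        diff_totals[hour] += synthetic - real
        real_totals[hour] += real

    result = {}
    for hour in sorted(diff_totals):
        real_sum = real_totals[hour]
        pct = 100.0 * diff_totals[hour] / real_sum if real_sum else None
        result[hour] = (diff_totals[hour], pct)
    return result


# Illustrative records from two days (not data from the sample site).
sample_records = [(8, 1.0, 1.2), (8, 2.0, 2.1), (15, 5.0, 5.5), (15, 5.0, 5.0)]
hourly_differences = difference_by_hour(sample_records)

for hour, (diff_kwh, diff_pct) in hourly_differences.items():
    print(f"hour {hour:02d}: {diff_kwh:+.2f} kWh ({diff_pct:+.1f}%)")
```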


Validate (and accuracy-enhance) the model data using high quality ground measurements

The performance of satellite-based solar resource models for a given site can be characterized by a set of indicators. Bias or Mean Bias Deviation (MBD) characterizes systematic model deviation at a given site, i.e. systematic over- or underestimation; Root Mean Square Deviation (RMSD) and Mean Absolute Deviation (MAD) are used for indicating the spread of error for instantaneous values.
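These indicators can be computed as in the sketch below, using their usual definitions on coincident pairs of modelled and measured values. Expressing them in percent of the mean measured value is a common convention and is assumed here; the function name and the sample numbers are hypothetical.

```python
from math import sqrt


def validation_stats(modelled, measured):
    """Compute MBD, RMSD and MAD (absolute and in % of the mean measured value).

    Both sequences must cover the same time steps, in the same order.
    """
    if len(modelled) != len(measured) or not measured:
        raise ValueError("modelled and measured must be non-empty and equally long")

    n = len(measured)
    deviations = [m - o for m, o in zip(modelled, measured)]
    mean_measured = sum(measured) / n

    mbd = sum(deviations) / n                      # systematic deviation (bias)
    rmsd = sqrt(sum(d * d for d in deviations) / n)  # spread, penalizes large errors
    mad = sum(abs(d) for d in deviations) / n        # spread, average error size

    to_pct = 100.0 / mean_measured
    return {
        "MBD": mbd, "RMSD": rmsd, "MAD": mad,
        "MBD_pct": mbd * to_pct, "RMSD_pct": rmsd * to_pct, "MAD_pct": mad * to_pct,
    }


# Illustrative hourly GHI values in W/m2 (not measurements from the sample site).
stats = validation_stats(modelled=[110.0, 90.0, 105.0], measured=[100.0, 100.0, 100.0])
print({name: round(value, 2) for name, value in stats.items()})
```

Note that positive and negative deviations cancel in MBD but not in RMSD or MAD, which is why a small bias can coexist with a large spread of hourly errors.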

Having good quality ground measurements from the project site provides a unique opportunity for calculating such statistics and validating the weather data model (Table 7 and Figure 3). However, only if we have a coincident period of real values in hourly granularity can we run a proper comparison and identify possible deviations of the model. Synthetically generated data or monthly averages are not suitable for validation of a solar resource model, as they do not represent real measurements.


Table 7: Validation statistics for the sample site in Almería, Spain.


Figure 3: Dispersion representing satellite-based modelled data versus ground-measured data for hourly, daily and monthly values for the sample site.


Validating the quality of the weather data used in the simulation provides confidence in the results. In addition, if the ground measurements cover a sufficient period (at least 12 months), they can also be used for site-adaptation of the model and for data correlation. The site-adapted model is then used to recalculate the dataset with higher accuracy and lower uncertainty.


Calculate consistent metrics along the entire project lifetime

Weather data models are able to provide real hourly data for the historical period, but they also offer the possibility of real-time data calculation across the whole PV plant lifetime (see Figure 5). This means that the same source of data used for site prospection, planning and design of the power plant can, at a later stage, also be used during power plant operation. This allows for a consistent comparison between the actual performance and the performance initially calculated during project development. It is also helpful when the plant is subject to re-evaluation, allowing more transparent transactions and a solid evaluation of the investment related to the solar PV asset.


Figure 5: Real model data can be used consistently across all evaluation activities during the PV plant lifetime



Use of synthetically generated data from monthly averages is an outdated approach, as it allows for a very limited range of analysis and the results obtained from the simulation are of higher uncertainty. As a result, use of synthetically generated data can lead to incorrect decisions. Using real hourly data from a reliable source is preferred in all cases: to achieve the most accurate results and to run a complete analysis and optimization of the PV power plant. The use of monthly averages is useful for site prospection and very basic studies at preliminary stages of the project.


Table 8: Summary of the capabilities of real hourly data versus synthetic hourly data.

