
One of the most critical outputs from PV simulations is the P50 annual energy yield estimate. Often referred to as the "best estimate," the P50 value represents the annual energy yield that has a 50% probability of being exceeded (with an equal 50% chance that the actual yield will fall below it).

However, relying solely on the P50 value may be too optimistic for project stakeholders. To address this, additional probability-based yield estimates are commonly used, such as the P90 value, which indicates the energy yield expected to be exceeded 90% of the time. These estimates give a broader picture of expected project returns.

In this article, we will explain how to calculate these probabilistic energy yield scenarios and how different ways of calculation impact the final result.

Normal probability distribution

Since representative datasets spanning very long time periods are often unavailable, it is common practice to assume that annual PV yield follows a normal (or Gaussian) distribution for simplification.

In a normal distribution, the center of the curve is represented by the mean, while the width of the curve is defined by the standard deviation (σ or stdev), which quantifies the uncertainty or variability in energy yield estimates.

Let’s now situate the most common probability scenarios within the same normal curve:

  • The P50 value represents the mean of the distribution. It is the most probable single value and divides the probability distribution equally. There is a 50% chance the actual yield will exceed this value, and a 50% chance that the actual yield will fall below it.
  • The P75 value, which lies between P50 and P90, represents the yield expected to be exceeded in 75% of cases.
  • The P90 value lies to the left of the mean and represents a more conservative estimate. It corresponds to the yield that is expected to be exceeded in 90% of the cases, with a 10% chance of falling below this value. P90 is perhaps the most commonly used scenario in PV yield assessments.
  • The P99 value lies further to the left, representing the yield that is exceeded in 99% of cases.

These values can be generalized as Pxx estimates, where “xx” indicates the probability of exceedance as a percentage. By positioning these estimates within a normal distribution curve, one can visualize the range of likely outcomes and better understand the risk associated with each.


Fig. 1: P50, P75, P90, and P99 values represented in a normal distribution.

Although they may appear related, Pxx scenarios in PV yield estimates should not be confused with statistical percentiles. In statistics, the p90 or 90th percentile is the value below which 90% of a dataset falls. In contrast, a P90 estimate in PV yield assessments represents a yield level that is expected to be exceeded with 90% probability.
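The distinction can be made concrete with a minimal Python sketch. It assumes a normal distribution with a hypothetical mean of 1700 kWh/kWp and a standard deviation of 60 kWh/kWp (illustrative numbers, not taken from this article), and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical annual yield distribution (illustrative values only).
annual_yield = NormalDist(mu=1700.0, sigma=60.0)  # kWh/kWp

# P90 in the PV sense: the yield exceeded with 90% probability,
# i.e. only 10% of outcomes fall below it (the 10th percentile).
p90_exceedance = annual_yield.inv_cdf(0.10)

# 90th percentile in the statistical sense: 90% of outcomes fall below it.
percentile_90 = annual_yield.inv_cdf(0.90)

# Probability that the actual yield exceeds the P90 value.
prob_exceeding_p90 = 1.0 - annual_yield.cdf(p90_exceedance)

print(f"P90 (exceedance): {p90_exceedance:.0f} kWh/kWp")
print(f"90th percentile:  {percentile_90:.0f} kWh/kWp")
print(f"Chance of exceeding P90: {prob_exceeding_p90:.0%}")
```

Under the normal assumption, the PV P90 coincides with the statistical 10th percentile, and the two values sit symmetrically on either side of the mean.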

Confidence intervals

One of the key advantages of assuming a normal distribution is the ability to easily derive one Pxx scenario from another using straightforward mathematical relationships.

In every normal distribution, the interval within one standard deviation of the mean captures approximately 68.3% of the distribution. Because this relationship is fixed, exceedance probabilities such as P90 can be calculated using simple multipliers.

For example, the P90 yield can be estimated by subtracting 1.282 times the standard deviation from the P50 value. This multiplier corresponds to a 90% probability of exceedance. 

 


Fig. 2: Uncertainty intervals, expressed at standard deviation and 80% confidence levels (P90 exceedance)

Tab. 1: Calculation of different Pxx from a normal distribution of probability.

 

| Uncertainty interval | Probability of occurrence | Formula |
|---|---|---|
| One standard deviation | 68.3% | ± STDEV |
| Two standard deviations | 95.5% | ± 2*STDEV |
| Three standard deviations | 99.7% | ± 3*STDEV |
| P75 uncertainty | 50% | ± 0.675*STDEV |
| P90 uncertainty | 80% | ± 1.282*STDEV |
| P95 uncertainty | 90% | ± 1.645*STDEV |
| P97.5 uncertainty | 95% | ± 1.960*STDEV |
| P99 uncertainty | 98% | ± 2.326*STDEV |
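The values in Tab. 1 can be checked against the standard normal distribution. The sketch below is one way to do so with Python's `statistics.NormalDist`; the helper names `coverage` and `multiplier_for_confidence` are hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def coverage(k: float) -> float:
    """Probability that a normal variable falls within +/- k standard deviations."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def multiplier_for_confidence(confidence: float) -> float:
    """Multiplier k such that +/- k standard deviations covers `confidence`.

    A two-sided level c leaves (1 - c) / 2 in each tail, so the matching
    exceedance probability is (1 + c) / 2 (for example, 80% -> P90).
    """
    return STANDARD_NORMAL.inv_cdf((1.0 + confidence) / 2.0)


for k in (1, 2, 3):
    print(f"+/- {k} stdev covers {coverage(k):.2%}")

for confidence in (0.50, 0.80, 0.90, 0.95, 0.98):
    k = multiplier_for_confidence(confidence)
    print(f"{confidence:.0%} interval (P{(1 + confidence) * 50:g}): +/- {k:.3f} * stdev")
```

The computed multipliers agree with the tabulated ones to within rounding in the third decimal place.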

In general, any Pxx value—such as P75 or P99—can be calculated from the P50 estimate using the appropriate multiplier from the Gaussian (normal) distribution.

Tab. 2: Calculation of different Pxx exceedance values for a normal distribution of probability.

 

| Scenario | Probability of exceedance | Probability of non-exceedance | Formula |
|---|---|---|---|
| P50 value | 50% | 50% | Mean |
| P75 value | 75% | 25% | Mean - 0.675*STDEV |
| P90 value | 90% | 10% | Mean - 1.282*STDEV |
| P95 value | 95% | 5% | Mean - 1.645*STDEV |
| P97.5 value | 97.5% | 2.5% | Mean - 1.960*STDEV |
| P99 value | 99% | 1% | Mean - 2.326*STDEV |
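The formulas in Tab. 2 generalize to any exceedance probability. As a minimal sketch, the hypothetical helper `pxx_value` below derives the multiplier from the inverse normal CDF instead of a lookup table. The P50 and standard deviation used in the example are illustrative assumptions, not values from this article.

```python
from statistics import NormalDist


def pxx_value(p50: float, stdev: float, exceedance_pct: float) -> float:
    """Yield exceeded with probability `exceedance_pct` (in percent).

    Assumes a normal distribution whose mean is the P50 value.
    """
    if not 0.0 < exceedance_pct < 100.0:
        raise ValueError("exceedance_pct must be between 0 and 100 (exclusive)")
    multiplier = NormalDist().inv_cdf(exceedance_pct / 100.0)
    return p50 - multiplier * stdev


# Hypothetical inputs: P50 of 1700 kWh/kWp, standard deviation of 5% of P50.
p50 = 1700.0
stdev = 0.05 * p50

for xx in (50, 75, 90, 95, 97.5, 99):
    print(f"P{xx:g}: {pxx_value(p50, stdev, xx):.0f} kWh/kWp")
```

Because the multiplier is computed rather than tabulated, the same function also covers scenarios that are not listed in the table, such as P84 or P99.9.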

Aggregation of uncertainty and variability factors

The total uncertainty associated with a yield estimate should account for all relevant contributing factors, each expressed at the same probability of exceedance. This requires aggregating various sources of uncertainty into a single combined value and understanding how these sources interact and influence one another.

In addition to assuming that PV yield follows a normal (Gaussian) distribution, it is typically assumed that the individual sources of uncertainty are independent. This enables the use of the root-sum-square (RSS) method to calculate the total uncertainty, as follows:

$$U_{\text{total}}= \sqrt{U_1^2 + U_2^2 + \ldots + U_n^2}$$

In the case of PV yield assessments, there are three main factors or components we take into account when calculating the total uncertainty of the annual PV yield:

  • Uncertainty of solar irradiance models. Solar irradiance model uncertainty typically refers to annual GHI (Global Horizontal Irradiance) estimates. It typically reflects limitations in the satellite-based data and discrepancies found when validating the model against ground measurements.
  • Uncertainty of PV simulation. Uncertainty also arises from the PV simulation models themselves, which estimate energy yield. Common contributing factors include modeling assumptions and limitations in the various user inputs.
  • Interannual variability. Year-to-year variability of weather conditions (related to solar radiation but also to temperature and other meteorological parameters) introduces natural fluctuations in energy production. This interannual variability is quantified as the standard deviation of annual values in a multi-year time series. To characterize it properly, at least 10 years of high-quality historical data are recommended.


All these contributing factors are combined into a total uncertainty, Utotal, as a quadratic sum following the previously mentioned root-sum-square (RSS) method:

$$U_{\text{total}} = \sqrt{U_{\text{model}}^2 + U_{\text{simulation}}^2 + U_{\text{interannual}}^2}$$
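A short Python sketch of the RSS combination follows. The component values are hypothetical. The helper `to_p90_level` assumes that a component given as one standard deviation is brought to the P90 level with the 1.282 multiplier from the normal distribution, so that all inputs share the same probability of exceedance before they are combined.

```python
import math


def combine_rss(*uncertainties_pct: float) -> float:
    """Combine independent uncertainties (in percent) by root-sum-square."""
    return math.sqrt(sum(u ** 2 for u in uncertainties_pct))


def to_p90_level(stdev_pct: float, multiplier: float = 1.282) -> float:
    """Express a one-standard-deviation uncertainty at the P90 level."""
    return multiplier * stdev_pct


# Hypothetical components, all expressed at the P90 confidence level (percent).
u_model = 4.0
u_simulation = 5.0
u_interannual = to_p90_level(2.0)  # 2% standard deviation of annual values

u_total = combine_rss(u_model, u_simulation, u_interannual)
print(f"Total uncertainty at P90: +/- {u_total:.1f}%")
```

Because the components add in quadrature, the total is always smaller than their plain sum and is dominated by the largest contributor.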

Type of input datasets

In solar energy simulations, it is common to use different types of datasets to estimate expected yield. The choice may be limited by the software used for the simulation.

Below is an overview of the most commonly used dataset types, along with descriptions and sample data formats.

Time Series

The historical time series dataset includes all available data from the earliest year (1994, 1999, or 2007, depending on the location) up to the present.

Advantages:

  • Most comprehensive representation of weather patterns

Limitations:

  • Larger file sizes and longer computation times
  • Limited support in older PV simulation tools

Best Use:  Ideal for high-fidelity energy yield simulations and long-term variability assessments. Provides the most realistic input for Pxx scenario modeling.

Download Sample 15-minute Time Series Data (ZIP, 27.3 MB)

TMY P50

The TMY P50 dataset represents the typical climate conditions for each month, selected from the historical dataset. These "typical" months are joined together to form a synthetic year. The dataset is designed to represent a “standard” or most probable year.

Advantages:

  • Small file size and faster simulation speed
  • Common format supported by most PV simulation software, including older tools

Limitations:

  • Compresses the full time series into a simplified file
  • Excludes extreme or atypical weather events

Best Use: Widely used for baseline (P50) simulations in early-stage project development.

Download Sample 60-minute TMY P50 Data (ZIP, 0.3 MB)
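The idea of joining "typical" months can be illustrated with a deliberately simplified sketch. It assumes that the typical month is the one whose monthly GHI sum lies closest to the long-term mean for that calendar month. This is an assumption made for illustration only and is not the selection method used to build the TMY P50 dataset described here; production methods generally weigh several parameters. The data and the helper `select_typical_years` are hypothetical.

```python
from statistics import mean

# Hypothetical monthly GHI sums (kWh/m2) per year; only two months shown for brevity.
monthly_ghi = {
    2020: {1: 82.0, 2: 101.0},
    2021: {1: 90.0, 2: 95.0},
    2022: {1: 86.0, 2: 110.0},
}


def select_typical_years(monthly_ghi: dict[int, dict[int, float]]) -> dict[int, int]:
    """For each calendar month, pick the year closest to the long-term mean."""
    months = sorted(next(iter(monthly_ghi.values())))
    typical = {}
    for month in months:
        long_term_mean = mean(values[month] for values in monthly_ghi.values())
        typical[month] = min(
            monthly_ghi,
            key=lambda year: abs(monthly_ghi[year][month] - long_term_mean),
        )
    return typical


typical_years = select_typical_years(monthly_ghi)
print(typical_years)
```

The result maps each calendar month to a source year. The time series data for those months would then be concatenated into one synthetic year.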

TMY P90

The TMY P90 dataset represents a conservative year with below-average solar resource conditions. It is built by recombination of monthly data from different years to ensure that the annual GHI is close to the P90 value. This dataset is suitable for simulating conservative yield scenarios.

Advantages:

  • Small file size and faster simulation speed
  • Common format supported by most PV simulation software, including older tools
  • Practical for conservative energy yield estimates

Limitations:

  • Includes atypical weather patterns that may not represent long-term conditions

Best Use: Simulations in early-stage project development that require conservative projections.

Note: In the sample calculation included in this article, we use hourly Typical Meteorological Year (TMY) datasets derived from original time series data. This involves generating a single year of hourly data (8,760 values) from more than 1 million data points.

This should not be confused with simplified methods that also generate a year of hourly data, but using synthetic generators based on monthly long-term averages (LTA), represented by only 12 values.

Synthetic TMY approaches are less accurate, as they introduce artificial variability that does not reflect real conditions, potentially leading to errors of up to 10%. Additionally, they often lack internal consistency between key variables such as GHI and temperature, which can result in unrealistic scenarios (e.g., low GHI paired with high temperatures), ultimately causing underestimation of PV output.

You can learn more about this in this other article.

Calculation of interannual variability

The variability of energy yield (the third component in the overall uncertainty calculation) is computed using the following formula:

$$\text{var}_n = \frac{\text{stdev}}{\sqrt{n}}$$

Where:

  • stdev is the standard deviation of annual energy yield time series
  • n is the number of years over which the variability is assessed

Since PV yield estimates are typically expressed as annual values, this article always uses n = 1, which reflects the variability expected in any single year. This is the standard assumption for P90 energy yield calculations.

For longer-term projections, such as over 10, 20, or 25 years, the value of n increases accordingly (e.g., n = 10, 20, 25). In these cases, the overall uncertainty decreases, because the effects of interannual variability become less significant when averaged over longer periods.
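The effect of n can be sketched in a few lines of Python. The annual PVOUT values below are hypothetical. Expressing the result as a percentage of the mean and using the sample standard deviation (`statistics.stdev`) are assumptions of this sketch, since the article does not specify either choice.

```python
import math
from statistics import mean, stdev

# Hypothetical annual PVOUT values (kWh/kWp) for ten years; illustrative only.
annual_pvout = [1690, 1722, 1675, 1740, 1701, 1688, 1715, 1660, 1731, 1708]


def interannual_variability_pct(annual_values: list[float], n_years: int = 1) -> float:
    """Relative variability (percent of the mean) of an n-year average."""
    if n_years < 1:
        raise ValueError("n_years must be at least 1")
    relative_stdev = stdev(annual_values) / mean(annual_values) * 100.0
    return relative_stdev / math.sqrt(n_years)


for n in (1, 10, 25):
    print(f"n = {n:2d}: +/- {interannual_variability_pct(annual_pvout, n):.2f}%")
```

With n = 25 the variability is one fifth of the single-year value, which is why long-term averages carry less weather-related uncertainty than any individual year.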

The interannual variability value used for the P90 estimate is typically derived from the annual energy yield (PVOUT) based on historical time series data. However, when TMY datasets are used in simulations, interannual variability is often simplified and instead based on historical annual global horizontal irradiance (GHI).

While PVOUT and GHI generally follow similar trends, related interannual variability calculations do not produce identical results. This is because GHI does not capture various dynamic factors involved in converting irradiance into electricity, such as system performance, temperature effects, and other losses.

 

Fig. 3. Representation of annual time series sums for GHI and PVOUT. Source: Evaluate 2.0

 

Sample Calculation

Project Details

  • Installed capacity: 1 kWp
  • Technology: Crystalline silicon (c-Si)
  • Inverter efficiency: 97.5%
  • DC losses: 2.5%
  • AC losses: 1.5%
  • Relative row spacing: 2.5
  • Degradation: Not considered (first-year production only)
  • Location: Plataforma Solar de Almería (37.094416,-2.35985)

Input Datasets

  • Time Series Dataset: subhourly satellite-based solar irradiance data spanning 1994–2024 with additional meteorological parameters required for the simulation.
  • TMY P50 Dataset: Created by selecting and concatenating typical months from the time series.
  • TMY P90 Dataset: Created by recombining representative monthly data from different years to represent a P90 scenario.

Uncertainty Parameters

  • Irradiance model uncertainty (GHI): ±3.5% (P90 confidence level)
  • PV simulation uncertainty: ±5% (P90 confidence level)
  • Interannual variability: Calculated from PVOUT annual values (when using the time series dataset) or GHI annual values (when using TMY datasets).

Calculations

1. PVOUT P90 from Full Historical Time Series

  • Simulate PV output using the full historical time series.
  • Derive the P50 value as the average of annual PVOUT values.
  • Compute total uncertainty:
      • Uncertainty in annual GHI
      • PV simulation uncertainty
      • Interannual variability derived from PVOUT time series
  • Derive the P90 value by applying total uncertainty to the P50 value.

2. PVOUT P90 from TMY P50 Dataset

  • Simulate PV output using the TMY P50 dataset (single year).
  • Calculate P50 value as the annual total of PVOUT.
  • Compute total uncertainty:
      • Uncertainty in annual GHI
      • PV simulation uncertainty
      • Interannual variability based on the GHI time series used to create the TMY
  • Derive the P90 value by applying total uncertainty to the P50 result.


3. PVOUT P90 from TMY P90 Dataset

  • Simulate PV output using the TMY P90 dataset (single year).
  • The resulting annual PVOUT is already representative of P90 conditions.
  • Apply only PV simulation uncertainty to obtain the final P90 value.

Fig. 4. Diagram showing calculation process of PVOUT P90 from Time Series, TMY P50 and TMY P90 datasets
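The three procedures can be sketched in Python using the annual values and uncertainty components reported in Tab. 3. The helper names `rss` and `apply_uncertainty` are hypothetical. The sketch assumes that every component is already expressed at the P90 level, as listed in the table, and that the total uncertainty is applied as a percentage reduction of the annual value.

```python
import math


def rss(*components_pct: float) -> float:
    """Root-sum-square of independent uncertainties given in percent."""
    return math.sqrt(sum(c ** 2 for c in components_pct))


def apply_uncertainty(annual_value: float, uncertainty_pct: float) -> float:
    """Reduce an annual yield by an uncertainty already expressed at P90."""
    return annual_value * (1.0 - uncertainty_pct / 100.0)


# Inputs from the sample calculation (kWh/kWp and percent at the P90 level).
U_GHI_MODEL = 3.5
U_SIMULATION = 5.0

# 1. Full time series: P50 is the mean of annual PVOUT; variability from PVOUT.
p90_time_series = apply_uncertainty(1705.0, rss(U_GHI_MODEL, U_SIMULATION, 3.2))

# 2. TMY P50: P50 is the annual PVOUT total; variability from annual GHI.
p90_tmy_p50 = apply_uncertainty(1716.0, rss(U_GHI_MODEL, U_SIMULATION, 2.6))

# 3. TMY P90: irradiance uncertainty and variability are already in the data,
#    so only the PV simulation uncertainty is applied.
p90_tmy_p90 = apply_uncertainty(1606.0, U_SIMULATION)

for label, value in [
    ("Time series", p90_time_series),
    ("TMY P50", p90_tmy_p50),
    ("TMY P90", p90_tmy_p90),
]:
    print(f"{label:12s} P90 = {value:.0f} kWh/kWp")
```

The results land within 1 kWh/kWp of the P90 values in Tab. 3. Small differences are expected because the table lists rounded inputs.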

Results

The findings are summarized in the table below:

  • Compared to the time series simulation (the most complete and accurate method), using TMY P50 led to a 1% overestimation of the P90 energy yield.
  • The TMY P90 approach resulted in a 4% underestimation of the P90 energy yield.

 Tab. 3. Summary of results from a sample calculation for a project located in Almería, Spain. Please note that these outcomes are specific to the site's characteristics and system layout, and trends may vary for projects in different locations or with different configurations.

   

| | Simulation with TS | Simulation with TMY P50 | Simulation with TMY P90 |
|---|---|---|---|
| P50 annual value | 1705 kWh/kWp | 1716 kWh/kWp | 1606 kWh/kWp |
| Uncertainty factors | | | |
| Solar radiation model | ±3.5% | ±3.5% | included in TMY P90 |
| Energy simulation model | ±5% | ±5% | ±5% |
| Interannual variability | ±3.2% | ±2.6% | included in TMY P90 |
| Total uncertainty | ±6.9% | ±6.6% | ±5% |
| P90 annual value | 1588 kWh/kWp | 1602 kWh/kWp | 1526 kWh/kWp |
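The deviations quoted above follow directly from the P90 annual values in Tab. 3, taking the time series result (1588 kWh/kWp) as the reference:

$$\frac{1602 - 1588}{1588} \approx +0.9\%, \qquad \frac{1526 - 1588}{1588} \approx -3.9\%$$

Rounded to whole percentages, these correspond to the 1% overestimation and 4% underestimation reported for the TMY P50 and TMY P90 approaches.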

Conclusions

Accurately estimating expected irradiance under conservative scenarios (e.g., P75, P90, or P99) is essential for the successful development and financing of photovoltaic (PV) projects.

Reliable Pxx energy yield assessments must account for several key sources of uncertainty, including the quality of the solar irradiance data, the accuracy of PV simulation models, and the interannual climate variability specific to the project site.

Although it can be expressed using different confidence intervals, uncertainty is a fixed value specific to a given PV project, determined by its contributing factors. Any change in the level of uncertainty reflects a change in one or more of these underlying factors.

The choice of dataset also has a critical impact on results. Even when based on the same underlying time series, using typical meteorological year (TMY) datasets can introduce notable deviations in P90 yield estimates. These differences arise from the information loss inherent in generating TMY datasets and also from the way uncertainty is treated in the calculation.
