Documentation for Climate Outcome Likelihood Tool

General information

This tool uses historical data to determine the range of precipitation totals possible over a specified period. More specifically, it estimates the likelihood of either recovering a precipitation deficit plus the normal amount, or reaching a chosen precipitation threshold, by some future date.
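
In symbols, the quantities that each option below computes can be summarized as follows. The symbols are introduced here only for illustration: P_obs is the observed precipitation from the start date (From) through the day before the middle date (To), N_obs is the 1981-2010 normal for that period, N_rec is the normal for the recovery period, x_1, ..., x_n are the recovery-period accumulations (observed, sampled, or analog, depending on the option), and T is a custom threshold.

```latex
\begin{aligned}
D &= N_{\mathrm{obs}} - P_{\mathrm{obs}} && \text{(deficit if } D > 0\text{, surplus if } D < 0\text{)}\\
A &= D + N_{\mathrm{rec}} && \text{(amount needed to recover the deficit and reach normal)}\\
L_{\mathrm{amel}} &= 100 \cdot \frac{\#\{\, i : x_i \ge A \,\}}{n} \\
L_{\mathrm{thr}} &= 100 \cdot \frac{\#\{\, i : x_i \ge T \,\}}{n}, \qquad
L_{\mathrm{not}} = 100 \cdot \frac{\#\{\, i : x_i < T \,\}}{n}
\end{aligned}
```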

This tool uses data from the Global Historical Climatology Network (GHCN), accessed through the Applied Climate Information System (ACIS). We suggest using stations whose records begin in 1920 or earlier and continue through the present, for two reasons: 1) a longer record provides more data points and thus a more meaningful probability distribution; and 2) we anticipate this tool will be used to make decisions related to drought. A 1920-present record includes several of California's major droughts, so the resulting distribution will reflect them.

Dates must be in sequential order. The start date (From) must be no more than 5 years before today, and the end date (ending) no more than 5 years after today; this limit keeps response times reasonable over the web. The middle date (To) defaults to "today" but can be changed to suit the user's needs. Note, however, that only recovery periods with fewer missing days than the user-specified limit are included in the analysis. For example, suppose today is April 4, 2015, the middle date is set to Jan 1, 2015, the end date is set to the end of the current water year (Sept 30, 2015), and the missing-days limit is 5. The Jan 1, 2015-Sept 30, 2015 period will not appear in the probability distribution: every day from today through Sept 30 is still unobserved and therefore counts as missing, even though observations exist for part of the period.
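
The date rules and the missing-days check can be illustrated with a short Python sketch. The function names and the use of None for an unobserved day are assumptions for illustration only, the tool's actual validation code is not shown in this documentation, and treating "sequential" as strictly increasing dates is also an assumption.

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def shift_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def validate_dates(start: date, middle: date, ending: date, today: date) -> None:
    """Raise ValueError if From/To/ending break the ordering or 5-year window rules."""
    if not (start < middle < ending):
        raise ValueError("dates must be sequential: From < To < ending")
    if start < shift_years(today, -5):
        raise ValueError("From must be no more than 5 years before today")
    if ending > shift_years(today, 5):
        raise ValueError("ending must be no more than 5 years after today")


def count_missing(daily_values: Sequence[Optional[float]]) -> int:
    """Count days that have no observation (represented here as None)."""
    return sum(1 for v in daily_values if v is None)


def period_is_usable(daily_values: Sequence[Optional[float]], max_missing: int) -> bool:
    """A period is kept only if it has fewer missing days than the limit."""
    return count_missing(daily_values) < max_missing


# The example from the text: today is 2015-04-04, the recovery period is
# 2015-01-01 through 2015-09-30, and the missing-days limit is 5.
today = date(2015, 4, 4)
period_start, period_end = date(2015, 1, 1), date(2015, 9, 30)
validate_dates(date(2014, 10, 1), period_start, period_end, today)  # hypothetical From date

n_days = (period_end - period_start).days + 1
# Days before today are observed (0.0 is a placeholder value); today and later are not.
daily = [0.0 if period_start + timedelta(days=i) < today else None for i in range(n_days)]

missing = count_missing(daily)
usable = period_is_usable(daily, max_missing=5)
```

Of the 273 days from Jan 1 through Sept 30, 2015, the 180 days from April 4 onward are unobserved, so the period far exceeds the 5-day limit and is excluded.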

Analyses were performed in Python and graphed with Highcharts, a JavaScript charting library. The Django framework connects the server-side Python analyses with the client-side JavaScript. Data are acquired via calls to ACIS web services, abbreviated here as ACIS-ws.


Observed data option

  1. Make an ACIS-ws call to retrieve the precipitation sum from the user-selected start date (From) through the middle date (To) minus 1 day, since today's precipitation typically has not yet been observed or entered into the system. Also retrieve the 1981-2010 normal precipitation for this period, computed as the sum of the daily normal values. Subtract the observed value from the normal value to determine whether a surplus or deficit is present: a surplus is negative and a deficit positive. Station metadata (station name, state, period of record) are also gathered at this time.
  2. Via an ACIS-ws call, retrieve precipitation sums for every period in the station record equivalent to the "recovery period" (the period between the form dates labeled "To" and "ending") that has fewer missing days than the user-specified limit. For example, if the user-specified recovery period was 2015-04-05 to 2015-09-30 with a limit of 5 missing days, sums would be retrieved for every April 5-Sept 30 period in the station record with fewer than 5 missing days. The number of such periods determines the number of records used in the analysis. These "recovery period" sums are stored in an array. The 1981-2010 normal for the recovery period is also retrieved.
  3. Calculate deciles (10th to 100th percentile in steps of 10) for the array of "recovery period" sums, using NumPy's numpy.percentile function for each decile.
  4. Create a normalized histogram of the "recovery period" sums. Determine the edges of 1" bins from the minimum and maximum values in the recovery period array. Use the numpy.histogram function to place the array values in bins, normalizing the histogram (density=True) so that the total area of the bars equals 1. Because each bin is 1 inch wide, each bar's height then equals the probability of receiving a precipitation amount within that bin.
  5. Prepare the data to be graphed as a cumulative distribution function: sort the array of "recovery period" sums from lowest to highest, then calculate the proportional value of each sample:

          p = 1. * arange(len(data)) / (len(data) - 1)

    Plot the sorted data.
  6. Determine likelihood of a particular outcome:
    1. If the user has selected "amelioration": take the surplus/deficit value calculated in step 1 and add it to the normal for the recovery period retrieved in step 2. This gives the amount needed to recover any deficit and reach normal by the end of the recovery period. Next, count how many accumulations in the recovery period array are greater than or equal to the amount needed, i.e., those large enough to recover any deficit from the observed period and also reach normal for the recovery period. The likelihood of recovery as a percentage is:

            (number of accumulations in record ≥ amount needed / total accumulations) * 100

      If no value in the record meets the amount needed, the likelihood is 0%. If every accumulation in the record meets it (for example, because a large surplus in the observed period makes the amount needed small), the likelihood of recovery is 100%.
    2. If the user has selected "custom threshold": count how many values in the recovery period array are greater than or equal to the threshold. The likelihood as a percentage is:

           (number of accumulations in record ≥ threshold / total accumulations) * 100

      To calculate the likelihood of not reaching the threshold, count the values below the threshold instead.
  7. Display the data on graphs: graph the probability density function (PDF) based on the histogram produced in step 4. If "amelioration" was chosen, include the "amount needed" to recover the deficit and reach normal as a vertical line; if "custom threshold" was chosen, display the threshold as a vertical line. Graph the cumulative distribution function (CDF) based on the data produced in step 5. On both the PDF and CDF graphs, display the deciles calculated in step 3 as vertical grey lines to help the user identify the precipitation amounts associated with various percentiles, and display the 1981-2010 normal for the recovery period in red.
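
Steps 1 and 3 through 6 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names are hypothetical, the recovery-period sums are made-up numbers standing in for ACIS-ws results, and the bin edges are assumed to run in whole inches from the floor of the minimum to the ceiling of the maximum.

```python
import numpy as np


def surplus_deficit(observed: float, normal: float) -> float:
    """Step 1: normal minus observed (positive = deficit, negative = surplus)."""
    return normal - observed


def deciles(sums: np.ndarray) -> np.ndarray:
    """Step 3: the 10th, 20th, ..., 100th percentiles of the sums."""
    return np.percentile(sums, np.arange(10, 101, 10))


def pdf_one_inch_bins(sums: np.ndarray):
    """Step 4: density histogram with 1-inch bins.

    Assumption: bin edges run in whole inches from floor(min) to ceil(max).
    """
    low, high = np.floor(sums.min()), np.ceil(sums.max())
    if high == low:  # all sums sit on the same whole inch; use a single bin
        high = low + 1
    edges = np.arange(low, high + 1)
    heights, edges = np.histogram(sums, bins=edges, density=True)
    return heights, edges


def cdf_points(sums: np.ndarray):
    """Step 5: sorted sums and their proportional values (needs at least 2 sums)."""
    data = np.sort(sums)
    p = 1. * np.arange(len(data)) / (len(data) - 1)
    return data, p


def likelihood_at_least(sums: np.ndarray, amount: float) -> float:
    """Step 6: percentage of sums greater than or equal to `amount`."""
    return 100.0 * np.count_nonzero(sums >= amount) / sums.size


def likelihood_below(sums: np.ndarray, amount: float) -> float:
    """Step 6, reversed: percentage of sums below a threshold."""
    return 100.0 * np.count_nonzero(sums < amount) / sums.size


# Hypothetical inputs, in inches (not real station data).
recovery_sums = np.array([1.2, 2.5, 3.1, 4.0, 5.5, 6.8, 2.2, 3.9, 7.3, 0.6])
deficit = surplus_deficit(observed=5.0, normal=7.0)  # 2.0-inch deficit
amount_needed = deficit + 3.0                         # plus a recovery-period normal of 3.0
recovery_chance = likelihood_at_least(recovery_sums, amount_needed)
```

With these made-up sums, 3 of the 10 values reach the 5.0 inches needed, so the likelihood of recovery is 30%. Because the proportional-value formula divides by len(data) - 1, it needs at least two sums.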

Random daily samples option

A word on daily samples: this method does not account for day-to-day dependence in precipitation, i.e., the fact that if precipitation occurs on one day, it is more likely to occur on the following day. For this reason, we have found that the sampling method often produces a lower maximum than the observed method. Comparing the observed and sampled distributions is an interesting way to see how they are similar and how they differ. We welcome any recommendations for an improved sampling method.

  1. Make an ACIS-ws call to retrieve the precipitation sum from the user-selected start date (From) through the middle date (To) minus 1 day, since today's precipitation typically has not yet been observed or entered into the system. Also retrieve the 1981-2010 normal precipitation for this period, computed as the sum of the daily normal values. Subtract the observed value from the normal value to determine whether a surplus or deficit is present: a surplus is negative and a deficit positive. Station metadata (station name, state, period of record) are also gathered at this time.
  2. Via an ACIS-ws call, retrieve daily precipitation values for every period in the station record equivalent to the "recovery period" (the period between the form dates labeled "To" and "ending"). Missing data are not filtered at this stage because they are handled during sampling (step 3). For example, if the user-specified recovery period was 2015-04-05 to 2015-09-30, daily values would be retrieved for every April 5-Sept 30 period in the station record. An array of "recovery period" daily values is created for each period in the station record, along with an array of all years in the record. The 1981-2010 normal for the recovery period is also retrieved.
  3. Perform sampling: for each day in the recovery period (in this example, 2015-04-05 through 2015-09-30), select a year at random using Python's random module and add that year's precipitation value for that day to a running sum, continuing until the last day of the recovery period is reached. Sampling for the example period might look like this:

          Sample 1: 1942-04-05 + 2008-04-06 + 1955-04-07 + 1997-04-08 + .... + 2010-09-30
          Sample 2: 2012-04-05 + 1978-04-06 + 1922-04-07 + 1946-04-08 + .... + 1969-09-30

    This process is repeated 1,000 times, yielding an array of 1,000 sums representing these "synthetic periods". The number of samples was chosen so that the computation completes through the web interface in less than 5 seconds. If a missing value is encountered while sampling, another year is chosen at random, up to 100 times. If no value is found after 100 tries (for example, if there is no value for any April 6 in the station record), the process terminates and sampling cannot be conducted for the selected station.
  4. Calculate deciles (10th to 100th percentile in steps of 10) for the array of sampled sums, using NumPy's numpy.percentile function for each decile.
  5. Create a normalized histogram of the sampled precipitation accumulations. Determine the edges of 1" bins from the minimum and maximum values in the sample array. Use the numpy.histogram function to place the array values in bins, normalizing the histogram (density=True) so that the total area of the bars equals 1. Because each bin is 1 inch wide, each bar's height then equals the probability of receiving a precipitation amount within that bin.
  6. Prepare the data to be graphed as a cumulative distribution function: sort the array of sampled sums from lowest to highest, then calculate the proportional value of each sample:

          p = 1. * arange(len(data)) / (len(data) - 1)

    Plot the sorted data.
  7. Determine likelihood of a particular outcome:
    1. If the user has selected "amelioration": take the surplus/deficit value calculated in step 1 and add it to the normal for the recovery period retrieved in step 2. This gives the amount needed to recover any deficit and reach normal by the end of the recovery period. Next, count how many values in the sampled recovery period array are greater than or equal to the amount needed, i.e., those large enough to recover any deficit from the observed period and also reach normal for the recovery period. The likelihood of recovery as a percentage is:

            (number of sampled accumulations ≥ amount needed / total sampled accumulations) * 100

      If no sampled value meets the amount needed, the likelihood is 0%. If every sampled accumulation meets it (for example, because a large surplus in the observed period makes the amount needed small), the likelihood of recovery is 100%.
    2. If the user has selected "custom threshold": count how many values in the sampled recovery period array are greater than or equal to the threshold. The likelihood as a percentage is:

            (number of sampled accumulations ≥ threshold / total sampled accumulations) * 100

      To calculate the likelihood of not reaching the threshold, count the values below the threshold instead.
  8. Display the data on graphs: graph the probability density function based on the histogram produced in step 5. If "amelioration" was chosen, include the "amount needed" to recover the deficit and reach normal as a vertical line; if "custom threshold" was chosen, display the threshold as a vertical line. Graph the cumulative distribution function based on the data produced in step 6. On both the PDF and CDF graphs, display the deciles calculated in step 4 as vertical grey lines to help the user identify the precipitation amounts associated with various percentiles, and display the 1981-2010 normal for the recovery period in red.
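
A minimal sketch of the sampling in step 3 follows. It assumes the daily values have already been arranged as a hypothetical dictionary mapping each year to a list of that year's daily values for the recovery window, aligned by day, with None marking a missing day; leap days and the ACIS-ws retrieval are not handled. The function names are illustrative, and the retry loop mirrors the 100-try limit described above.

```python
import random
from typing import Dict, List, Optional

import numpy as np

# Hypothetical layout: year -> that year's daily values for the recovery window,
# aligned by day index; None marks a missing day.
DailyByYear = Dict[int, List[Optional[float]]]


def synthetic_sum(daily_by_year: DailyByYear, rng: random.Random,
                  max_tries: int = 100) -> float:
    """Build one synthetic period: for each day, add a randomly chosen year's value."""
    years = list(daily_by_year)
    n_days = len(daily_by_year[years[0]])
    total = 0.0
    for day in range(n_days):
        for _ in range(max_tries):
            value = daily_by_year[rng.choice(years)][day]
            if value is not None:  # found an observation; otherwise draw another year
                total += value
                break
        else:  # loop finished without break: every try hit a missing value
            raise RuntimeError(f"no value found for day {day} after {max_tries} tries")
    return total


def sample_sums(daily_by_year: DailyByYear, n_samples: int = 1000,
                seed: Optional[int] = None) -> np.ndarray:
    """Repeat the draw n_samples times (1,000 in the tool) and return the sums."""
    rng = random.Random(seed)
    return np.array([synthetic_sum(daily_by_year, rng) for _ in range(n_samples)])


# Toy record: three years, a four-day window, one missing day per year.
toy = {
    1990: [1.0, None, 1.0, 1.0],
    1991: [1.0, 1.0, None, 1.0],
    1992: [None, 1.0, 1.0, 1.0],
}
sums = sample_sums(toy, seed=42)
```

Every day in the toy record has at least one observed value of 1.0, so every synthetic sum is 4.0; with real data the sums vary from sample to sample, and passing a seed makes a run reproducible. The percentile, histogram, and likelihood calculations in steps 4 through 7 are the same as in the observed data option.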

Analog data option

  1. Make an ACIS-ws call to retrieve the precipitation sum from the user-selected start date (From) through the middle date (To) minus 1 day, since today's precipitation typically has not yet been observed or entered into the system. Retrieve the sums for the same calendar period in every other year of the station record as well. Also retrieve the 1981-2010 normal precipitation for this period, computed as the sum of the daily normal values. Subtract the observed value for the current period from the normal value to determine whether a surplus or deficit is present: a surplus is negative and a deficit positive. Station metadata (station name, state, period of record) are also gathered at this time.
  2. Determine which decile range the observed value falls in. Using the array of precipitation sums for the observed period across the station record, calculate deciles (10th to 100th percentile in steps of 10) with NumPy's numpy.percentile function. For this example, suppose the "observed" period selected was 2014-10-01 through 2015-04-05 (recall that the accumulation is taken through the day before the middle date). A total of 7.4 inches of precipitation was observed from 2014-10-01 through 2015-04-04, which lies in the 50th-60th percentile for the period.
  3. From the web form, determine how wide a range of analogs the user wants: ± 1, 2, or 3 deciles. For this example, suppose the user selected "± 1 decile". Since the observed value is in the 50th-60th percentile, we select all 10-01 to 04-04 periods in the station's record whose accumulations fall within that percentile range, as well as those in the 40th-50th percentile (-1 decile) and the 60th-70th percentile (+1 decile). For brevity, suppose the station has a short record (not recommended), with 2 values in the 40th-50th percentile (periods ending 2001 and 2005), 2 values in the 50th-60th percentile (periods ending 2011 and 2015), and 2 values in the 60th-70th percentile (periods ending 1998 and 2004). The 10-01 to 04-04 periods ending in 2001, 2005, 2011, 1998, and 2004 are therefore the "analogs" to the current period ending in 2015 (which is not counted as its own analog), defined as periods with precipitation totals within ± 1 decile of the observed value. Periods with more missing days than the specified "missing" limit are not used as analogs.
  4. Next, consider the future "recovery" period selected by the user; this example uses 2015-04-05 through 2015-09-30. To build the probability distributions that will be graphed, we take the recovery period that follows each analog "observed" period. In this short-record example, the recovery periods are therefore 2001-04-05 to 2001-09-30, 2005-04-05 to 2005-09-30, 2011-04-05 to 2011-09-30, 1998-04-05 to 1998-09-30, and 2004-04-05 to 2004-09-30. An ACIS-ws request retrieves the precipitation total for each of these periods, and the values are placed in an array. Any period whose number of missing values exceeds the specified limit is excluded from the array of analog recovery periods. The normal for the "recovery" period is also retrieved via ACIS-ws.
  5. Next, create a normalized histogram of the "analog" accumulations. In this simple example there are only five accumulations to graph, so the histogram will not be very informative. For this reason, we recommend running all analyses, especially analogs, on stations with long records, preferably 50-60 years or more. To create the histogram, determine the edges of 1" bins from the minimum and maximum values in the recovery period array. Use the numpy.histogram function to place the array values in bins, normalizing the histogram (density=True) so that the total area of the bars equals 1. Because each bin is 1 inch wide, each bar's height then equals the probability of receiving a precipitation amount within that bin.
  6. Prepare the data to be graphed as a cumulative distribution function: sort the array of analog sums from lowest to highest, then calculate the proportional value of each sample:

          p = 1. * arange(len(data)) / (len(data) - 1)

    Plot the sorted data.
  7. Determine likelihood of a particular outcome:
    1. If the user has selected "amelioration": take the surplus/deficit value calculated in step 1 and add it to the normal for the recovery period retrieved in step 4. This gives the amount needed to recover any deficit and reach normal by the end of the recovery period. Next, count how many values in the analog recovery period array are greater than or equal to the amount needed, i.e., those large enough to recover any deficit from the observed period and also reach normal for the recovery period. The likelihood of recovery as a percentage is:

            (number of analog accumulations ≥ amount needed / total analog accumulations) * 100

      If no value in the analog array meets the amount needed, the likelihood is 0%. If every analog accumulation meets it (for example, because a large surplus in the observed period makes the amount needed small), the likelihood of recovery is 100%.
    2. If the user has selected "custom threshold": count how many values in the analog recovery period array are greater than or equal to the threshold. The likelihood as a percentage is:

            (number of analog accumulations ≥ threshold / total analog accumulations) * 100

      To calculate the likelihood of not reaching the threshold, count the values below the threshold instead.
  8. Display the data on graphs: graph the probability density function based on the histogram produced in step 5. If "amelioration" was chosen, include the "amount needed" to recover the deficit and reach normal as a vertical line; if "custom threshold" was chosen, display the threshold as a vertical line. Graph the cumulative distribution function based on the data produced in step 6. On both the PDF and CDF graphs, display the 1981-2010 normal for the recovery period in red. Deciles of the analog recovery-period accumulations are not calculated or shown because of the small number of values graphed.
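
Analog selection in steps 2 and 3 might be sketched as below. Assumptions: the function names are hypothetical, the deciles are computed over all years' observed-period sums including the current year, a value falling exactly on a decile boundary is placed in the higher range, and the sums are made-up numbers rather than station data.

```python
from typing import Dict, List, Optional

import numpy as np


def decile_index(value: float, all_sums: np.ndarray) -> int:
    """0 for the 0th-10th percentile range, 1 for 10th-20th, ..., 9 for 90th-100th.

    Assumption: a value exactly on a boundary is assigned to the higher range.
    """
    boundaries = np.percentile(all_sums, np.arange(10, 100, 10))  # 10th..90th
    return int(np.searchsorted(boundaries, value, side="right"))


def analog_years(observed_by_year: Dict[int, float], current_year: int, width: int,
                 missing_by_year: Optional[Dict[int, int]] = None,
                 max_missing: Optional[int] = None) -> List[int]:
    """Years whose observed-period sum lies within +/- `width` deciles of the current year."""
    years = sorted(observed_by_year)
    all_sums = np.array([observed_by_year[y] for y in years])
    target = decile_index(observed_by_year[current_year], all_sums)
    analogs = []
    for year in years:
        if year == current_year:
            continue  # the current period is not its own analog
        too_many_missing = (missing_by_year is not None and max_missing is not None
                            and missing_by_year.get(year, 0) > max_missing)
        if too_many_missing:
            continue
        if abs(decile_index(observed_by_year[year], all_sums) - target) <= width:
            analogs.append(year)
    return analogs


# Hypothetical observed-period sums (inches) for a ten-year record ending in 2015.
observed = {2006: 0.0, 2007: 1.0, 2008: 2.0, 2009: 3.0, 2010: 4.0,
            2011: 9.0, 2012: 6.0, 2013: 7.0, 2014: 8.0, 2015: 5.0}
one_decile = analog_years(observed, current_year=2015, width=1)
two_deciles = analog_years(observed, current_year=2015, width=2)
filtered = analog_years(observed, 2015, 1, missing_by_year={2010: 12}, max_missing=5)
```

In this made-up record the current value of 5.0 inches falls in the 50th-60th percentile range, so "± 1 decile" selects the years whose sums fall in the 40th-70th percentile range (2010 and 2012), and a 5-day missing limit drops 2010 if it has 12 missing days. The recovery-period totals that follow the selected years would then feed the histogram, CDF, and likelihood calculations described in steps 5 through 8.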