Weather Noise in the Climate Models
Lucia recently looked at weather noise in the three ensemble members of the Echo G climate model running the SRES A1B scenario and found it to be about twice as large as the noise in the surface temperature record (STR). I decided to pursue this matter using a wider range of models and all of their ensemble members. First, for each model, I calculated the temperature anomaly of each ensemble member relative to 1951-1980 and then averaged the results to get one time series per model. The models run the 20c3m scenario, which reproduces the climate of the 20th century. Let’s take a quick look at how they compare to the STR. Figure 1 shows the decadal averages for both the model results and the STR. For this analysis, we’ll use HadCrut3, NCDC and everyone’s favorite, GISS. I re-baselined HadCrut3 and NCDC, added them to GISS and took the average.
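Here is a minimal Python sketch of that preprocessing. It is hypothetical and is not the actual script. It assumes each run or dataset is an annual series stored as a year-to-temperature dict. It also assumes every record is anomalized to the same 1951-1980 base period before averaging; the post states that base period only for the models.

```python
from statistics import mean

# Assumed base period; the post uses 1951-1980 for the model anomalies.
BASE_START, BASE_END = 1951, 1980


def anomaly(series, start=BASE_START, end=BASE_END):
    """Subtract the base-period mean from every year of a year -> value dict."""
    base = mean(v for year, v in series.items() if start <= year <= end)
    return {year: v - base for year, v in series.items()}


def average_series(series_list):
    """Average several year -> value dicts over the years they all share."""
    common = set.intersection(*(set(s) for s in series_list))
    return {year: mean(s[year] for s in series_list) for year in sorted(common)}


def model_mean(members):
    """One series per model: anomalize each ensemble member, then average them."""
    return average_series([anomaly(m) for m in members])


def decadal_means(series):
    """Average a year -> value dict by decade (1900-1909, 1910-1919, ...)."""
    buckets = {}
    for year, v in series.items():
        buckets.setdefault(year // 10 * 10, []).append(v)
    return {decade: mean(vals) for decade, vals in sorted(buckets.items())}


# Toy example with made-up numbers (not real model output): two "members"
# that differ only by a constant offset end up with identical anomalies.
members = [
    {year: 14.0 + 0.01 * (year - 1900) + offset for year in range(1900, 2001)}
    for offset in (0.0, 0.2)
]
print(decadal_means(model_mean(members))[1960])
```

The STR average follows the same pattern. For hypothetical dicts `giss`, `hadcrut3` and `ncdc`, the call is `average_series([anomaly(s) for s in (giss, hadcrut3, ncdc)])`.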
Figure 1
The time interval for the STR and model ensembles was chosen to be 1900-2000. I chose such a long period because a common criticism is that seven years and a few months’ worth of data can’t tell us much about long-term temperature trends. If that’s the case, then calculations over such a short window may not be very useful for determining how well the models hold up against reality. I computed the trend line, correcting for autocorrelation (yes, I finally got that script to work properly), and compared the standard errors of the model trends to those found in the STR. I computed the standard errors for GISS, NCDC and HadCrut3 individually and found them to be 0.00445 °C, 0.00467 °C and 0.00426 °C, respectively. For the five Echo G ensemble members (0-4), the standard errors were 0.00557 °C, 0.00521 °C, 0.00586 °C, 0.00510 °C and 0.00494 °C. With these data, I performed the following hypothesis test:
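The post doesn’t say which autocorrelation correction the script uses, so the following is a sketch of one common choice and should not be read as the author’s method. The sketch fits an ordinary least squares trend and estimates the lag-1 autocorrelation r1 of the residuals. It then uses an effective sample size n_eff = n(1 - r1)/(1 + r1) in place of n when computing the slope’s standard error. The function names are hypothetical.

```python
import math


def ols_trend(t, y):
    """Ordinary least squares fit y = a + b*t; returns slope, residuals, Stt."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    stt = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / stt
    resid = [yi - y_mean - slope * (ti - t_mean) for ti, yi in zip(t, y)]
    return slope, resid, stt


def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a sequence."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def trend_with_ar1_se(t, y):
    """Trend slope and its standard error adjusted for AR(1) residuals.

    The residual variance uses n_eff - 2 degrees of freedom, where
    n_eff = n * (1 - r1) / (1 + r1). Positive r1 shrinks n_eff and
    therefore inflates the standard error.
    """
    slope, resid, stt = ols_trend(t, y)
    r1 = lag1_autocorr(resid)
    n_eff = len(t) * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        raise ValueError("too few effective degrees of freedom")
    s2 = sum(e * e for e in resid) / (n_eff - 2)
    return slope, math.sqrt(s2 / stt), r1, n_eff


# Synthetic data (not a real record): a 0.005 per-year trend plus +/-0.1
# noise that flips sign every 10 years, so the residuals are autocorrelated.
years = list(range(1900, 2001))
temps = [0.005 * (yr - 1900) + (0.1 if (yr // 10) % 2 else -0.1) for yr in years]
slope, se, r1, n_eff = trend_with_ar1_se(years, temps)
print(f"slope={slope:.5f}  se={se:.5f}  r1={r1:.2f}  n_eff={n_eff:.1f}")
```

With persistent noise like this, r1 is strongly positive. The adjusted standard error therefore comes out several times larger than the naive OLS value.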
Ho: Standard Errors in STR = Standard Errors in Echo G
Ha: Standard Errors in STR ≠ Standard Errors in Echo G
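As a rough cross-check, here is a standard-library Python sketch of the equal-variance (pooled) two-sample t-test, which is the variant Matlab’s ttest2 runs by default. It is applied to the standard errors listed above. The p-value comes from numerically integrating the Student’s t density; in practice, scipy.stats.ttest_ind does the same job. The function names are hypothetical. The returned (h, p) pair imitates ttest2’s reject flag and p-value.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Student's t probability density with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| by Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def pooled_ttest2(x, y, alpha=0.05):
    """Two-sample t-test assuming equal variances; returns (h, p, t)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    p = two_sided_p(t, df)
    return int(p < alpha), p, t


str_se = [0.00445, 0.00467, 0.00426]  # GISS, NCDC, HadCrut3 (from the text)
echo_g_se = [0.00557, 0.00521, 0.00586, 0.00510, 0.00494]  # Echo G members 0-4
h, p, t = pooled_ttest2(str_se, echo_g_se)
print(h, round(p, 4), round(t, 2))
```

With these inputs, t is about -3.67 on 6 degrees of freedom and p is about 0.01, so h = 1 at the 5% level. The same function applies unchanged to the per-model comparisons and to the slopes.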
Using ttest2 in Matlab with a significance level of 5%, we get a p-value of 0.0104, so we reject the null hypothesis. Echo G isn’t hitting the nail on the head in terms of reproducing weather noise: it is close, but not close enough. I then ran a script to calculate the standard errors for all the models and the STR as a function of years before 2000. This generated the graph in Figure 2.
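The script behind Figure 2 isn’t shown. As a rough, hypothetical Python sketch, one approach fits a trend to windows that end in 2000 and extend one year farther back at each step. For each window length, it records the slope and an autocorrelation-adjusted standard error. This sketch makes two assumptions. It uses contiguous annual data. It also applies the AR(1) effective-sample-size correction n_eff = n(1 - r1)/(1 + r1), which may not be what the original script did.

```python
import math


def ar1_trend(t, y):
    """OLS slope and an AR(1)-adjusted standard error of that slope.

    Uses n_eff = n * (1 - r1) / (1 + r1), with r1 the lag-1
    autocorrelation of the residuals, in place of n.
    """
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    stt = sum((a - t_mean) ** 2 for a in t)
    slope = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y)) / stt
    resid = [b - y_mean - slope * (a - t_mean) for a, b in zip(t, y)]
    sse = sum(e * e for e in resid)
    if sse == 0:
        return slope, 0.0  # perfect fit: no residual noise
    # OLS residuals have zero mean, so no re-centering is needed here.
    r1 = sum(p * q for p, q in zip(resid[:-1], resid[1:])) / sse
    n_eff = n * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        return slope, float("nan")  # too few effective degrees of freedom
    return slope, math.sqrt(sse / (n_eff - 2) / stt)


def windowed_trends(series, end=2000, min_years=10):
    """Map window length k to (slope, se) for the k years ending at `end`."""
    years = sorted(y for y in series if y <= end)
    results = {}
    for k in range(min_years, len(years) + 1):
        window = years[-k:]
        results[k] = ar1_trend(window, [series[y] for y in window])
    return results


# Synthetic series (not real data): a trend plus a 20-year square wave.
series = {yr: 0.005 * (yr - 1900) + (0.1 if (yr // 10) % 2 else -0.1)
          for yr in range(1900, 2001)}
scan = windowed_trends(series)
for k in (10, 20, 50, 101):
    slope, se = scan[k]
    print(f"{k:3d} years: slope={slope:.4f}  se={se:.4f}")
```

The same scan also yields the trend for each window, which is the quantity plotted in Figure 3.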
Figure 2
GISS ER, GISS EH, CGCM 2.3.2a and Echo G all peak at around the same point (about 10 years before 2000). The remaining models and the STR don’t show this behavior, but they all loosely converge as we go farther back. Now let’s turn to the temperature trend.
Figure 3
Most of the models produce very different temperature trends until about 20 years before 2000, when they begin to converge with the STR. If I can’t sleep tonight, maybe I’ll see how the hypothesis tests bear out over the long term for all the models as well.
Update (8/3): I completed the hypothesis tests for the standard errors in all the models. The same null hypothesis as the one above is used. Below is a table summarizing the results. For each model, the first column is the outcome of the test. 0 indicates do not reject, 1 indicates reject. The second column is the p-value.
A quick glance reveals ECHO G, ECHAM5 and GISS AOM suck. GISS ER, BCC CM1 and CGCM 3.1 aren’t that impressive. But CGCM 2.3.2a and GISS EH performed very well.
Update (8/4): I’ve completed the hypothesis testing for the slopes. Again, we are using the same null hypothesis as above, but for slopes of course.
It looks like ECHO G and ECHAM5 performed perfectly. GISS ER and GISS AOM are close behind. GIS AOM, CGCM 3.1, CGCM 2.3.2a and BCC CM1 performed very poorly.