Note: I’ve edited this post because of an incredibly stupid coding error. Thanks to Lucia for pointing it out.
When I previously compared weather noise in the models to real-life weather noise, I was inspired by one of Lucia’s posts. But I recently began thinking that taking the residuals from a trend line to get standard errors might not be the best way to analyze the problem. The weather element in both the models and reality is the chaotic, seemingly random fluctuation one sees over short time scales. Smoothing the time series over a long enough span removes these fluctuations, and the overall behavior of the data emerges clearly. Subtracting the smoothed series from the original data then gives the residuals. Figure 1 was generated with MATLAB’s smooth function using the rloess option. The span was 5*12 = 60 months, or five years. Figure 2 shows the residuals.
Figure 1
Figure 2
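To make the smooth-then-subtract step concrete, here is a minimal Python sketch. It is not the rloess smoother used for the figures: it swaps in a plain centered moving average with the same 60-month span, and it runs on a made-up synthetic series rather than any real temperature record. The names moving_average and residuals are hypothetical, and taking the "standard error" to be the residual standard deviation divided by sqrt(n) is an assumption.

```python
import math
import random
import statistics


def moving_average(series, span):
    """Centered moving average; the window shrinks near the two ends."""
    half = span // 2
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def residuals(series, span):
    """Residuals of a series about its smoothed version."""
    smoothed = moving_average(series, span)
    return [x - s for x, s in zip(series, smoothed)]


# Synthetic monthly anomalies (invented for illustration): slow trend plus noise.
random.seed(0)
months = 1200
series = [0.0005 * t + random.gauss(0.0, 0.1) for t in range(months)]

res = residuals(series, span=60)           # 5 * 12 = 60 months
sigma = statistics.stdev(res)              # spread of the "weather noise"
std_err = sigma / math.sqrt(len(res))      # assumed definition of standard error
print(f"residual std = {sigma:.4f}, standard error = {std_err:.5f}")
```

A robust local regression such as rloess downweights outliers, so its residuals will differ somewhat from those of a simple moving average. The sketch only shows the overall shape of the calculation.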
I then computed the residuals for each of the three surface temperature records (STR) and calculated their standard errors, and I ran the same calculations for every ensemble member of Echo G 20c3m. For the STR, I found 0.003096 °C, 0.002847 °C and 0.002700 °C for GISS, HadCrut and NCDC, respectively. For the five ensemble members, I found 0.04110 °C, 0.04075 °C, 0.04127 °C, 0.04100 °C and 0.04099 °C. Just by looking at the numbers, it’s clear a hypothesis test is silly, but let’s do it for kicks. I used ttest2 to test the following hypothesis:
H0: Standard Error(Earth) = Standard Error(Model)
Ha: Standard Error(Earth) ≠ Standard Error(Model)
To be more precise, ttest2 compared the mean of the three standard errors from the surface temperature records to the mean of the five standard errors from the Echo G ensemble members. Testing at the 5% significance level, we get a p-value of 8.351e-010. Thus the null hypothesis is rejected. Echo G is producing noise that is about 14 times what we see in the surface temperature record!
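The arithmetic behind the "about 14 times" figure can be checked directly from the standard errors quoted above. The Python sketch below computes the ratio of the two group means and a two-sample t statistic with pooled variance, which is the equal-variance form that ttest2 uses by default. The helper pooled_t is a hypothetical name, and the sketch stops at the t statistic rather than trying to reproduce the quoted p-value.

```python
import math
import statistics

# Standard errors quoted in the post (°C).
obs = [0.003096, 0.002847, 0.002700]                    # GISS, HadCrut, NCDC
model = [0.04110, 0.04075, 0.04127, 0.04100, 0.04099]   # five Echo G runs

# How many times larger the model noise is, on average.
ratio = statistics.mean(model) / statistics.mean(obs)


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, dof)."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / dof
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, dof


t_stat, dof = pooled_t(model, obs)
print(f"ratio = {ratio:.1f}, t = {t_stat:.0f}, dof = {dof}")
```

With only three and five values in the two groups, the test has six degrees of freedom, so the very large t statistic mostly reflects how tightly each group clusters compared with the gap between them.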


Chad–
What question are you testing with your t-test? It reads like you are testing whether one run of EchoG gives the same mean as another run of EchoG. Is that what you are testing?
Comment by lucia — August 10, 2008 @ 9:21 pm
I should have been more clear. The null hypothesis is that the mean in the residuals is the same as the mean in the models.
Comment by cshme — August 10, 2008 @ 9:40 pm
Chad…
Sorry to be dense:
The mean in the residuals should be zero– by definition. That’s just based on the way residuals are created — by subtracting some mean.
When you say “the mean in the residuals” do you mean: The variance (or standard error) in the residuals for the observations?
When you say the “mean in the models”, do you mean the standard error in the models (before or after smoothing)?
Comment by lucia — August 10, 2008 @ 10:28 pm
“The mean in the residuals” is just the mean of the residuals. Not variance. This is what happens when you make one tiny error in copy/pasting some variable name and end up mistaking residual for sigma and don’t even realize it till someone calls you out. I have learned an important lesson today. Just because your code works doesn’t mean you know what you’re doing!
I’ll fix the code to use standard errors like I did in my previous analysis. Thanks again.
Comment by cshme — August 10, 2008 @ 10:49 pm
Heh!
The perils of blogging are that mistakes are in the open. The advantage is people can ask, then you can fix.
In my opinion, the question is– can you get the right answer eventually? Each of us sitting at home talking to no one can’t. But we are still all curious, and this method lets us all talk.
I’m going at this more slowly than you because I want to see what ‘features’ all this rebaselining introduces into the data.
Comment by lucia — August 11, 2008 @ 4:22 pm
Can you test the residuals in this way?
They aren’t independent variables if the temperature record has some memory to it.
Comment by Nick — August 11, 2008 @ 5:12 pm
Lucia- I think from now on, I’ll take my time calculating, writing, etc., instead of trying to churn out a quick post.
Nick- I just constructed an autocorrelation plot for the residuals in the EchoG ensemble, and all the values of rho fall within the 95% confidence interval, so they aren’t significantly different from zero. The largest value I get is 0.0252 at lag 1. I used the Durbin-Watson statistic as a check and found DW = 1.913. Just about all the tables of DW critical values stop at 100 observations, and I have 1200! Since DW is close to 2, the value expected when there is no first-order autocorrelation, I would conclude that none is present. But it would be a good idea to consult a larger table.
edit: I forgot to baseline EchoG so I was going with the raw data. Disregard the numbers I stated. But that’s the methodology that I would use. I’ll go over this for a few models in my next post.
Comment by cshme — August 11, 2008 @ 6:41 pm
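The two checks described in the last comment, the lag-1 autocorrelation and the Durbin-Watson statistic, can be sketched in a few lines of Python. This is only an illustration on synthetic white noise, not a reproduction of the EchoG numbers (which the comment itself says to disregard). The function names are hypothetical, and the 1.96/sqrt(n) band is the usual large-sample approximation for an uncorrelated series.

```python
import math
import random


def lag1_autocorrelation(e):
    """Sample autocorrelation of a series at lag 1."""
    mean = sum(e) / len(e)
    dev = [x - mean for x in e]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev)
    return num / den


def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squares. Values near 2 suggest no first-order
    autocorrelation; values well below 2 suggest positive autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den


# Synthetic residuals (invented for illustration): 1200 months of white noise.
random.seed(1)
n = 1200
e = [random.gauss(0.0, 1.0) for _ in range(n)]

rho1 = lag1_autocorrelation(e)
dw = durbin_watson(e)
bound = 1.96 / math.sqrt(n)   # approximate 95% band for an uncorrelated series

print(f"rho(1) = {rho1:.4f}, 95% band = +/-{bound:.4f}, DW = {dw:.3f}")
```

For long series the two quantities are tied together by DW ≈ 2(1 − rho(1)), so a lag-1 autocorrelation near zero and a Durbin-Watson value near 2 are telling the same story.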