Thursday, April 22, 2010

Climate and charting with R

For many years I have been using Excel to analyse data and produce charts. Excel charts, though, are ugly.

Many of the Climate Science sites that I visit have very clear and attractive charts of temperature trends to illustrate the arguments presented in posts. Upon investigation I have discovered that many of these superior charts are produced in a statistical analysis program called R.

R is open source software and can be downloaded free from this site.

This post describes:
* the five groups that produce temperature data
* downloading temperature data to a computer
* importing temperature data into R
* manipulating the data in R
* producing charts in R

There are five main groups of researchers that produce global temperature records. Three use temperatures measured at the surface (GISS, Hadley and NOAA) and two use satellite measurements (UAH and RSS).

The temperature data can be found at the following locations: GISS data, Hadley data, NOAA data, UAH data and RSS data.

These data sets require some manipulation before they can be imported into R. The spaces have to be removed and the data separated by commas. I remove the spaces in Word and insert a tab character between the pieces of data, using the Find and Replace function. I then import the data into Excel and export it in comma delimited (CSV) format.
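The same cleanup can also be sketched in R itself, without Word or Excel. This is only a minimal sketch and not the method I used for this post. The layout of raw_text is an assumption (two space-separated columns, no header lines), and the real downloads may contain extra columns or header text. The output file name is hypothetical.

```r
# Assumed layout: two columns separated by spaces. Real downloads may differ.
raw_text <- "1850.042  -0.691
1850.125  -0.357
1850.208  -0.816"

# With the default sep = "", read.table() splits on any run of spaces or tabs.
raw_data <- read.table(text = raw_text, header = FALSE,
                       col.names = c("year", "anom"))

# Write a comma-separated file (to a temporary folder in this sketch).
csv_file <- file.path(tempdir(), "hadley_sample.csv")
write.csv(raw_data, csv_file, row.names = FALSE)

# Read the CSV back to confirm the round trip.
check <- read.csv(csv_file)
print(check)
```

For a real file, the text = raw_text argument would be replaced by the path of the downloaded file.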

Here are the first two years of the Hadley data after this manipulation:
year,anom
1850.042,-0.691
1850.125,-0.357
1850.208,-0.816
1850.292,-0.586
1850.375,-0.385
1850.458,-0.311
1850.542,-0.237
1850.625,-0.34
1850.708,-0.51
1850.792,-0.504
1850.875,-0.259
1850.958,-0.318
1851.042,-0.345
1851.125,-0.394
1851.208,-0.503
1851.292,-0.48
1851.375,-0.391
1851.458,-0.264
1851.542,-0.279
1851.625,-0.175
1851.708,-0.211
1851.792,-0.123
1851.875,-0.141
1851.958,-0.151

The decimal part of each value in the year column encodes the month as a fraction of the year, so January 1850 is 1850.042 and February 1850 is 1850.125. This ensures that the months are equally spaced in a chart.
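The values listed above appear to follow a mid-month rule: the year plus (month - 0.5) / 12. That formula is my inference from the numbers, not something stated by the data provider, and decimal_year is a hypothetical helper name. A short sketch:

```r
# Mid-month decimal year (inferred rule): year + (month - 0.5) / 12
decimal_year <- function(year, month) {
  year + (month - 0.5) / 12
}

# The twelve months of 1850, rounded to three decimals as in the CSV file.
months_1850 <- round(decimal_year(1850, 1:12), 3)
print(months_1850)
```

The rounded results can be compared with the first twelve rows of the Hadley listing.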

The surface temperature data go back to the 19th century. I produced the chart below from the Hadley data:



The red line is the monthly data. The black line is the line that best fits the data (regression line).

Here is the script that generates the chart above:

#################### Hadley temp average calc #################
## STEP 1: SETUP - Source File
par(las=1)
link <- "C:\\Learn_R\\Hadleydata\\Hadley_since_1850.csv"

## STEP 2: READ DATA
my_data <- read.table(link,
    sep = ",", dec = ".", skip = 0,
    row.names = NULL, header = TRUE,
    colClasses = rep("numeric", 2),
    na.strings = c("", "*", "-", -99.99, 99.9, 999.9),
    col.names = c("year", "temperature"))

## STEP 3: MANIPULATE DATA
Title <- "Hadley Temperature Chart and Linear Regression since 1850"

## STEP 4: Create Plot
# Plot temperature data
plot(temperature ~ year, data = my_data, type="l", col = "red", main = Title)
# Calculate regression
lm_fit <- lm(temperature ~ year, data = my_data)
# Display regression stats in R Console
summary(lm_fit)
# Plot regression line
abline(lm(temperature ~ year, data = my_data ))

############################################################

The par(las=1) command makes R draw the axis labels horizontally, so the Y-axis values are easy to read.

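The line link <- "C:\\Learn_R\\Hadleydata\\Hadley_since_1850.csv" assigns the file path to the variable link (a character vector). The backslashes are doubled because a single backslash is an escape character in R strings. I store the data files on the C: drive in a subfolder of the folder Learn_R.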
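As a small illustration of the path handling, here is a sketch of three equivalent ways to write the same path. The folder names are the ones from my script. The names path_escaped, path_forward and path_built are only for this example.

```r
# Three ways to write the same Windows path.
path_escaped <- "C:\\Learn_R\\Hadleydata\\Hadley_since_1850.csv"
path_forward <- "C:/Learn_R/Hadleydata/Hadley_since_1850.csv"
path_built   <- file.path("C:", "Learn_R", "Hadleydata", "Hadley_since_1850.csv")

# cat() shows the string as R stores it: one backslash per separator.
cat(path_escaped, "\n")

# file.path() joins its arguments with forward slashes.
print(identical(path_built, path_forward))
```

R accepts forward slashes in file paths on Windows, so any of the three forms can be used in place of the original.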

Step 2 reads the data file into a data frame called my_data, names the two columns year and temperature, and treats the placeholder values listed in na.strings as missing data (NA).
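To see what the na.strings argument does, here is a minimal sketch with a made-up three-row sample. The -99.99 and * entries are invented for illustration and are not taken from the Hadley file. The placeholder codes are written as strings here, which is how R treats them in any case.

```r
# Made-up sample: row 2 uses -99.99 and row 3 uses "*" as missing-value markers.
sample_csv <- "year,anom
1850.042,-0.691
1850.125,-99.99
1850.208,*"

sample_data <- read.table(text = sample_csv,
                          sep = ",", dec = ".", header = TRUE,
                          colClasses = rep("numeric", 2),
                          na.strings = c("", "*", "-", "-99.99", "99.9", "999.9"),
                          col.names = c("year", "temperature"))

print(sample_data)

# Count the rows where the temperature was read as missing.
print(sum(is.na(sample_data$temperature)))
```

Rows read as NA leave gaps in the plotted line and are dropped by lm() under its default settings.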

Step 3 stores the title for the chart in the variable Title.

Step 4:
* Plots the data (with the plot command)
  - temperature against year
  - indicates the data to be plotted (data = my_data)
  - sets the chart type to line (type="l")
  - sets the colour of the line to red (col = "red")
  - inserts the title from Step 3 (main = Title)
* Calculates the regression statistics
  - with the command: lm_fit <- lm(temperature ~ year, data = my_data)
  - writes the regression statistics to the R console (summary(lm_fit))
* Uses the regression information to plot a line of best fit
  - with the command: abline(lm(temperature ~ year, data = my_data ))
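The fitted model can also be queried directly. The sketch below uses a synthetic series, not real temperature data, built with a known trend of 0.005 degrees per year so that the result is easy to check. The names demo_data and demo_fit are hypothetical.

```r
# Synthetic monthly series (not real data): a linear trend plus a seasonal cycle.
year <- 1850 + (seq_len(240) - 0.5) / 12
temperature <- -0.4 + 0.005 * (year - 1850) + 0.1 * cos(2 * pi * year)
demo_data <- data.frame(year = year, temperature = temperature)

# Fit the same kind of model as in Step 4.
demo_fit <- lm(temperature ~ year, data = demo_data)

# coef() returns the intercept and the slope; the slope is in degrees per year.
slope_per_year <- unname(coef(demo_fit)["year"])

# Multiply by 10 to express the trend in degrees per decade.
slope_per_decade <- 10 * slope_per_year
print(slope_per_decade)
```

Because abline() accepts a fitted model object, abline(lm_fit) would draw the same line as the command in my script without fitting the regression a second time.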

Most of this script comes from D. Kelly O'Day's site at Climate Charts & Graphs. O'Day's site is a very useful introduction to the R language.

The R Resources link on that site leads to a wide range of useful materials.

Here are some more charts that I plotted:









More charts can be seen at this link.

1 comment:

Kelly said...

Steve

Welcome to the R world. As a long-time Excel user, I discovered R late in my data analysis career.

Glad to see you using my R scripts.

Your comment about manipulating the raw data in Excel:

"These data sets require some manipulation before they can be imported into R. The spaces have to be removed and the data separated by commas. I remove the spaces in word and insert a tab character between the pieces of data, using the find ... replace function. I import the data into Excel and export it in comma delimited (CSV) format."

I have R snippets to handle all the climate data situations I have run into so far and do not use Excel for any of my data manipulation.

R has several data read tools besides read.csv(). These include read.fwf() and read.table() as well as readLines().

I'd be glad to help you eliminate your Excel csv steps if you'd like any assistance.

D Kelly O'Day