Introduction to Shapefiles in R

Shapefiles are a popular method of representing geospatial vector data in GIS software. This file format was developed by ESRI (a notable GIS Software Company). The shapefile format can store data in the form of points, lines, or polygons. Shapefiles are used to represent adminsistrative boundaries (think country/state/county borders), roads, rivers, lakes, etc.

The raster package in R makes it very easy to download various administrative boundaries for any country. You just have to know the three letter ISO Code for the country of interest (you can look this up on Google). Below we demonstrate how to load the raster library, and get the province level administrative boundaries for France.

library(raster)                                  ##Load the Raster Library
france<-getData('GADM', country='FRA', level=1)  ##Get the Province Shapefile for France
plot(france)                                     ##Plot this shapefile

Note that level 1 detail give the boundaries one level below the national boundaries. Actually, level 0 gives only national boundaries, level 1 gives the equivalent of province level boundaries, and each susequent level gives finer and finer granularity. In the United States, level 0 give our national boundary, level 1 gives state boundaries, and level 2 adds county boundaries. You can see level 0 through level 3 for France in the following plot.

alt text

alt text

Below we will retrieve and plot the county level shapefile for the United States. Note that when we plot this, that the continental United States is extremely small since it also plots Alaska, Hawaii, and some other islands that belong to the US.

us<-getData('GADM', country='USA', level=2)  #Get the County Shapefile for the US
plot(us)
alt text

alt text

Now let’s explore this data a little further. If we check the class of the us object (by typing class(us)), we will see that it is a SpatialPolygonsDataFrame. This means that it is multiple polygons that are held together in a data frame. We can access each of these polygons in a similar way that we access elements of a normal data frame. Now let’s look at the names of this data frame:

names(us)
##  [1] "OBJECTID"  "ID_0"      "ISO"       "NAME_0"    "ID_1"     
##  [6] "NAME_1"    "ID_2"      "NAME_2"    "TYPE_2"    "ENGTYPE_2"
## [11] "NL_NAME_2" "VARNAME_2"

Notice there are various types of names in this data frame. Name 0 represents national boundaries, Name 1 represents state names, and Name 2 represents county names. We can see this further if we plot all of the unique NAME_1:

unique(us$NAME_1)
##  [1] "Alabama"              "Alaska"               "Arizona"             
##  [4] "Arkansas"             "California"           "Colorado"            
##  [7] "Connecticut"          "Delaware"             "District of Columbia"
## [10] "Florida"              "Georgia"              "Hawaii"              
## [13] "Idaho"                "Illinois"             "Indiana"             
## [16] "Iowa"                 "Kansas"               "Kentucky"            
## [19] "Louisiana"            "Maine"                "Maryland"            
## [22] "Massachusetts"        "Michigan"             "Minnesota"           
## [25] "Mississippi"          "Missouri"             "Montana"             
## [28] "Nebraska"             "Nevada"               "New Hampshire"       
## [31] "New Jersey"           "New Mexico"           "New York"            
## [34] "North Carolina"       "North Dakota"         "Ohio"                
## [37] "Oklahoma"             "Oregon"               "Pennsylvania"        
## [40] "Rhode Island"         "South Carolina"       "South Dakota"        
## [43] "Tennessee"            "Texas"                "Utah"                
## [46] "Vermont"              "Virginia"             "Washington"          
## [49] "West Virginia"        "Wisconsin"            "Wyoming"

Below we demonstrate how to subset a SpatialPolygonsDataFrame. Note here that if we wanted to only plot the Northwestern United States, we could use the following code to create a subset that only contains Washington, Oregon, and Idaho.

 northwest<-subset(us,NAME_1=="Washington" | NAME_1=="Oregon" | NAME_1=="Idaho")
 plot(northwest)