Introduction

R is an extremely powerful statistical scripting language. It is open-source and quickly gaining traction across academia, research organizations, and businesses. It is often the tool of choice for statisticians, data scientists, quantitative financial analysts, and a myriad of other professions. It is used for research at the vast majority of graduate schools. It is currently used by companies like Facebook, Google, the NY Times and Wallstreet financial organizations.

R is open-source and is freely available to download. You can put base R on any government computer. You can use base R as-is to write and run R script. That being said, RStudio has provided a very useful “front-end” for R that is generally easier to use (R is still the “engine”; you can’t run RStudio without R). We will primarily use RStudio in SE370. Some DoD organizations, however, will not allow you to install RStudio. Remember though, you can still run everything just like we teach you from base R.

In SE370 we have dedicated four lessons to learning how to conduct geospatial analysis in R. Spatial analysis and visualization is increasingly being used in industry, the military, other government organizations, and academia. R is powerful tool in this world of geospatial analysis and visualizaion and has hundreds of packages dedicated to these types of tasks (as seen here). In SE370 we will teach you how to accomplish the following geospatial tasks in R:

  1. Geocode locations
  2. Read in and understand spatial points data
  3. Read in and understand shape files
  4. Plot points on map (normal and with point density)
  5. Plot contour heat map
  6. Plot pie graph over polygons

If you need to conduct a quick review of R, I recommend that you look at two introductory lessons that we’ve developed for SE350:

Before you begin conducting spatial analysis in R, make sure that you install two necessary packages: ggmap, Rcpp, and sp. This can be done in the packages tab in the lower right quadrant of RStudio. It can also be done on the command line by typing install.packages('ggmap') , install.packages('Rcpp'), and install.packages('sp')

Point Data

Point data is the simplest type of geospatial data. Spatial point data is used represent the spatial nature of events. Examples of point data include the location of a customer’s iPhone purchases in business, the location of a crime in law enforcement, the location of attacks in the military, or the location of infrastructure in engineering. Point data means that an entity is represented by an x coordinate and a y coordinate in space. In data, this is most often done by recording a longitude and latitude. We are going to import a data set that you should already be familiar with: the Houston Crime data set. You can import this data with the command:

library(ggmap)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.0.3
library(Rcpp)
data(crime)  ##After running this commend, you shoud see 'crime' in your working environment.

Now let’s explore the data. Let’s first see what the column names are:

names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"    
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"   
## [13] "day"      "location" "address"  "lon"      "lat"

We see that we have some time fields, type of offense fields, as well as some spatial fields. In addition to the street address, we actually have the longitude and the latitude. We can get a little more information on the columns with the command:

str(crime)
## 'data.frame':    86314 obs. of  17 variables:
##  $ time    : POSIXt, format: "2010-01-01 01:00:00" "2010-01-01 01:00:00" ...
##  $ date    : chr  "1/1/2010" "1/1/2010" "1/1/2010" "1/1/2010" ...
##  $ hour    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ premise : chr  "18A" "13R" "20R" "20R" ...
##  $ offense : Factor w/ 7 levels "aggravated assault",..: 4 6 1 1 1 3 3 3 3 3 ...
##  $ beat    : chr  "15E30" "13D10" "16E20" "2A30" ...
##  $ block   : chr  "9600-9699" "4700-4799" "5000-5099" "1000-1099" ...
##  $ street  : chr  "marlive" "telephone" "wickview" "ashland" ...
##  $ type    : chr  "ln" "rd" "ln" "st" ...
##  $ suffix  : chr  "-" "-" "-" "-" ...
##  $ number  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ month   : Ord.factor w/ 8 levels "january"<"february"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ day     : Ord.factor w/ 7 levels "monday"<"tuesday"<..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ location: chr  "apartment parking lot" "road / street / sidewalk" "residence / house" "residence / house" ...
##  $ address : chr  "9650 marlive ln" "4750 telephone rd" "5050 wickview ln" "1050 ashland st" ...
##  $ lon     : num  -95.4 -95.3 -95.5 -95.4 -95.4 ...
##  $ lat     : num  29.7 29.7 29.6 29.8 29.7 ...

Now that we’ve learned a little bit about the data, let’s explore the spatial component of our data. To do this, I usually start by plotting it against a plane white background as simple x and y coordinates. To do this in R, we simply type:

plot(crime$lon,crime$lat,xlab="lon",ylab="lat")

plot of chunk unnamed-chunk-4

This plot shows us immediately that we have some outliers that seem to be pretty far away from Houston, Texas.

Now let’s showcase the power of the ggmap package by plotting this same data on an image from Google Maps. First, let’s learn how to get a map from Google Maps. The following code gets and plots a map of Houston from Google Maps:

qmap("houston", zoom = 13)  #This command retrieves and plots a map of Houston

plot of chunk unnamed-chunk-5

Note that if we change the zoom to 6, we would get a map of most of Texas centered on Houston:

qmap("houston", zoom = 6) #Change zoom

plot of chunk unnamed-chunk-6

Now, in order to add points to this plot, we need need to store this image in an object and add to it. Plotting with the ggmap package is done with “layers.” You first plot the map and then you can add layers of other geospatial objects and/or text. We do this with the code below:

HoustonMap<-qmap("houston", zoom = 14)  #First, get a map of Houston

HoustonMap+                                        ##This plots the Houston Map
geom_point(aes(x = lon, y = lat), data = crime)    ##This adds the points to it
## Warning: Removed 80564 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-7

Note here that we assign the column name lon to the x variable, the column name lat as the y variable, and we tell R to get these field names from the crime data frame. If we wanted to color the points by the type offense, we could do this with the command:

HoustonMap+
geom_point(aes(x = lon, y = lat, colour = offense),
              data = crime)
## Warning: Removed 80564 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-8

Note here that offense is the name of the column that contains the categorical variable that we want to use to distinguish between the color of our points.

HoustonMap+
geom_point(aes(x = lon, y = lat, colour = offense),
              data = crime)+
ggtitle("Map of Crime in Houston")
## Warning: Removed 80564 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-9

Also note that we can easily save this image as a file by wrapping our script with two commands. These commands allow us to print our graphic to a file in our working directory instead of printing it to the screen. Here I’ve given a method to print to a .png graphics file:

png("myHoustonMap.png")  ##This is the name of the image file (in this case a .png file)
HoustonMap+
geom_point(aes(x = lon, y = lat, colour = offense),
              data = crime)+
ggtitle("Map of Crime in Houston")
dev.off()  ##This command indicates that we're done creating our plot.  It finalizes and closes the .png file.

Geocoding

There are many times when you are given data that has a location field, but it doesn’t have a column for latitude and longitude. You may have a field for address, city, or other text describing the location, but you don’t have the latitude and longitude. In these cases, the ggmap package provides a way for you to geocode these locations. The geocode function works by pinging Google Maps as if you were searching it on a web browser, and returning the lat and lon for a given location. For example, we can write:

geocode("West Point, NY")
##      lon   lat
## 1 -73.96 41.39

and note that this returns the latitude and longitude for West Point, NY. You can also enter a full address. Here we’ll enter the address for the White House:

geocode('1600 Pennsylvania Avenue Northwest, Washington, DC 20500')
##      lon  lat
## 1 -77.04 38.9

Try entering your home address. You can use this to get the lat and lon information so that you can plot your spatial data. For example, let’s say that you wanted to create a map of your squad. You could create a squad data frame that looks like this:

mySquad<-data.frame(
  Names=c("Bill","Dave","Liz","Cody","Austin","Amber","Jimmy","Mike","Amanda"),
  Location=c("Boston, MA","Chicago, IL","Atlanta, GA","Portland, OR","Albuquerque, NM",
              "Dallas, TX","St Louis, MO","Miami, FL","San Francisco, CA"),stringsAsFactors=FALSE)

Here’s what our data looks like.

mySquad
##    Names          Location
## 1   Bill        Boston, MA
## 2   Dave       Chicago, IL
## 3    Liz       Atlanta, GA
## 4   Cody      Portland, OR
## 5 Austin   Albuquerque, NM
## 6  Amber        Dallas, TX
## 7  Jimmy      St Louis, MO
## 8   Mike         Miami, FL
## 9 Amanda San Francisco, CA

Let’s add a lat and lon column to this data. We can do this by geocoding in a for loop like this:

for(i in 1:9){
  temp<-geocode(mySquad$Location[i]) #Geocode each location and store it in temp
  mySquad$lon[i]<-temp$lon           #Assign the lon
  mySquad$lat[i]<-temp$lat           #Assign the lat
}

Note that our data now has a spatial aspect that we can use for analysis:

mySquad
##    Names          Location     lon   lat
## 1   Bill        Boston, MA  -71.06 42.36
## 2   Dave       Chicago, IL  -87.63 41.88
## 3    Liz       Atlanta, GA  -84.39 33.75
## 4   Cody      Portland, OR -122.68 45.52
## 5 Austin   Albuquerque, NM -106.61 35.11
## 6  Amber        Dallas, TX  -96.80 32.78
## 7  Jimmy      St Louis, MO  -90.20 38.63
## 8   Mike         Miami, FL  -80.19 25.76
## 9 Amanda San Francisco, CA -122.42 37.77

We can now plot our squad

US<-qmap("United States",zoom=4,color="bw")

US+
geom_point(aes(x = lon, y = lat),color="red",data = mySquad)+
  annotate('text', x=mySquad$lon, y=mySquad$lat+1, label = mySquad$Names, colour = I('red'), size = 4)+
  ggtitle("The Coolest Squad")

plot of chunk unnamed-chunk-17

Your Turn

Now it’s your turn. See if you can create a spatial points map of a subset of your IED data. To do this, you will first have to create a .csv file of your data, and read it into R. If you forget how to do this, refer to the review lessons mentioned above.