Introduction

R is an extremely powerful statistical scripting language. It is open-source and quickly gaining traction across academia, research organizations, and businesses. It is often the tool of choice for statisticians, data scientists, quantitative financial analysts, and a myriad of other professions. It is used for research at the vast majority of graduate schools. It is currently used by companies like Facebook, Google, the NY Times and Wallstreet financial organizations.

R is open-source and is freely available to download. You can put base R on any government computer. You can use base R as-is to write and run R script. That being said, RStudio has provided a very useful “front-end” for R that is generally easier to use (R is still the “engine”; you can’t run RStudio without R). We will primarily use RStudio in SE370. Some DoD organizations, however, will not allow you to install RStudio. Remember though, you can still run everything just like we teach you from base R.

In SE370 we have dedicated four lessons to learning how to conduct geospatial analysis in R. Spatial analysis and visualization is increasingly being used in industry, the military, other government organizations, and academia. R is powerful tool in this world of geospatial analysis and visualizaion and has hundreds of packages dedicated to these types of tasks (as seen here). In SE370 we will teach you how to accomplish the following geospatial tasks in R:

  1. Geocode locations
  2. Read in and understand spatial points data
  3. Read in and understand shape files
  4. Plot points on map (normal and with point density)
  5. Plot contour heat map
  6. Plot pie graph over polygons

If you need to conduct a quick review of R, I recommend that you look at two introductory lessons that we’ve developed for SE350:

Before you begin conducting spatial analysis in R, make sure that you install two necessary packages: ggmap, Rcpp, and sp. This can be done in the packages tab in the lower right quadrant of RStudio. It can also be done on the command line by typing install.packages('ggmap') , install.packages('Rcpp'), and install.packages('sp')

Point Data

Point data is the simplest type of geospatial data. Spatial point data is used represent the spatial nature of events. Examples of point data include the location of a customer’s iPhone purchases in business, the location of a crime in law enforcement, the location of attacks in the military, or the location of infrastructure in engineering. Point data means that an entity is represented by an x coordinate and a y coordinate in space. In data, this is most often done by recording a longitude and latitude. We are going to import a data set that you should already be familiar with: the Houston Crime data set. You can import this data with the command:

library(ggmap)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.0.3
library(Rcpp)
data(crime)  ##After running this commend, you shoud see 'crime' in your working environment.

Now let’s explore the data. Let’s first see what the column names are:

names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"    
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"   
## [13] "day"      "location" "address"  "lon"      "lat"

We see that we have some time fields, type of offense fields, as well as some spatial fields. In addition to the street address, we actually have the longitude and the latitude. We can get a little more information on the columns with the command:

str(crime)
## 'data.frame':    86314 obs. of  17 variables:
##  $ time    : POSIXt, format: "2010-01-01 01:00:00" "2010-01-01 01:00:00" ...
##  $ date    : chr  "1/1/2010" "1/1/2010" "1/1/2010" "1/1/2010" ...
##  $ hour    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ premise : chr  "18A" "13R" "20R" "20R" ...
##  $ offense : Factor w/ 7 levels "aggravated assault",..: 4 6 1 1 1 3 3 3 3 3 ...
##  $ beat    : chr  "15E30" "13D10" "16E20" "2A30" ...
##  $ block   : chr  "9600-9699" "4700-4799" "5000-5099" "1000-1099" ...
##  $ street  : chr  "marlive" "telephone" "wickview" "ashland" ...
##  $ type    : chr  "ln" "rd" "ln" "st" ...
##  $ suffix  : chr  "-" "-" "-" "-" ...
##  $ number  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ month   : Ord.factor w/ 8 levels "january"<"february"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ day     : Ord.factor w/ 7 levels "monday"<"tuesday"<..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ location: chr  "apartment parking lot" "road / street / sidewalk" "residence / house" "residence / house" ...
##  $ address : chr  "9650 marlive ln" "4750 telephone rd" "5050 wickview ln" "1050 ashland st" ...
##  $ lon     : num  -95.4 -95.3 -95.5 -95.4 -95.4 ...
##  $ lat     : num  29.7 29.7 29.6 29.8 29.7 ...

Now that we’ve learned a little bit about the data, let’s explore the spatial component of our data. To do this, I usually start by plotting it against a plane white background as simple x and y coordinates. To do this in R, we simply type:

plot(crime$lon,crime$lat,xlab="lon",ylab="lat")

plot of chunk unnamed-chunk-4

This plot shows us immediately that we have some outliers that seem to be pretty far away from Houston, Texas.

Now let’s showcase the power of the ggmap package by plotting this same data on an image from Google Maps. First, let’s learn how to get a map from Google Maps. The following code gets and plots a map of Houston from Google Maps:

qmap("houston", zoom = 13)  #This command retrieves and plots a map of Houston