Book - 7.4) Project - An Example

An Example Project¶

In this chapter a small project is developed. The project brings together many of the techniques and ideas that have been learned up to this point. The project also illustrates the structure of the project for this class.

Data Set¶

This project uses the Earthquake data stream. This data stream, made publicly available by the U.S. Geological Survey, contains a record of all earthquakes that are reported throughout the world. An abstraction of an earthquake is captured by a set of properties. The data stream has more properties than are used in this project.

The Earthquake data stream is accessible from Python through the earthquakes.py module. This module contains a function get_report that returns information about recent earthquakes based on two parameters: the time period of the reported earthquakes and the threshold of severity level. The time period may be one of “hour”, “day”, “week”, or “month”. The threshold specifies the range of magnitudes included in the report and may be one of “significant”, “all”, “4.5”, “2.5”, or “1.0”. The Python code used to obtain the data stream for this project is as follows:

import earthquakes
quakes = earthquakes.get_report('month','all')

This stream consists of all reported earthquakes in the last month of any severity.

A map of a complex data structure can be made using the Variable explorer window in Spyder. The map is a pictorial representation of the data that can outline the structure of the data stream and give guidance on how to access the parts of the data of interest. A portion of the Spyder window is shown in the figure below. Properties of various types are defined in the editor window. After the Run button is pushed the display in the Variable explorer window appears as shown. Each of the properties used in the code has an entry in the Variable explorer.

The Variable Explorer in Spyder

Each property shown in the Variable explorer window has four columns. The Name column gives the name of the property. The Type column shows what kind of value the property has. The names used in the Type column are summarized in the following table. Notice that the property name is a str (character string), whole_number is an int, and number is a float.

Type field	Meaning
dict	a dictionary accessed by keywords
list	a list structure accessed by position
str	a character string
float	a number with a decimal point
int	a whole number (without a decimal point)

The Size column in the Variable explorer is 1 for simple types (numbers and strings) because these are single values. The size is not the number of characters in the character string or the number of digits in the number. Each of these types of values is considered a single unit. Notice that name, number, and whole_number are all of size 1. The size of a list is the number of elements in the list. Notice that number_list has a size of 5 because the list is defined with 5 values. The size of a dictionary is the number of key-value pairs in the dictionary. Notice that weather has a size of 3 because it has 3 key-value pairs.
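The sizes reported for lists and dictionaries match what Python's built-in len function returns. The following is a minimal sketch of that correspondence. The values assigned here are made up for illustration (apart from the humidity of 20 mentioned later in this chapter) and are not necessarily those shown in the figure.

```python
# Illustrative values only; the figure may define different ones.
number_list = [3, 1, 4, 1, 5]
weather = {'humidity': 20, 'temperature': 68, 'wind': 7}

# A list's size is its number of elements.
list_size = len(number_list)

# A dictionary's size is its number of key-value pairs.
dict_size = len(weather)

print(list_size)
print(dict_size)
```

Note that len applied to a string returns the number of characters, which is not what the Size column reports for a str.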

A dictionary or a list displayed by the Variable explorer can be expanded to show its contents in detail by double-clicking on its entry in the Variable explorer. For example, the figure below shows the result of double-clicking on the number list. A separate window is displayed that shows the details of each element of the list. In this example, all of the elements of the list are ints and of size 1. Notice that the values of the list elements are exactly the same as those defined in the Python code. When you are done examining the list, the window displaying the list can be closed by clicking on the OK button.

Expanding a List using the Variable Explorer in Spyder

Similarly, a dictionary can be expanded in the same way to show its contents in more detail. The following figure shows the result of double-clicking on the weather dictionary shown in the Variable explorer. A new window appears to display the contents of the dictionary. Each entry in the window shows a key-value pair. For example, the first row has the key ‘humidity’ and a value of type int. This value is the number 20. Notice that the three key-value pairs shown in the window are the same as the Python code defines. This window can be closed by clicking the OK button.

Expanding a Dictionary using the Variable Explorer in Spyder

The examples so far have illustrated how Spyder’s Variable explorer can be used to display simple types (numbers and strings) and simple data structures (individual lists and dictionaries). However, the data of most interest in practice has a more complicated organization reflecting the multiple layers of abstraction of a complex real-world phenomenon that we are modeling. We next see how to use Spyder to explore a more complicated data structure and develop a “map” - a visual guide to the data’s organization.

As an illustration of how to interactively map a complex data structure we will use the data stream for earthquake data obtained from the US Geological Survey. In this example we begin by getting the data stream for all of the earthquakes in the past week. This data stream is returned by the get_report function as shown in the following figure. Notice that the data stream is assigned to the property quakes. This property is displayed in the Variable explorer area.

Exploring the Earthquake Data (Step 1)

We can see in the Variable explorer that the quakes property is a dictionary that has 3 key-value pairs. To discover the details of this dictionary we can double-click on the property’s entry in the Variable explorer as was done above. As a result, a new window with “quakes” in the title appears as shown in the following figure. This window gives a visual depiction of the quakes dictionary. We can see in this case that the dictionary has three key-value pairs. The keys are: ‘area’, ‘earthquakes’, and ‘title’. The value associated with the “area” key is a dictionary. The value associated with the ‘earthquakes’ key is a list with 2412 elements. The value associated with the key ‘title’ is a simple string whose value is shown.

Exploring the Earthquake Data (Step 2)

We can explore the quakes data further by examining the list value associated with the key ‘earthquakes’. This can be done by double-clicking on the ‘earthquakes’ entry in the “quakes” window displayed by Spyder. In response, another new window with “earthquakes” in the title will be created that shows the list. This is shown in the following figure.

Exploring the Earthquake Data (Step 3)

In the new window (the one with “earthquakes” in the title) we can see that each element of the list associated with the key ‘earthquakes’ is itself a dictionary. The size field indicates that each of these dictionaries has 15 key-value pairs. To explore the structure of these dictionaries, double-click on any item in the list shown in the “earthquakes” window. As a result, another window will be displayed that shows the 15 key-value pairs for the selected item in the list. An example of this is shown in the following figure.

Exploring the Earthquake Data (Step 4)

The newly displayed window shows each of the 15 key-value pairs. The value associated with one key, ‘magnitude’, is a simple number (type float) that is the measured magnitude of the earthquake on the Richter scale. The value associated with the key ‘location’ indicates where on the globe the earthquake occurred.

The principal component of the Earthquake data stream is a list of earthquakes. The abstraction of an earthquake used in this project has two properties:

location: describing where on the globe the earthquake was centered
magnitude: the strength of the earthquake measured on the Richter scale

The figure also shows that the location property consists of three elements:

latitude: the latitude on the globe of the earthquake
longitude: the longitude on the globe of the earthquake
depth: the depth in the earth’s crust at which the earthquake was centered

In summary, the structure of the parts of the data stream used in this project is shown in the figure below.

Structure of the Earthquakes Data Stream
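The map translates directly into the sequence of keys and positions needed to reach a value. The sketch below uses a small hand-made stand-in for the data stream, so it runs without the earthquakes module. All values in it are invented, and only the keys used in this project are included.

```python
# A hand-made stand-in for the value returned by get_report.
# Only the keys used in this project appear and all values are invented.
sample_quakes = {
    'title': 'Sample report',
    'area': {},
    'earthquakes': [
        {'magnitude': 1.2,
         'location': {'latitude': 38.8, 'longitude': -122.8, 'depth': 2.1}},
        {'magnitude': 4.6,
         'location': {'latitude': -6.1, 'longitude': 130.5, 'depth': 120.0}},
    ],
}

# Step through the layers: dictionary -> list -> dictionary -> dictionary.
first = sample_quakes['earthquakes'][0]
first_magnitude = first['magnitude']
first_depth = first['location']['depth']

print(first_magnitude, first_depth)
```

Each pair of square brackets corresponds to one double-click in the Variable explorer: a key for a dictionary, a position for a list.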

Questions¶

The analysis of the earthquake data stream will answer the following questions:

What is the distribution of earthquake magnitudes?
Are the latitudes and longitudes correlated?
Is there a relationship between magnitude and depth?
Where on the globe do earthquakes occur?

The first question gives insight into whether earthquakes of all magnitudes are equally likely or if there is a greater likelihood of some range of magnitudes than others. The answer to the second question indicates whether earthquakes are likely to occur at any given place on the globe or whether there are earthquake-prone areas. The answer to the third question yields information on the nature of earthquake formations. Do more intense earthquakes occur deeper in the earth’s crust? Are deeper earthquakes associated with particular geographic regions? The answer to the last question allows us to see the relationship between the location of earthquakes and the position of continents. If the second question indicates that earthquakes are more likely in some areas, the answer to the last question will help identify the geographic areas which are more prone to earthquakes.

Information about the nature or location of earthquakes is valuable because earthquakes have the potential for great damage and loss of life. Earthquakes are also highly unpredictable, lending importance to any insight into their pattern of occurrence.

Earthquake analysis is useful to scientists, government planners, businesses, and the general public. Earth scientists are interested in the general phenomenon of earthquakes in an attempt to better explain, and ultimately predict, their occurrence. Government planners need to consider the potential effects of earthquakes in developing critical infrastructure for power generation, water distribution, transportation, etc. Government planners are also concerned with the development of building codes that need to account for potential earthquake stress. Businesses may take earthquake risks into account in deciding on the siting of facilities and on insurance coverage.

Limitations¶

The conclusions from this study are primarily limited by the duration of the time period represented by the data stream being analyzed. The data stream contains only information on earthquakes for the most recent month. This limited period of time may not be representative of the pattern or nature of earthquakes over a longer period of time. For example, if the most recent month was a relatively quiet period of time in an otherwise earthquake prone area then the results of this study would not properly reflect the geographic distribution of earthquakes. A secondary limitation is that only the severity and location characteristics of earthquakes are included in the abstraction of an earthquake event. There may be other characteristics that have not been included which give significant insight into earthquake occurrences.

Program Development¶

The program development begins by importing the earthquake service and generating the description of all earthquakes for the past month. This is shown in the following code:

import earthquakes

# Get all earthquake reports
quakes = earthquakes.get_report('month', 'all')

Code for Question 1.

To answer the first question regarding the distribution of magnitudes a data stream is produced that contains only the magnitude information. This data is then plotted in a histogram form. The code for this is:

import matplotlib.pyplot as plt

# Generate a list of magnitudes
magnitudes = []
for qk in quakes['earthquakes']:
    magnitudes.append(qk['magnitude'])

# Plot histogram of magnitudes
plt.hist(magnitudes, bins=[0,1,2,3,4,5,6,7])

# Label Axis
plt.xlabel('Magnitudes')
plt.ylabel('Occurrences')
plt.title('Histogram of Magnitudes')

# Display histogram
plt.show()

# Clear before the next set of graphs
plt.clf()

After forming a list of earthquake magnitudes, the histogram is generated from that list. The histogram produced by this code is shown in the next section. Notice that the matplotlib module is imported and that the hist function is used to generate the histogram. The bins argument specifies the edges of the “bins” into which the magnitude data is grouped. The first bin contains all earthquakes of magnitude at least zero and less than 1, the second bin contains all earthquakes of magnitude at least 1 and less than 2, and so on. The axes and the figure are labelled before the histogram is displayed.
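To make the binning rule concrete, the sketch below counts magnitudes per bin by hand rather than with hist. The sample magnitudes are invented. With the integer bin edges 0 through 7, the bin index of a magnitude is its integer part, except that a magnitude of exactly 7 belongs to the last bin, which matplotlib treats as closed on the right.

```python
# Invented magnitudes for illustration.
sample_magnitudes = [0.4, 0.9, 1.0, 1.2, 1.7, 2.3, 4.5]

# One counter per bin: [0,1), [1,2), ..., [5,6), [6,7]
counts = [0] * 7
for m in sample_magnitudes:
    # This sketch skips any value outside the range covered by the bins.
    if 0 <= m <= 7:
        index = min(int(m), 6)   # exactly 7 goes into the last bin
        counts[index] += 1

print(counts)
```

Notice that the magnitude 1.0 is counted in the second bin, not the first, because each bin includes its lower edge.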

We can also get some additional information about the distribution of magnitudes by determining the mean (the average earthquake) magnitude and the median magnitude. The median is a measure of the middle-point of a list of values in that half of the values are below the median and half of the values are greater than the median. The code to compute these two statistics is shown here:

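# Determine the mean magnitude value
magnitudeMean = round(sum(magnitudes)/len(magnitudes), 2)

# Determine the median magnitude value as the middle element of the sorted list
# (for an even number of values this picks the upper of the two middle values)
sortedMagnitudes = sorted(magnitudes)
magnitudeMedian = sortedMagnitudes[int(len(magnitudes)/2)]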

print('mean   = ', magnitudeMean)
print('median = ', magnitudeMedian)

This code makes use of a number of Python’s built-in functions for dealing with lists. These are:

sum: return the value of adding all of the list values together
len: return the number of values in the list
sorted: return a new list that has all of the list elements in sorted order

The code also uses the int() function that returns just the integer part of a given number. That is, int(2.5) is 2 and int(2.9) is also 2.

The code above computes the average value as the sum divided by the length of the list of magnitudes. The median value is the middle element in the sorted list of values. For the data analyzed here the output is:

mean   =  1.47
median =  1.2
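As a cross-check, Python's standard statistics module provides mean and median functions. The sketch below uses a small invented list rather than the earthquake data. For a list with an even number of values, statistics.median averages the two middle values, whereas the indexing approach used in this project picks the upper of the two.

```python
import statistics

# Invented magnitudes for illustration (an even number of values).
sample = [0.8, 1.0, 1.4, 3.2]

mean_value = round(statistics.mean(sample), 2)

# statistics.median averages the two middle values, 1.0 and 1.4.
median_value = statistics.median(sample)

# Indexing the sorted list picks the upper middle value, 1.4.
index_median = sorted(sample)[int(len(sample)/2)]

print('mean   = ', mean_value)
print('median = ', median_value, 'or', index_median)
```

For a list as long as the earthquake data the two ways of computing the median usually differ very little.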

The histogram plot produced by this code is shown in the next section.

Code for Question 2.

The answer to the second question explores whether there is a correlation between the longitude and latitude of earthquakes. In other words, are there “favorite places” for earthquakes to occur or do they occur randomly across the globe? This question can be answered by constructing a scatter plot of latitudes versus longitudes. A relatively random pattern in the plot would suggest that the earthquakes occur randomly whereas clusters of earthquakes would suggest that those places are more prone to earthquakes. The code to produce this scatter plot is as follows:

latitudes = []
longitudes = []

#Generate list of latitudes and corresponding longitudes
for qk in quakes['earthquakes']:
    latitudes.append(qk['location']['latitude'])
    longitudes.append(qk['location']['longitude'])

# Generate scatter plot of locations of earthquakes on a 2D grid
# At each earthquake location (longitude[i], latitude[i]) put a red '+' sign

plt.scatter(longitudes, latitudes, c='r', marker='+')

# Label Axes
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Earthquake Longitude vs. Latitude')
plt.show()

# Clear before the next set of graphs
plt.clf()

Notice that the quakes data stream obtained for the first question is reused here. Because the matplotlib function scatter (that produces the scatter plot) requires the data in separate lists, the code begins by filtering the quakes data into the required two lists.
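The same filtering can also be written with list comprehensions. This is an optional alternative to the explicit loop, sketched here on an invented two-element stand-in for quakes['earthquakes'].

```python
# Invented stand-in for quakes['earthquakes'].
sample_earthquakes = [
    {'location': {'latitude': 38.8, 'longitude': -122.8}},
    {'location': {'latitude': -6.1, 'longitude': 130.5}},
]

# Build each list in a single expression instead of an explicit loop.
sample_latitudes = [qk['location']['latitude'] for qk in sample_earthquakes]
sample_longitudes = [qk['location']['longitude'] for qk in sample_earthquakes]

print(sample_latitudes)
print(sample_longitudes)
```

Both lists are built in the same order, so the values at position i in the two lists still describe the same earthquake.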

The scatter plot produced by this code is shown in the next section.

Code for Question 3.

An appropriate way to answer the third question is to generate a scatter plot of the magnitudes versus the depths of reported earthquakes. The magnitude information was produced in the answer to the first question. The new code needed for this question is shown here:

# Generate list of depths for each earthquake
depths = []
for qk in quakes['earthquakes']:
    depths.append(qk['location']['depth'])


# Generate scatter plot of magnitudes vs. depths of earthquakes on a 2D grid
# At each data point (magnitudes[i], depths[i]) put a red '+' sign

plt.scatter(magnitudes, depths, c='r', marker='+')

# Label Axes
plt.xlabel('Magnitude')
plt.ylabel('Depth')
plt.title('Earthquake Magnitude vs. Depth')
plt.show()

# Clear before the next set of graphs
plt.clf()

This code uses the same matplotlib function as before to generate a scatter plot of the magnitudes versus the depths of the earthquakes. The scatter plot visualization produced by this code is shown in the next section.

There is a great deal of redundancy between the code that generates the scatter plot of longitudes versus latitudes and the code that generates the scatter plot of magnitudes versus depth. Therefore, a new function is defined to eliminate this redundancy. The significant parts of the refactored code are shown as follows:

def drawPlot(xValues, yValues, xLabel='X Axis', yLabel='Y Axis',
             figureLabel='Scatter Plot'):
    # Generate scatter plot of xValues versus yValues on a 2D grid
    # At each data point put a red '+' sign

    plt.scatter(xValues, yValues, c='r', marker='+')


    # Label Axes
    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(figureLabel)

    # Show the figure and then clear when the window is closed
    plt.show()
    plt.clf()

# Get earthquake data
quakes = earthquakes.get_report('month', 'all')

# Generate lists of longitudes and corresponding latitudes
longitudes = []
latitudes = []
for qk in quakes['earthquakes']:
    latitudes.append(qk['location']['latitude'])
    longitudes.append(qk['location']['longitude'])

# Draw scatter plot of longitudes vs. latitudes of earthquakes
drawPlot(longitudes, latitudes, 'Longitude', 'Latitude',
         'Earthquake Occurrences')

# Generate lists of magnitudes and corresponding depths
magnitudes = []
depths = []
for qk in quakes['earthquakes']:
    magnitudes.append(qk['magnitude'])
    depths.append(qk['location']['depth'])


# Draw scatter plot of magnitudes vs. depths of earthquakes

drawPlot(magnitudes, depths, 'Magnitude','Depth',
         'Earthquake Magnitude vs. Depth')

In this refactoring, the drawPlot function is defined to contain the common code to draw a scatter plot. The parameters of the drawPlot function are:

xValues: a list of the x-axis values of the data points
yValues: a list of the y-axis values of the data points
xLabel: the text used as the label for the x-axis
yLabel: the text used as the label for the y-axis
figureLabel: the text used to label the entire figure

Notice that the three label parameters have default values so that simple plots can be created without worrying about the details of labelling the axes and the figure.

By using the drawPlot function the code for the two scatter plots is then simplified to generating the lists of data to be plotted and then calling the drawPlot function with these lists and the desired (if any) labelling information.
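How default parameter values behave can be seen without drawing anything. The function describePlot below is hypothetical (it is not part of the project) and only mirrors the three label parameters of drawPlot.

```python
def describePlot(xLabel='X Axis', yLabel='Y Axis', figureLabel='Scatter Plot'):
    # Return the labels that would be used, so the defaults are visible.
    return figureLabel + ': ' + xLabel + ' vs. ' + yLabel

# No labels given: all three defaults are used.
default_description = describePlot()

# Positional arguments override the defaults from left to right.
custom_description = describePlot('Magnitude', 'Depth')

# A keyword argument overrides just one default.
titled_description = describePlot(figureLabel='Earthquakes')

print(default_description)
print(custom_description)
print(titled_description)
```

The same calling styles apply to drawPlot, whose first two parameters have no defaults and must always be supplied.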

Code for Question 4.

The answer to the last question involves plotting the location data on a two dimensional globe. The data required for this step is the same longitude and latitude data as was generated immediately above. The new code needed to produce the required visualization is shown next. This code uses a module named basemap that is not a standard part of the matplotlib module and must be installed separately. This part of the project is included to show the potential to map data onto geographic regions.

from mpl_toolkits.basemap import Basemap

# Create world map with a Hammer (elliptical, equal-area) projection
# centered at 180 degrees longitude (i.e. longitude zero of the
# visualization is longitude 180 on the globe)

map = Basemap(projection='hammer',lon_0=180)

# Fill in the outlines of continents
map.drawcoastlines()

# Convert latitudes and longitudes to coordinates on the world map
x, y = map(longitudes,latitudes)

# Map the locations onto the world map
map.scatter(x, y)

plt.title('Locations of earthquakes')
plt.show()

plt.clf()

In this code the Basemap utility is imported from mpl_toolkits. Basemap is used to create a two dimensional map of the globe using a Hammer projection. The outlines of the continents are drawn on this globe using the drawcoastlines function. Next, the latitude and longitude are converted to map coordinates and plotted on the map. The title is added to the graph and then presented to the user. The global map plot produced by this code is shown in the next section.

Because there is a wide variety of ways to project the earth’s surface onto a two dimensional plane, a second visualization using a Lambert Conformal projection is also created. The additional code to generate this projection is shown here:

# Create a world map with a Lambert Conformal projection
# centered at latitude 90, longitude -107 (Basemap takes height and width in meters)
lambertMap = Basemap(projection='lcc', lat_0=90, lon_0=-107,
                     height=60000000, width=45000000)

# Fill in world map outlines
lambertMap.drawcoastlines()
lambertMap.drawmapboundary()

# Convert latitudes and longitudes to coordinates on the world map
x, y = lambertMap(longitudes,latitudes)

# Map the locations onto the world map
lambertMap.scatter(x, y)

plt.title('Locations of earthquakes')

# Draw the map
plt.show()

plt.clf()

At this point it was realized that the code for the two projections is extremely repetitive. This leads to a refactoring of the code to define and use a function that incorporates the common code for the two mapping operations. The function, drawMap, has a single parameter which is the map structure returned by the Basemap function. The code that defines and uses the function to produce the two maps is shown next.

def drawMap(map):
    # Plot the earthquake locations on a given world map

    # Fill in world map outlines
    map.drawcoastlines()
    map.drawmapboundary()

    # Convert longitudes and latitudes to coordinates on the world map
    # (uses the longitudes and latitudes lists generated earlier)
    x, y = map(longitudes, latitudes)

    # Map the locations onto the world map
    map.scatter(x, y)

    plt.title('Locations of Earthquakes')
    plt.show()

    plt.clf()

# Use the drawMap function to create a world map with a Hammer
# (elliptical, equal-area) projection centered at 180 degrees longitude
hammer_map = Basemap(projection='hammer', lon_0=180)
# Draw the map
drawMap(hammer_map)


# Use the drawMap function to create a world map with a Lambert
# Conformal projection centered at latitude 90, longitude -107
lambert_map = Basemap(projection='lcc', lat_0=90, lon_0=-107,
                      height=60000000, width=45000000)
# Draw the map
drawMap(lambert_map)

Notice that this refactoring greatly simplifies the code by removing redundancies and makes it easier to draw additional maps created by Basemap.

The complete code for the sample project can be downloaded from here. Keep in mind that this code uses the basemap module which must be installed on your system separate from the matplotlib module.

Project-Complete.py

Visualizations¶

The first visualization is the histogram of magnitudes. The histogram groups together all magnitudes that have the same integer part. In other words, all magnitudes less than 1.0 are grouped into the same category (category 0), magnitudes at least 1.0 and less than 2.0 are grouped into the same category (category 1), and so on. Recall that the average magnitude value was 1.47.

Histogram of Earthquake Magnitudes

The second visualization shows the scatter plot of longitudes versus latitudes. There is clearly some pattern in the locations of earthquakes because the data points are not uniformly distributed. Rather there are groupings of earthquakes and “tracks” along which earthquakes appear to occur.

Distribution of Earthquake Locations

The next visualization is a scatter plot of the magnitudes versus the depth of the earthquakes. The magnitudes are shown on the horizontal axis and the depths are shown on the vertical axis. Each earthquake is represented by a single data point plotted as a red cross symbol.

Magnitude vs. Depth of Earthquakes

The final two visualizations plot the earthquake locations on two dimensional projections of the earth’s surface. Two different projections are used, the Hammer projection in the first visualization and the Lambert Conformal projection in the second visualization. The first visualization is positioned over the Pacific Ocean to best show the pattern of earthquake occurrences. The second visualization is centered over the North Pole.

Location of Earthquake on the Globe using Hammer Projection

Location of Earthquake on the Globe using Lambert Conformal Projection

The conclusions that can be drawn from these visualizations are presented in the next section.

Conclusions¶

The first question explores the distribution of earthquake magnitudes. The results for the analyzed data are shown in the figure above titled “Histogram of Earthquake Magnitudes”. The distribution is very skewed, with the large majority of the earthquakes being of the lowest magnitudes. Only a very small percentage of the earthquakes are of the more severe magnitudes. This is also illustrated by the measures of mean and median. The mean value for this set of data is a magnitude of 1.47 while the median value is only 1.2. Thus, the average earthquake magnitude is below 1.5 while half of all earthquakes have a magnitude of less than 1.2. The mean is higher than the median due to the effect of a few earthquakes of much bigger magnitudes.
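The pull that a few large values exert on the mean can be seen with a small worked example. The magnitudes below are invented for illustration and are not taken from the data stream.

```python
# Invented magnitudes: many small values and one large one.
sample = [0.5, 0.8, 1.0, 1.2, 1.3, 1.5, 6.0]

sample_mean = round(sum(sample)/len(sample), 2)
sample_median = sorted(sample)[int(len(sample)/2)]

# Remove the single large value and recompute the mean.
trimmed = sample[:-1]
trimmed_mean = round(sum(trimmed)/len(trimmed), 2)

print('mean with large value    = ', sample_mean)
print('median                   = ', sample_median)
print('mean without large value = ', trimmed_mean)
```

In this sketch the single large value raises the mean well above the median, while removing it brings the mean back below the median.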

The second question concerns whether earthquakes occur with equal likelihood around the world or whether there are preferred locations for earthquake occurrences. This question was explored by plotting the longitude versus the latitude of each earthquake as shown in the figure above titled “Distribution of Earthquake Locations.” As shown in the figure, earthquakes are not uniformly distributed across the globe. There are evident clusters of earthquakes in several locations and sets of earthquakes that form paths.

The third question analyzes the relationship between the depth of an earthquake and its magnitude. The question is whether more severe earthquakes occur deeper in the earth or nearer to the earth’s surface. The scatter plot of magnitudes versus depths is shown in the figure above titled “Magnitude vs. Depth of Earthquakes”. From this visualization it can be seen that earthquakes at greater depths are all of high magnitude. It can also be seen that higher magnitude earthquakes can occur at any depth.
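A statement such as “deep earthquakes are all of high magnitude” can also be checked numerically by finding the smallest magnitude among the deep earthquakes. The sketch below uses invented data and an arbitrary depth threshold of 100. Both are assumptions made for illustration and are not values taken from the data stream.

```python
# Invented magnitudes and corresponding depths for illustration.
sample_magnitudes = [0.9, 1.4, 2.0, 4.3, 4.8, 5.1]
sample_depths = [3.0, 8.5, 12.0, 150.0, 5.0, 420.0]

DEPTH_THRESHOLD = 100   # arbitrary cut-off chosen for this sketch

# Collect the magnitudes of the earthquakes deeper than the threshold.
deep_magnitudes = []
for magnitude, depth in zip(sample_magnitudes, sample_depths):
    if depth > DEPTH_THRESHOLD:
        deep_magnitudes.append(magnitude)

# If even the weakest deep earthquake is strong, the statement holds
# for this data.
weakest_deep = min(deep_magnitudes)

print('weakest deep earthquake = ', weakest_deep)
```

In this invented data the magnitude 4.8 earthquake at a depth of 5.0 also shows the second observation: a strong earthquake can be shallow.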

The relationship between the clusters of earthquake locations and continental features is shown in the two figures that plot the earthquake locations on two dimensional projections of the earth’s surface. These figures are titled “Location of Earthquake on the Globe using Hammer Projection” and “Location of Earthquake on the Globe using Lambert Conformal Projection”. In both cases it is evident that a large number of earthquakes occur along the Pacific rim. This pattern is, of course, consistent with the locations of the edges of adjacent tectonic plates.

The conclusions can be summarized as follows:

Most earthquakes are of low magnitude with few at the higher magnitudes.
Some parts of the globe are earthquake prone, especially along the Pacific rim.
Earthquakes that occur deeper in the earth’s crust are always higher magnitude earthquakes.

The limitations noted above (see “Limitations”) must, of course, be kept in mind.

Social Impacts¶

The study of earthquakes, especially with regard to their locations and magnitudes, has impacts on a number of different stakeholders. Among these stakeholders and their concerns are:

Property owners: Individuals owning personal property (e.g., their primary residence) or companies owning buildings have a strong interest in protecting themselves against loss due to an earthquake. This protection may take the form of insurance policies to pay for reconstruction after an earthquake or improvement to structures to make them earthquake resistant.
Insurance companies: Businesses offering insurance policies to pay for reconstruction must assess the likelihood and extent of possible damage caused by earthquakes in areas where they offer coverage. The rates that they set for their insurance must be in line with the actuarial risks.
Governments: All levels of government have some role with regard to earthquakes. The federal and state governments are responsible for emergency preparedness and recovery should a severe earthquake jeopardize the safety of citizens in their jurisdictions. State and local governments have a key role in the location and securing of critical infrastructure, such as hospitals or power plants, that might be affected by earthquakes. These levels of government are also the ones that define building codes and must determine the standards for building safety in the event of earthquakes.
Geoscientists: Individuals studying earth science phenomena are interested in the underlying causes of earthquakes. There is a current controversy about the possible relationship between hydraulic fracturing (“fracking”) as a means of oil and gas based energy production and earthquakes. Does fracking cause additional earthquake activity? A study of earthquake occurrences and fracking sites along with long term studies of earthquake occurrences is important in answering this question.

There are tensions among these stakeholders that are adjusted with new knowledge and the experience of earthquakes. For example:

To what extent should government regulate various types of building or activity in areas with different earthquake potential? Should a nuclear reactor, a chemical plant, or a gas pipeline be locatable in an earthquake prone area? What level of earthquake susceptibility is sufficient to make this decision given that more distant locations may raise the cost to consumers of the services provided by these plants?
What is the balance of responsibility between insurers and government for recovery and reconstruction? How does this balance change in the case of rare but possible events (the “big one” in California)?
What is the public’s stance regarding possible unanticipated consequences of fracking if evidence mounts that it is related to some additional earthquake risk? At what price is “cheap” energy affordable if areas previously unaccustomed to earthquakes become more earthquake prone?

On an individual level, how can geologists resolve the moral dilemma of helping oil and gas companies to increase the level of fracking when its geological consequences are not clear?

The above description focuses on a national perspective, reflecting on the different levels of government, traditional zoning of land use and building codes, and the operation of insurance industries. These issues also have an international perspective. As earthquakes are global, so are the potential impacts of “big data” studies of earthquake behavior. How do national governments and international organizations relate to the issues of preparedness, reconstruction, and recovery, especially in countries with the potential for high impact from earthquakes but with poor or fragile economies?