HPMS Basemap Part 1: Data Analysis to Determine Shape / Distribution
Intro
I'm currently trying to put a HPMS basemap together and have several pieces of code that scrape ESRI feature servers as I started map rendering I realized the data was sometimes unreliable across different states and decided I needed to look at the HPMS submissions more holistically mainly at the underlying attribute data.
Source Data
The source data is located here I utilize the ARNOLD_{STATE}2019 layers and the HPMS_FULL{STATE}_2019 layers for GIS data and attribute data. In this exercise we are using the HPMS full layers.
Downloading into CSV > GIS Form
The first thing I did is write something that exports the FULL state layers into a csv file for easier tabular manipulation. After that I scraped the data into 50+ individual files totaling about 2.87 GB. Some states for whatever reason don't have data on the a particular year layer configuration not sure why, but I tried to choose the year with most coverage.
Goal
The goal is to try to get an idea of how reliably I can rely on "route_signing" data to be in the HPMS FULL layers specifically to style and render out map tiles based on these fields. This is time consuming process so having a good idea of what the data looks like before trial and error rendering is crucial. This route_signing data will then map to a field on the basemap data that dictates styling or whether it should exist or at that current tile.
Practicality With Hardware
This code reads in all csv files and combines them into one big table. Generally this isn't practical for large datasets but locally this fit into memory after being read in. I have a 32GB macbook pro but if I was on a machine with less ram this probably wouldn't be feasible or super slow going to swap. This script ran in under a minute which isn't bad considering it is 30 gb in memory.
Data Munging / Manipulation / Visualization
I generally prefer this approach over sql statements because I am super familiar with pandas and I can manipulate the data through a repl like ipython as I go. Pandas allows you to manipulate tabular data in an expressive way to shape it into the format you like/need.
I then used seaborn to render out the heatmap above for the result.
The Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
# setting the state code label conversion
sc_label = {1: 'AL', 2: 'AK', 4: 'AZ', 5: 'AR', 6: 'CA', 8: 'CO', 9: 'CT', 10: 'DE', 12: 'FL', 13: 'GA', 15: 'HI', 16: 'ID', 17: 'IL', 18: 'IN', 19: 'IA', 20: 'KS', 21: 'KY', 22: 'LA', 23: 'ME', 24: 'MD', 25: 'MA', 26: 'MI', 27: 'MN', 28: 'MS', 29: 'MO', 30: 'MT', 31: 'NE', 32: 'NV', 33: 'NH', 34: 'NJ', 35: 'NM', 36: 'NY', 37: 'NC', 38: 'ND', 39: 'OH', 40: 'OK', 41: 'OR', 42: 'PA', 44: 'RI', 45: 'SC', 46: 'SD', 47: 'TN', 48: 'TX', 49: 'UT', 50: 'VT', 51: 'VA', 53: 'WA', 54: 'WV', 55: 'WI', 56: 'WY', 60: 'AS', 66: 'GU', 69: 'MP', 72: 'PR', 78: 'VI'}
# setting the route_signing conversion
s_label = {1: 'Not Signed ', 6: 'County', 2: 'Interstate ', 7: 'Township', 3: 'U.S.', 8: 'Municipal', 4: 'State ', 9: 'Parkway Marker or Forest Route Marker', 5: 'Off-Interstate Business Marker ', 10: ' None of the Above'}
# getting all the csvs file in a particular directory
csvs = [os.path.join("csvs", file) for file in os.listdir("csvs") if os.path.getsize(os.path.join("csvs", file)) > 0]
# reading in each states csv and then combining into one huge data frame
data = pd.concat([pd.read_csv(i, encoding="latin1") for i in csvs], ignore_index=True)
# getting length
data[['begin_point','end_point']].fillna(value=0.0,inplace=True)
data['length'] = data.end_point-data.begin_point
# creating state code count dictionary
s_d = data.groupby("state_code")['length'].sum().to_dict()
# creating state route signed grouped and count
ee = data.groupby(["state_code", "route_signing"])[["length"]].sum()
# defining how I want the sign systems to appear on the graph
s_systems = ['Interstate ', 'U.S.', 'State ', 'County', 'Township', 'Municipal', 'Off-Interstate Business Marker ', 'Parkway Marker or Forest Route Marker', ' None of the Above', 'Not Signed ', '']
# apply a function over f system to give me a percent of each route signing
ee = ee.apply(lambda x: float(x.length) / float(s_d.get(x.name[0], 0.0)) * 100.0, axis=1)
# resetting index and labelling column distribution
ee = ee.reset_index()
ee.rename(columns={0: "Distribution"}, inplace=True)
# mapping state code and route signing appropriately
ee.state_code = ee.state_code.astype(int).map(lambda x:sc_label.get(x,''))
ee.route_signing = ee.route_signing.astype(int).map(lambda x:s_label.get(x,''))
usigns = ee.route_signing.unique()
# creating the pivot table
pv_tbl = ee.pivot("route_signing", "state_code", "Distribution")
# creating the heat map
ax = sns.heatmap(
pv_tbl.loc[[i for i in s_systems if i in usigns], :],
cmap="YlGnBu",
xticklabels=ee.state_code.unique(),
# des='HPMS 2019 - Percent of Mileage for Each route_signing Field',
annot=True, fmt=".1f",linewidths=0.5)
plt.title("HPMS 2019 - Percent of Mileage for Each route_signing Field")
plt.show()
Results
This exercise showed that route_signing can be generally relied on for styling of data in HPMS and more generally how we can manipulate large HPMS data sets to get the data we are looking for into the correct shape using pandas and python.