Analyzing Economic Impacts of COVID-19 Through Travel Data

By Sharvari Bhatt, Lydia Kim, Yi Fan Lim, Saeed Razavi, Leon Smith

#Imports
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
from sklearn.cross_decomposition import CCA

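# Relative path: keep this notebook in the same folder as the xlsx file.
# The sheet stores one feature per row, so transpose to get one row per year.
df = pd.read_excel('Econ and CS Project Data.xlsx', sheet_name=0).T

# Promote the first row (the feature names) to column headers
header = df.iloc[0]
header.name = 'Feature'
df = df[1:]
df.columns = header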

df

econ_keys = ['Real PCE for Goods', 'Real PCE for Services', 'Real GDP','Personal income', 'Disposable personal income','Annual Mean Unemployment Rate']
travel_keys = ['Air carrier domestic all services vehicle-miles','Highway vehicle-miles', 'Transite vehicle-miles', 'Rail train-miles','Air-travel arrivals in the USA', 'Air-travel departures from the USA']

Both the truncated correlation matrix and the pair plot below compare our datasets, with economic data on the vertical axis and travel data on the horizontal axis. The correlation matrix shows the correlation between any two features at the position where their row and column intersect. An intuition for that correlation can be gleaned by looking at the same position in the pair plot. Knowing these individual correlations helps us later when explaining the results of our canonical correlation analysis.

# Correlation matrix restricted to economic features (rows) vs. travel features (columns)
df.drop(['Units']).astype(float).corr().loc[econ_keys, travel_keys]

sns.set(style="darkgrid")
sns.set_palette("Dark2")

# Cast to float so seaborn treats every column as numeric
sns.pairplot(df.drop(['Units']).astype(float), height=3, x_vars=travel_keys, y_vars=econ_keys)
plt.show()

#Define input and output datasets
econ = df.drop(['Units']).astype(float)[econ_keys]
travel = df.drop(['Units']).astype(float)[travel_keys]

cca = CCA(n_components=5)
cca.fit(travel, econ)

U = pd.DataFrame(cca.x_rotations_)
U.rename(index={0:"Air v-mi",1:"Highway v-mi",2:"Transit v-mi",3:"Rail t-mi",4:"Air Arrivals",5:"Air Departures"}, inplace=True)
print("Travel Canonical Components:")
print(U)

V = pd.DataFrame(cca.y_rotations_)
V.rename(index={0:"RPCE Goods",1:"RPCE Services",2:"RGDP",3:"Income",4:"Disp. Income",5:"Unemployment"}, inplace=True)
print("\nEconomic Canonical Components:")
print(V)

print('\n')
travel_c, econ_c = cca.fit_transform(travel, econ)

# Correlate matching pairs of canonical variates: column i of the travel
# projection with column i of the economic projection. The fifth pair must use
# column 4 on both sides; the ~0 value recorded below for the fifth correlation
# came from pairing travel_c[:,2] with econ_c[:,4].
for i, name in enumerate(["First", "Second", "Third", "Fourth", "Fifth"]):
    print(name, "Canonical Correlation =", np.corrcoef(travel_c[:,i], econ_c[:,i])[0,1])

Travel Canonical Components:
                       0         1         2         3         4
Air v-mi        0.038582 -0.271694  0.608069  0.541007 -0.800066
Highway v-mi    0.934561 -0.574290 -0.640421 -1.354686  0.627722
Transit v-mi    0.155858  0.198246 -0.552959  0.524638 -0.024256
Rail t-mi       0.065101  0.006902  0.049004  0.225767  0.204007
Air Arrivals    0.292080  0.793420  0.549590  0.011254 -0.827718
Air Departures  0.106143 -0.126657  0.014686  0.147739  0.842217

Economic Canonical Components:
                      0         1         2         3         4
RPCE Goods    -0.237604 -0.122690  0.002225  1.737066 -2.158120
RPCE Services  0.932752 -0.348136 -1.109130 -7.834337 -4.880769
RGDP           0.056376 -0.482085  1.101273  7.697494  6.532923
Income        -0.260555  0.982182 -0.001795 -1.466189  0.531744
Disp. Income   0.002287 -0.003257  0.009979 -0.133741  0.043234
Unemployment  -0.049461 -0.019655 -0.025593  0.425948  0.173075


First Canonical Correlation = 0.9989758578482323
Second Canonical Correlation = 0.9782451876298951
Third Canonical Correlation = 0.8213386352200954
Fourth Canonical Correlation = 0.6902648497373676
Fifth Canonical Correlation = -1.3589100461251289e-15

The goal of this canonical correlation analysis (CCA) was to determine how our economic measures and our travel data relate by projecting each set (travel as input, economic as output) onto single dimensions with maximal correlation. We chose these particular data components to gain insight into a wide array of macroeconomic health indicators and different types of travel. The selection was also shaped by the lack of historical precedent: no pandemic as devastating as COVID-19 has occurred since 1975, the first year in our data. With major institutions halted and data collection a low priority, available and reliable private and public travel data are limited. However, because travel data is a strong indicator of overall economic health, we decided to focus on the impact of the travel ban on economic health through CCA of our chosen data components.
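For reference, the quantity being maximized can be written out explicitly. The notation here is ours and is not used elsewhere in this notebook: $X$ is the matrix of travel features and $Y$ the matrix of economic features (both with column means removed), and $a$, $b$ are weight vectors of the kind printed above as the columns of the travel and economic canonical components.

$$
\rho_1 \;=\; \max_{a,\,b}\ \operatorname{corr}(Xa,\ Yb)
\;=\; \max_{a,\,b}\ \frac{a^\top X^\top Y\, b}{\sqrt{a^\top X^\top X\, a}\ \sqrt{b^\top Y^\top Y\, b}}
$$

Each later pair $(a_k, b_k)$ maximizes the same correlation subject to its projections being uncorrelated with those of the earlier pairs, which is why the canonical correlations come out in decreasing order.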

While CCA is often used for dimensionality reduction as a means of enhancing the training of machine learning models, valuable insights can also be gained by examining the canonical correlations and their components. The first three travel canonical components are very strongly correlated with their respective economic canonical components (0.999, 0.978, and 0.821), so we can say with relative confidence that the travel data is indeed correlated with the economic data, although with only 32 observations and six features on each side we may be overfitting our model to achieve such high correlations. Real personal consumption expenditure (RPCE) on goods and services and real GDP are highly correlated with changes in travel data, as expected of statistics that are understood to be economic health indicators. Looking at the makeup of the individual canonical components in each canonical correlation, we can then gain insight into relationships between input and output features in context. For example, in the first canonical correlation, the travel canonical component places a high weight on Highway v-mi (0.93) compared to the weights on the other travel features. Likewise, the first economic canonical component is dominated by the weight on RPCE Services (0.93). Taken together, these weights suggest a strong relationship between the two features, given their surrounding context. Indeed, if we look back at our correlation matrix and pair plot, we see that those two features correlate very strongly (0.967).
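The overfitting concern raised above can be gauged with a noise baseline. The sketch below is an illustration under our own assumptions, not part of the original analysis: it draws independent Gaussian noise with the same shape as our data (32 rows, six features per side) and records the first canonical correlation that arises purely by chance. To stay self-contained it computes canonical correlations with a QR/SVD formulation in NumPy rather than with scikit-learn's CCA; the helper name `canonical_correlations` and the trial count are hypothetical choices made for this example.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y (descending).

    QR/SVD formulation: after centering, the singular values of Qx.T @ Qy,
    where Qx and Qy are orthonormal bases of the two column spaces, are the
    canonical correlations. This is a stand-in for sklearn's CCA.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny floating-point overshoot

rng = np.random.default_rng(0)
n_samples, n_features = 32, 6  # same shape as the travel and economic tables
n_trials = 500                 # assumed number of noise replicates

first = np.empty(n_trials)
for t in range(n_trials):
    X = rng.standard_normal((n_samples, n_features))
    Y = rng.standard_normal((n_samples, n_features))
    first[t] = canonical_correlations(X, Y)[0]

print("Mean first canonical correlation on pure noise:", first.mean())
print("95th percentile on pure noise:", np.quantile(first, 0.95))
```

If the noise baseline is already high, only canonical correlations that clearly exceed it should be read as evidence of a real relationship. Our series also trend over time, which independent noise does not capture, so this baseline likely understates how easily high correlations arise in data like ours.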

We had expected strong correlations between the travel data and Real GDP, Personal Income, RPCE Goods, and RPCE Services, as it makes sense for the travel industry to grow in conjunction with these factors. The real GDP numbers show a steady trend of economic growth in the US, and the travel and tourism industry has also grown steadily in the decades since 1975, with a few exceptions such as the period following the September 11, 2001 attacks. Rising personal income and consumption of goods and services also correlate with increasing travel, as more personal income provides more means for leisure activities such as travel. Businesses can also support more travel expenses when they earn higher revenue and profits. Higher consumption of goods and services likewise indicates a higher capacity for travel, whether cross-country or international.

We were surprised by the results for disposable personal income and unemployment. Unemployment has a low correlation (R-value) with every travel feature, between about -0.10 and -0.37, and the correlations for disposable personal income are lower still, with magnitudes below 0.16. This reflects a weak explanatory connection between these measures and the travel data, although it does not necessarily mean our model had poor predictions. Even during periods of low disposable income and high unemployment, travel continued to grow steadily. This could indicate that travel is a necessity even when consumers have low disposable income or are unemployed. That makes sense for vehicle and train mileage, which are needed to reach essential places such as work and other businesses. The growth in US population since 1975 and the growth in the travel industry due to globalization and business expansion can also help account for this, because there is a greater need for, and established infrastructure to support, public transportation despite high unemployment and low disposable income. A breakdown of the data into leisure travel versus business travel would provide further insight and a more interesting analysis.
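The contrast between Real GDP and unemployment can be checked directly from the data table at the end of this notebook. The sketch below recomputes two Pearson correlations with `np.corrcoef` from values copied out of that table as displayed (so they are rounded): Highway vehicle-miles against Real GDP, and against the annual mean unemployment rate. The variable names are ours, and because of the rounding the results should match the correlation matrix only approximately.

```python
import numpy as np

# Values copied from the data table (1975, 1980, 1985, then 1990-2018).
real_gdp = np.array([
    5644.8, 6759.2, 7951.1, 9365.5, 9355.4, 9684.9, 9951.5, 10352.4,
    10630.3, 11031.4, 11521.9, 12038.3, 12610.5, 13131.0, 13262.1, 13493.1,
    13879.1, 14406.4, 14912.5, 15338.3, 15626.0, 15604.7, 15208.8, 15598.8,
    15840.7, 16197.0, 16495.4, 16912.0, 17403.8, 17688.9, 18108.1, 18638.2,
])  # billions of chained (2012) dollars

unemployment = np.array([
    8.475, 7.175, 7.19167, 5.61667, 6.85, 7.49167, 6.90833, 6.1,
    5.59167, 5.40833, 4.94167, 4.5, 4.21667, 3.96667, 4.74167, 5.78333,
    5.99167, 5.54167, 5.08333, 4.60833, 4.61667, 5.8, 9.28333, 9.60833,
    8.93333, 8.075, 7.35833, 6.15833, 5.275, 4.875, 4.34167, 3.89167,
])  # percent

highway_miles = np.array([
    1.32766e6, 1.5273e6, 1.77483e6, 2.14436e6, 2.17205e6, 2.24715e6, 2.29638e6, 2.35759e6,
    2.4227e6, 2.48585e6, 2.5617e6, 2.63152e6, 2.69134e6, 2.74692e6, 2.79561e6, 2.85551e6,
    2.89022e6, 2.96479e6, 2.98943e6, 3.01437e6, 3.03112e6, 2.97653e6, 2.95676e6, 2.96727e6,
    2.9504e6, 2.96943e6, 2.98828e6, 3.02566e6, 3.09537e6, 3.17441e6, 3.21235e6, 3.24033e6,
])  # millions of vehicle-miles

# Pearson correlation is the off-diagonal entry of the 2x2 correlation matrix
r_gdp = np.corrcoef(highway_miles, real_gdp)[0, 1]
r_unemployment = np.corrcoef(highway_miles, unemployment)[0, 1]

print("Highway v-mi vs Real GDP:     ", round(r_gdp, 3))
print("Highway v-mi vs Unemployment: ", round(r_unemployment, 3))
```

Real GDP tracks highway mileage closely, while unemployment rises and falls with the business cycle and so shows only a weak negative relationship with the long-run growth in travel.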

Follow this link for phase 2 of this project: https://observablehq.com/@limyifan1/international-flight-map

Sources

This canonical correlation analysis is based on code by Professor Tom Fletcher, distributed to his Foundations of Data Analysis class at UVA. The aforementioned code can be found here: https://tomfletcher.github.io/FoDA/examples/CCA.ipynb

Data sourced from:

https://www.bts.dot.gov/product/national-transportation-statistics

https://data.bls.gov/timeseries/LNS14000000

https://transtats.bts.gov/ONTIME/Departures.aspx

https://www.bts.gov/content/us-vehicle-miles

https://apps.bea.gov/iTable/iTable.cfm?reqid=19&step=2#reqid=19&step=2&isuri=1&1921=survey

Feature	Real PCE for Goods	Real PCE for Services	Real GDP	Personal income	Disposable personal income	Annual Mean Unemployment Rate	Air carrier domestic all services vehicle-miles	Highway vehicle-miles	Transite vehicle-miles	Rail train-miles	Air-travel arrivals in the USA	Air-travel departures from the USA
Units	[Index numbers, 2012=100]	[Index numbers, 2012=100]	[Billions of chained (2012) dollars]	[Billions of dollars]	[Billions of chained (2012) dollars]	%	Millions of Vehicle Miles	Millions of Vehicle Miles	Millions of Vehicle Miles	Millions of Train Miles	Thousands of Passengers	Thousands of Passengers
1975	30.905	33.884	5644.8	1369.4	2.5	8.475	1637.58	1.32766e+06	2176.2	432.557	12646	12053
1980	35.578	40.312	6759.2	2323.6	0.7	7.175	2276.02	1.5273e+06	2286.8	458.498	20262	19256
1985	43.508	47.953	7951.1	3524.9	3	7.19167	3025.74	1.77483e+06	2790.7	377.292	24155	22487
1990	50.039	57.345	9365.5	4913.8	2	5.61667	3963.27	2.14436e+06	3241.6	412.582	42712.2	42182.8
1991	49.046	58.242	9355.4	5084.9	0.7	6.85	3854.41	2.17205e+06	3306.3	408.974	41236	40819.5
1992	50.591	60.571	9684.9	5420.9	4.2	7.49167	3995.09	2.24715e+06	3354.5	424.241	45084.5	44799.6
1993	52.693	62.444	9951.5	5657.9	1.7	6.90833	4156.4	2.29638e+06	3434.9	440.446	47353	46935.8
1994	55.472	64.356	10352.4	5947.1	2.7	6.1	4377.89	2.35759e+06	3467.5	474.896	49558.3	49046.4
1995	57.129	66.241	10630.3	6291.4	3.3	5.59167	4627.81	2.4227e+06	3550	490.271	52990.8	52582.1
1996	59.687	68.157	11031.4	6678.5	3.2	5.40833	4807.14	2.48585e+06	3081.56	498.792	56820.4	56499.4
1997	62.524	70.34	11521.9	7092.5	3.7	4.94167	4907.13	2.5617e+06	3200.92	506.954	60315.3	60168.2
1998	66.703	73.536	12038.3	7606.7	5.9	4.5	5029.72	2.63152e+06	3346.95	507.947	62855.9	62578.2
1999	71.976	76.332	12610.5	8001.9	3.3	4.21667	5326.2	2.69134e+06	3499.71	524.442	66754.5	66454.4
2000	75.702	80.169	13131	8652.6	5	3.96667	5662.23	2.74692e+06	3604.54	539.001	72204.9	71546.2
2001	77.995	81.959	13262.1	9005.6	2.7	4.74167	5544.72	2.79561e+06	3735.4	535.546	65460.4	65154.2
2002	81.029	83.472	13493.1	9159	3	5.78333	5612.65	2.85551e+06	3854.6	537.292	62851.2	62611.7
2003	84.997	85.349	13879.1	9487.5	2.7	5.99167	6105.74	2.89022e+06	3914.77	553.458	62959.9	62848.2
2004	89.342	87.924	14406.4	10035.1	3.3	5.54167	6602.06	2.96479e+06	3971.62	571.855	70848.6	70521.2
2005	93.044	90.778	14912.5	10598.2	1.6	5.08333	6716.47	2.98943e+06	4054.26	583.765	75121.3	74985.9
2006	96.48	93.25	15338.3	11381.7	4	4.60833	6605.6	3.01437e+06	4126.84	598.69	77699	77317.8
2007	99.172	95.04	15626	12007.8	2.3	4.61667	6732.53	3.03112e+06	4237.74	580.959	80650.3	80271.6
2008	96.184	96.253	15604.7	12442.2	1	5.8	6446	2.97653e+06	4375.21	561.959	80547.9	80426.3
2009	93.184	95.943	15208.8	12059.1	-0.2	9.28333	5935.27	2.95676e+06	4474.55	474.535	75809.3	75726.9
2010	95.821	97.127	15598.8	12551.6	2	9.60833	5975.78	2.96727e+06	4400.18	513.359	80286.9	79878.8
2011	97.913	98.82	15840.7	13326.8	2.3	8.93333	6004.58	2.9504e+06	4331.45	530.401	83341.2	83120.6
2012	100	100	16197	14010.1	3.3	8.075	5956.17	2.96943e+06	4346.57	537.788	87108.5	86660.7
2013	103.147	100.63	16495.4	14181.1	-1.3	7.35833	5964.98	2.98828e+06	4413.34	542.394	91549.9	90815.3
2014	107.351	103.065	16912	14991.7	4.1	6.15833	5947.35	3.02566e+06	4429.08	556.18	96224.6	95525.3
2015	112.393	106.37	17403.8	15717.8	4.1	5.275	6045.82	3.09537e+06	4495.31	532.388	102431	101560
2016	116.464	108.848	17688.9	16121.2	1.8	4.875	6227.35	3.17441e+06	4545.01	490.654	108137	107293
2017	121.048	111.034	18108.1	16878.8	2.9	4.34167	6337.82	3.21235e+06	4574.24	503.111	113819	113157
2018	125.993	113.829	18638.2	17819.2	4	3.89167	6609.01	3.24033e+06	4591.94	514.347	119791	99483.9

Feature	Air carrier domestic all services vehicle-miles	Highway vehicle-miles	Transite vehicle-miles	Rail train-miles	Air-travel arrivals in the USA	Air-travel departures from the USA
Feature
Real PCE for Goods	0.898137	0.935567	0.948564	0.682776	0.973161	0.968711
Real PCE for Services	0.927224	0.967045	0.971635	0.692544	0.976104	0.979044
Real GDP	0.927311	0.964789	0.964390	0.704482	0.978296	0.980182
Personal income	0.855910	0.913809	0.948694	0.612700	0.982283	0.975725
Disposable personal income	0.078395	0.091216	-0.076769	0.155578	0.091412	0.072013
Annual Mean Unemployment Rate	-0.340533	-0.314819	-0.097174	-0.367904	-0.309863	-0.287489