Analyzing Economic Impacts of COVID-19 Through Travel Data

By Sharvari Bhatt, Lydia Kim, Yi Fan Lim, Saeed Razavi, Leon Smith

In [1]:
#Imports
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
from sklearn.cross_decomposition import CCA
In [87]:
df = pd.read_excel('.\\Econ and CS Project Data.xlsx', sheet_name=0).T
#relative addressing. Keep the ipynb in the same file as the xlsx

header = df.iloc[0]
header.name = 'Feature'
df = df[1:]
df.columns = header 

df
Out[87]:
Feature Real PCE for Goods Real PCE for Services Real GDP Personal income Disposable personal income Annual Mean Unemployment Rate Air carrier domestic all services vehicle-miles Highway vehicle-miles Transite vehicle-miles Rail train-miles Air-travel arrivals in the USA Air-travel departures from the USA
Units [Index numbers, 2012=100] [Index numbers, 2012=100] [Billions of chained (2012) dollars] [Billions of dollars] [Billions of chained (2012) dollars] % Millions of Vehicle Miles Millions of Vehicle Miles Millions of Vehicle Miles Millions of Train Miles Thousands of Passengers Thousands of Passengers
1975 30.905 33.884 5644.8 1369.4 2.5 8.475 1637.58 1.32766e+06 2176.2 432.557 12646 12053
1980 35.578 40.312 6759.2 2323.6 0.7 7.175 2276.02 1.5273e+06 2286.8 458.498 20262 19256
1985 43.508 47.953 7951.1 3524.9 3 7.19167 3025.74 1.77483e+06 2790.7 377.292 24155 22487
1990 50.039 57.345 9365.5 4913.8 2 5.61667 3963.27 2.14436e+06 3241.6 412.582 42712.2 42182.8
1991 49.046 58.242 9355.4 5084.9 0.7 6.85 3854.41 2.17205e+06 3306.3 408.974 41236 40819.5
1992 50.591 60.571 9684.9 5420.9 4.2 7.49167 3995.09 2.24715e+06 3354.5 424.241 45084.5 44799.6
1993 52.693 62.444 9951.5 5657.9 1.7 6.90833 4156.4 2.29638e+06 3434.9 440.446 47353 46935.8
1994 55.472 64.356 10352.4 5947.1 2.7 6.1 4377.89 2.35759e+06 3467.5 474.896 49558.3 49046.4
1995 57.129 66.241 10630.3 6291.4 3.3 5.59167 4627.81 2.4227e+06 3550 490.271 52990.8 52582.1
1996 59.687 68.157 11031.4 6678.5 3.2 5.40833 4807.14 2.48585e+06 3081.56 498.792 56820.4 56499.4
1997 62.524 70.34 11521.9 7092.5 3.7 4.94167 4907.13 2.5617e+06 3200.92 506.954 60315.3 60168.2
1998 66.703 73.536 12038.3 7606.7 5.9 4.5 5029.72 2.63152e+06 3346.95 507.947 62855.9 62578.2
1999 71.976 76.332 12610.5 8001.9 3.3 4.21667 5326.2 2.69134e+06 3499.71 524.442 66754.5 66454.4
2000 75.702 80.169 13131 8652.6 5 3.96667 5662.23 2.74692e+06 3604.54 539.001 72204.9 71546.2
2001 77.995 81.959 13262.1 9005.6 2.7 4.74167 5544.72 2.79561e+06 3735.4 535.546 65460.4 65154.2
2002 81.029 83.472 13493.1 9159 3 5.78333 5612.65 2.85551e+06 3854.6 537.292 62851.2 62611.7
2003 84.997 85.349 13879.1 9487.5 2.7 5.99167 6105.74 2.89022e+06 3914.77 553.458 62959.9 62848.2
2004 89.342 87.924 14406.4 10035.1 3.3 5.54167 6602.06 2.96479e+06 3971.62 571.855 70848.6 70521.2
2005 93.044 90.778 14912.5 10598.2 1.6 5.08333 6716.47 2.98943e+06 4054.26 583.765 75121.3 74985.9
2006 96.48 93.25 15338.3 11381.7 4 4.60833 6605.6 3.01437e+06 4126.84 598.69 77699 77317.8
2007 99.172 95.04 15626 12007.8 2.3 4.61667 6732.53 3.03112e+06 4237.74 580.959 80650.3 80271.6
2008 96.184 96.253 15604.7 12442.2 1 5.8 6446 2.97653e+06 4375.21 561.959 80547.9 80426.3
2009 93.184 95.943 15208.8 12059.1 -0.2 9.28333 5935.27 2.95676e+06 4474.55 474.535 75809.3 75726.9
2010 95.821 97.127 15598.8 12551.6 2 9.60833 5975.78 2.96727e+06 4400.18 513.359 80286.9 79878.8
2011 97.913 98.82 15840.7 13326.8 2.3 8.93333 6004.58 2.9504e+06 4331.45 530.401 83341.2 83120.6
2012 100 100 16197 14010.1 3.3 8.075 5956.17 2.96943e+06 4346.57 537.788 87108.5 86660.7
2013 103.147 100.63 16495.4 14181.1 -1.3 7.35833 5964.98 2.98828e+06 4413.34 542.394 91549.9 90815.3
2014 107.351 103.065 16912 14991.7 4.1 6.15833 5947.35 3.02566e+06 4429.08 556.18 96224.6 95525.3
2015 112.393 106.37 17403.8 15717.8 4.1 5.275 6045.82 3.09537e+06 4495.31 532.388 102431 101560
2016 116.464 108.848 17688.9 16121.2 1.8 4.875 6227.35 3.17441e+06 4545.01 490.654 108137 107293
2017 121.048 111.034 18108.1 16878.8 2.9 4.34167 6337.82 3.21235e+06 4574.24 503.111 113819 113157
2018 125.993 113.829 18638.2 17819.2 4 3.89167 6609.01 3.24033e+06 4591.94 514.347 119791 99483.9
In [ ]:
econ_keys = ['Real PCE for Goods', 'Real PCE for Services', 'Real GDP','Personal income', 'Disposable personal income','Annual Mean Unemployment Rate']
travel_keys = ['Air carrier domestic all services vehicle-miles','Highway vehicle-miles', 'Transite vehicle-miles', 'Rail train-miles','Air-travel arrivals in the USA', 'Air-travel departures from the USA']

Both the truncated correlation matrix and the pair plot below show comparisons of our datasets with economic data on the vertical axis and travel data on the horizontal. The correlation matrix shows the correlation between any two features at the position where their row and column intersect. An intution for that correlation can be gleamed by looking at the same on the pair plot. This knowledge of individual correlations can help us later when explaining the results of our canonical correlation analysis.

In [94]:
df.drop(['Units']).astype(float).corr().head(6)[travel_keys]
#Correlation matrix shows correlation of each feature row by column
Out[94]:
Feature Air carrier domestic all services vehicle-miles Highway vehicle-miles Transite vehicle-miles Rail train-miles Air-travel arrivals in the USA Air-travel departures from the USA
Feature
Real PCE for Goods 0.898137 0.935567 0.948564 0.682776 0.973161 0.968711
Real PCE for Services 0.927224 0.967045 0.971635 0.692544 0.976104 0.979044
Real GDP 0.927311 0.964789 0.964390 0.704482 0.978296 0.980182
Personal income 0.855910 0.913809 0.948694 0.612700 0.982283 0.975725
Disposable personal income 0.078395 0.091216 -0.076769 0.155578 0.091412 0.072013
Annual Mean Unemployment Rate -0.340533 -0.314819 -0.097174 -0.367904 -0.309863 -0.287489
In [89]:
sns.set(style="darkgrid")
sns.set_palette("Dark2")

sns.pairplot(df.drop(['Units']),height=3, x_vars = travel_keys, y_vars = econ_keys)
plt.show()
In [103]:
#Define input and output datasets
econ = df.drop(['Units']).astype(float)[econ_keys]
travel = df.drop(['Units']).astype(float)[travel_keys]
In [108]:
cca = CCA(n_components=5)
cca.fit(travel, econ)

U = pd.DataFrame(cca.x_rotations_)
U.rename(index={0:"Air v-mi",1:"Highway v-mi",2:"Transit v-mi",3:"Rail t-mi",4:"Air Arrivals",5:"Air Departures"}, inplace=True)
print("Travel Canonical Components:")
print(U)

V = pd.DataFrame(cca.y_rotations_)
V.rename(index={0:"RPCE Goods",1:"RPCE Services",2:"RGDP",3:"Income",4:"Disp. Income",5:"Unemployment"}, inplace=True)
print("\nEconomic Canonical Components:")
print(V)

print('\n')
travel_c, econ_c = cca.fit_transform(travel, econ)

print("First Canonical Correlation =", np.corrcoef(travel_c[:,0], econ_c[:,0])[0,1])
print("Second Canonical Correlation =", np.corrcoef(travel_c[:,1], econ_c[:,1])[0,1])
print("Third Canonical Correlation =", np.corrcoef(travel_c[:,2], econ_c[:,2])[0,1])
print("Fourth Canonical Correlation =", np.corrcoef(travel_c[:,3], econ_c[:,3])[0,1])
print("Fifth Canonical Correlation =", np.corrcoef(travel_c[:,2], econ_c[:,4])[0,1])
Travel Canonical Components:
                       0         1         2         3         4
Air v-mi        0.038582 -0.271694  0.608069  0.541007 -0.800066
Highway v-mi    0.934561 -0.574290 -0.640421 -1.354686  0.627722
Transit v-mi    0.155858  0.198246 -0.552959  0.524638 -0.024256
Rail t-mi       0.065101  0.006902  0.049004  0.225767  0.204007
Air Arrivals    0.292080  0.793420  0.549590  0.011254 -0.827718
Air Departures  0.106143 -0.126657  0.014686  0.147739  0.842217

Economic Canonical Components:
                      0         1         2         3         4
RPCE Goods    -0.237604 -0.122690  0.002225  1.737066 -2.158120
RPCE Services  0.932752 -0.348136 -1.109130 -7.834337 -4.880769
RGDP           0.056376 -0.482085  1.101273  7.697494  6.532923
Income        -0.260555  0.982182 -0.001795 -1.466189  0.531744
Disp. Income   0.002287 -0.003257  0.009979 -0.133741  0.043234
Unemployment  -0.049461 -0.019655 -0.025593  0.425948  0.173075


First Canonical Correlation = 0.9989758578482323
Second Canonical Correlation = 0.9782451876298951
Third Canonical Correlation = 0.8213386352200954
Fourth Canonical Correlation = 0.6902648497373676
Fifth Canonical Correlation = -1.3589100461251289e-15

The goal of this canonical correlation analysis (CCA) was to determine how our economic measures and our travel data relate by projecting them as input and output data onto single dimensions with maximal correlation. The particular data components were chosen to gain insight on a wide array of macroeconomic health indicators and different types of travel. Our selection of data components was due to the lack of historical precedence of a pandemic as devastating as Covid-19 since 1975. With major institutions halted and data collection a low priority, available and reliable private and public travel data is limited. However, as travel data is a strong indicator of overall economic health and as we have, we decided to focus on the impacts of the travel ban on economic health through CCA analysis of our chosen data components.

While CCA is often used for dimensionality reduction as a means of enhancing the training of machine learning models, valuable insights can also be gained by examining the canonical correlations and their components. The first three travel canonical components are very strongly correlated with their respective economic canonical components, so we can say with relative confidence that the travel data is indeed correlated with the economic data, although we may be overfitting our model to achieve such a high R-squared. The relative personal consumption expenditure of goods and services and real GDP are highly correlated with changes in travel data, as expected of statistics that are understood to be economic health indicators. Then, looking at the makeup of individual canonical components in each canonical correlation, we can gain insights into relationships between input and output features in context. For example, in the first canonical correlation, the travel canonical component has a high weight on Highway v-mi compared to the weights representing the other travel features. Likewise, the first canonical correlation's economic canonical components are dominated by the weight of RPCE Services. Given these facts together, they could imply a strong relationship between these features, given their surrounding context. Indeed, if we look back at our pairplot, we see that those two features do correlate very strongly.

We had expected a strong correlation between macroeconomic factors and travel data for Real GDP, Personal Income, RPCE Goods, and RPCE Services as it would make sense for growth in the travel industry to grow in conjunction with these factors. The real GDP numbers show a steady trend of economic growth in the US while there has also been a steady increase and growth of the travel industry and tourism throughout the decades since 1975, with a few exceptions such as during 9/11. An increase in personal income and consumption of goods and services also correlates with increasing travel as more personal income provides more means for leisure activities such as travel. Businesses can also support more travel expenses as they are making higher revenue and profits. Higher consumption of goods and services also indicate a higher capacity for travel whether cross-country or internationally. We were surprised by the results of disposable personal income and unemployment. Data on unemployment has an extremely low R-value; this reflects a low explanatory connection between unemployment and travel data, although this does not necessarily mean our model had poor predictions. Even during periods of low disposable income and high unemployment, travel remained at a steady growth. This could indicate travel as a necessity even during periods when consumers have low disposable income or are experiencing unemployment. This makes logical sense for vehicle and train mileage as those are necessary to travel to essential places such as work and other businesses. The growth in US population since 1975 and the growth in the travel industry due to globalization and business expansion can also help account for this, because there is a higher need and established infrastructure to support public transportation despite high unemployment and low disposable income. Breakdown on data for travel due to leisure vs. travel due to business would provide further insight and a more interesting analysis.

Follow this link for phase 2 of this project: https://observablehq.com/@limyifan1/international-flight-map

Sources

This canonical correlation analysis is based on code by Professor Tom Fletcher, distributed to his Foundations of Data Analysis class at UVA. The aforementioned code can be found here: https://tomfletcher.github.io/FoDA/examples/CCA.ipynb