EXTRACT . TRANSFORM . LOAD


The project scraps data from Yahoo Finance for Tesla stock price and Twitter for tweets by Tesla CEO Elon Musk. The data is further cleaned and merged on dates to see if Elons Musk tweets have any effect on stock price.


LEARN MORE

latitiude vs max temperature graph

THE TEAM

  1. Sarah Parzyck
  2. Odille Rodrigues
  3. Umar Farooq
  4. Ayan Guha
  5. Ben Pollock
  6. Mirgadir Alakbarov


EXTRACT



The Six Muskatweeters extracted daily Tesla stock (NASDAQ: TSLA) data from Yahoo Finance and combined that data with tweets from Elon Musk’s Twitter profile (@elonmusk).


LEARN MORE

EXTRACT


Yahoo Finance and Twitter required custom BeautifulSoup/Splinter web scrapers. After extracting stock and social media data for the past 5 years (7-28-2015 to 7-28-2020), we were able to begin our data transformation. Following are the top five rows of raw tables we got after successful extraction.


Tesla Stock Raw Table
Date Open High Low Close Adj Close Volume
0 Jul 27, 2020 1,435.00 1,547.94 1,413.00 1,539.60 1,539.60 16,048,700
1 Jul 24, 2020 1,416.01 1,465.00 1,366.54 1,417.00 1,417.00 19,396,600
2 Jul 23, 2020 1,678.95 1,689.00 1,480.77 1,513.07 1,513.07 24,328,500
3 Jul 22, 2020 1,599.00 1,626.42 1,562.00 1,592.33 1,592.33 14,161,100
4 Jul 21, 2020 1,639.93 1,675.00 1,558.00 1,568.36 1,568.36 16,157,300
Tweets Raw Table
id user text date retweets favorites
0 1287839317537284096 elonmusk Whether commercial or government, monopolies are bad for the people 2020-07-27 19:56:34+00:00 1848 13043
1 1287829529721962499 elonmusk Sounds great 2020-07-27 19:17:41+00:00 135 4722
2 1287828284852535301 elonmusk Haha 2020-07-27 19:12:44+00:00 16 1101
3 1287824348460593154 elonmusk Das Kapital in a nutshell 2020-07-27 18:57:05+00:00 43854 258196
4 1287822920966905858 elonmusk What I like most is their great sense of humor 2020-07-27 18:51:25+00:00 1239 32642


TRANSFORM



After importing our stock and tweet data into Jupyter notebooks, we began analyzing and cleansing the data for eventual loading into PostgreSQL.


LEARN MORE

TRANSFORM


We began by loading the stock and tweet data into Pandas Dataframes and excluding extraneous columns from the data. For the purposes of the project, the we required only the Date, High, Low, Open and Close price from the Tesla stock data, together with the Date and Tweet fields from the Twitter data. We performed a str.contains(“esla”) to limit our dataset to Tweets specifically related to Tesla.

To calculate fluctuations in stock price, we had to strip out non-numeric characters (i.e. “,”). We then cast the High/Low/Open/Close columns as floats and created new columns to store the calculated values. Differences between open and close price were stored in Open_Close, while differences in high and low price were stored in High_Low.

In order to merge the stock and tweet dataframes, we standardized the date fields in each dataframe by casting them each as datetime64[ns]. Upon merging the dataframes, we were ready to load our data into PostgreSQL.

Following table represent the top five rows of our transformed data set.


Final Data Table
Date Close Open High Low Open_Close High_Low Tweet
0 2020-07-24 1417.00 1416.01 1465.0 1366.54 0.99 98.46 Tesla solar panels are $1.49/Watt
1 2020-07-24 1417.00 1416.01 1465.0 1366.54 0.99 98.46 Thank you on behalf of the Tesla team. We look forward to building Giga Texas!
2 2020-07-23 1513.07 1678.95 1689.0 1480.77 -165.88 208.23 We are actually looking for revolutionary actuaries for Tesla Insurance! Please inquire, if interested.
3 2020-07-15 1546.01 1543.00 1550.0 1457.00 3.01 93.00 Great work by Tesla software engineering
4 2020-07-14 1516.80 1556.00 1590.0 1431.00 -39.20 159.00 Tesla Autopilot was literally named after the term used in aviation. Also, what about Autobahn!?


LOAD



Due to the highly structured nature of the data, we decided to load our data into a relational, PostgreSQL database.


LEARN MORE

LOAD


To prepare our dataframe for loading, we first converted it to CSV using the Pandas to_csv command. After that we connected to our local database, created a new database called tesla_final_db, and loaded the data into a new table called tesla using the Pandas to_sql command. Following image shows the result of a succesful connection with PostgreSQL.


latitiude vs max temperature graph