The project scraps data from Yahoo Finance for Tesla stock price and Twitter for tweets by Tesla CEO Elon Musk. The data is further cleaned and merged on dates to see if Elons Musk tweets have any effect on stock price.
The Six Muskatweeters extracted daily Tesla stock (NASDAQ: TSLA) data from Yahoo Finance and combined that data with tweets from Elon Musk’s Twitter profile (@elonmusk).
Yahoo Finance and Twitter required custom BeautifulSoup/Splinter web scrapers. After extracting stock and social media data for the past 5 years (7-28-2015 to 7-28-2020), we were able to begin our data transformation. Following are the top five rows of raw tables we got after successful extraction.
Date | Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|---|
0 | Jul 27, 2020 | 1,435.00 | 1,547.94 | 1,413.00 | 1,539.60 | 1,539.60 | 16,048,700 |
1 | Jul 24, 2020 | 1,416.01 | 1,465.00 | 1,366.54 | 1,417.00 | 1,417.00 | 19,396,600 |
2 | Jul 23, 2020 | 1,678.95 | 1,689.00 | 1,480.77 | 1,513.07 | 1,513.07 | 24,328,500 |
3 | Jul 22, 2020 | 1,599.00 | 1,626.42 | 1,562.00 | 1,592.33 | 1,592.33 | 14,161,100 |
4 | Jul 21, 2020 | 1,639.93 | 1,675.00 | 1,558.00 | 1,568.36 | 1,568.36 | 16,157,300 |
id | user | text | date | retweets | favorites | |
---|---|---|---|---|---|---|
0 | 1287839317537284096 | elonmusk | Whether commercial or government, monopolies are bad for the people | 2020-07-27 19:56:34+00:00 | 1848 | 13043 |
1 | 1287829529721962499 | elonmusk | Sounds great | 2020-07-27 19:17:41+00:00 | 135 | 4722 |
2 | 1287828284852535301 | elonmusk | Haha | 2020-07-27 19:12:44+00:00 | 16 | 1101 |
3 | 1287824348460593154 | elonmusk | Das Kapital in a nutshell | 2020-07-27 18:57:05+00:00 | 43854 | 258196 |
4 | 1287822920966905858 | elonmusk | What I like most is their great sense of humor | 2020-07-27 18:51:25+00:00 | 1239 | 32642 |
After importing our stock and tweet data into Jupyter notebooks, we began analyzing and cleansing the data for eventual loading into PostgreSQL.
We began by loading the stock and tweet data into Pandas Dataframes and excluding extraneous columns from the data. For the purposes of the project, the we required only the Date, High, Low, Open and Close price from the Tesla stock data, together with the Date and Tweet fields from the Twitter data. We performed a str.contains(“esla”) to limit our dataset to Tweets specifically related to Tesla.
To calculate fluctuations in stock price, we had to strip out non-numeric characters (i.e. “,”). We then cast the High/Low/Open/Close columns as floats and created new columns to store the calculated values. Differences between open and close price were stored in Open_Close, while differences in high and low price were stored in High_Low.
In order to merge the stock and tweet dataframes, we standardized the date fields in each dataframe by casting them each as datetime64[ns]. Upon merging the dataframes, we were ready to load our data into PostgreSQL.
Following table represent the top five rows of our transformed data set.
Date | Close | Open | High | Low | Open_Close | High_Low | Tweet | |
---|---|---|---|---|---|---|---|---|
0 | 2020-07-24 | 1417.00 | 1416.01 | 1465.0 | 1366.54 | 0.99 | 98.46 | Tesla solar panels are $1.49/Watt |
1 | 2020-07-24 | 1417.00 | 1416.01 | 1465.0 | 1366.54 | 0.99 | 98.46 | Thank you on behalf of the Tesla team. We look forward to building Giga Texas! |
2 | 2020-07-23 | 1513.07 | 1678.95 | 1689.0 | 1480.77 | -165.88 | 208.23 | We are actually looking for revolutionary actuaries for Tesla Insurance! Please inquire, if interested. |
3 | 2020-07-15 | 1546.01 | 1543.00 | 1550.0 | 1457.00 | 3.01 | 93.00 | Great work by Tesla software engineering |
4 | 2020-07-14 | 1516.80 | 1556.00 | 1590.0 | 1431.00 | -39.20 | 159.00 | Tesla Autopilot was literally named after the term used in aviation. Also, what about Autobahn!? |
Due to the highly structured nature of the data, we decided to load our data into a relational, PostgreSQL database.
To prepare our dataframe for loading, we first converted it to CSV using the Pandas to_csv command. After that we connected to our local database, created a new database called tesla_final_db, and loaded the data into a new table called tesla using the Pandas to_sql command. Following image shows the result of a succesful connection with PostgreSQL.