Stephan Akkerman
Maybe we can create a model like FinTwitBERT but specialized in Reddit posts
Otherwise, we could use a free sentiment model (BORING!)
Relevant datasets:
- https://huggingface.co/datasets/winddude/reddit_finance_43_250k
- https://figshare.com/articles/dataset/Wallstreetbets_Reddit_Data_10_2020_-_04_2022_/22010699
- https://www.kaggle.com/datasets/aaaaaaaaade/reddit-wallstreetbets-hype-stock-posts?select=RedditAllBigClean+%282%29.csv
- https://www.kaggle.com/datasets/thedevastator/unlocking-financial-opportunities-through-crypto
- https://www.kaggle.com/datasets/leukipp/reddit-finance-data
- https://data.mendeley.com/datasets/b6ns6d8xv3/1
- https://www.kaggle.com/datasets/wordsforthewise/wallstreetbets-subreddit-data
Earnings are currently broken because of yahoo_fin not keeping up with changes.
https://github.com/wenboyu2/yahoo-earnings-calendar/pull/35/files
Maybe change our own implementation of utils/earnings_scraper.py to fix it
Our own implementation and yahoo-earnings-calendar both do not work
Could scrape https://api.nasdaq.com/api/calendar/upcoming / https://api.nasdaq.com/api/calendar/earnings?date=2024-05-20 instead
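A minimal sketch of what querying the Nasdaq earnings endpoint could look like, using only the standard library. The function names `build_earnings_url` and `fetch_earnings` are hypothetical, the browser-like `User-Agent` header is an assumption about what the endpoint accepts, and the JSON layout of the response is not assumed here, so the raw decoded payload is returned for inspection.

```python
import json
import urllib.parse
import urllib.request
from datetime import date

# Endpoint taken from the note above.
BASE_URL = "https://api.nasdaq.com/api/calendar/earnings"


def build_earnings_url(day: date) -> str:
    """Build the earnings calendar URL for a single day (hypothetical helper)."""
    query = urllib.parse.urlencode({"date": day.isoformat()})
    return f"{BASE_URL}?{query}"


def fetch_earnings(day: date, timeout: float = 10.0) -> dict:
    """Fetch the raw JSON payload for one day.

    Assumption: the endpoint may reject requests without a browser-like
    User-Agent, so one is sent here. The payload structure is not parsed.
    """
    request = urllib.request.Request(
        build_earnings_url(day),
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_earnings(date(2024, 5, 20))` and printing the result would be a first step to see which fields utils/earnings_scraper.py would need to read.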
- Save all historical Bitcoin price data (daily?)
- Use the Binance API to fill in missing dates
- ???
- Chart
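Finding which dates are missing could be sketched as below, assuming daily granularity and that the saved data can be reduced to a set of dates. The name `missing_dates` is hypothetical, and the actual Binance request for each gap is left out.

```python
from datetime import date, timedelta


def missing_dates(known: set[date], start: date, end: date) -> list[date]:
    """Return every day in [start, end] that is absent from `known`, in order."""
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in all_days if day not in known]


# Example: a stored series with two gaps.
stored = {date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)}
gaps = missing_dates(stored, date(2024, 1, 1), date(2024, 1, 5))
```

Each date in `gaps` would then be requested from the price source and appended to the saved data before charting.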
It may also be a good idea to release this separately, together with #456