data-science-notebooks
Jupyter notebooks accompanying the FinDS Python repo, with code examples and results for 30+ financial data science projects
FINANCIAL DATA SCIENCE
Financial Data Science projects in Jupyter notebooks, using the FinDS Python package:
- use database engines SQL, Redis, MongoDB
- interfaces for:
  - structured data from CRSP, Compustat, IBES, TAQ
  - APIs from ALFRED, BEA
  - unstructured data from SEC Edgar, Federal Reserve websites
  - academic websites of Fama and French, Loughran and McDonald, Hoberg and Phillips
- recipes for econometrics, finance, graphs, event studies, backtesting
- applications of statistics, machine learning, NLP, neural networks and LLMs.
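
As a flavor of the event-study recipes listed above, here is a minimal, generic sketch of a market-model cumulative abnormal return (CAR) in NumPy. It is not the FinDS API: the helper name `market_model_car`, the window choices, and the simulated returns are hypothetical and used only for illustration.

```python
import numpy as np


def market_model_car(stock, market, est, event):
    """Cumulative abnormal return (CAR) over an event window.

    Hypothetical helper for illustration, not part of FinDS.
    stock, market: 1-D arrays of daily returns on the same dates.
    est, event: slices selecting the estimation and event windows.
    """
    stock = np.asarray(stock, dtype=float)
    market = np.asarray(market, dtype=float)
    # Fit the market model r_stock = alpha + beta * r_market by OLS on the
    # estimation window only; np.polyfit returns [slope, intercept] for deg=1
    beta, alpha = np.polyfit(market[est], stock[est], deg=1)
    # Abnormal return = realized return minus the market-model prediction
    abnormal = stock[event] - (alpha + beta * market[event])
    return float(abnormal.sum())


# Usage with simulated returns (not real data)
rng = np.random.default_rng(0)
market = rng.normal(0.0005, 0.01, size=260)
stock = 0.0002 + 1.2 * market + rng.normal(0.0, 0.005, size=260)
stock[250] += 0.05  # inject a hypothetical 5% event-day jump
car = market_model_car(stock, market, est=slice(0, 240), event=slice(248, 253))
print(f"CAR over event window: {car:.4f}")
```

Fitting alpha and beta only on the estimation window keeps the event itself from contaminating the benchmark; a full study would then aggregate CARs across many events and test them jointly.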
Topics
| notebook | Financial | Data | Science |
|---|---|---|---|
| stock_prices | Stock distributions, delistings | CRSP stocks | Statistical moments |
| jegadeesh_titman | Overlapping portfolios; Momentum effect | CRSP stocks | Hypothesis testing; Newey-West estimator |
| fama_french | Portfolio sorts; Value effect | CRSP stocks; Compustat | Linear regression |
| fama_macbeth | Cross-sectional regressions; CAPM | Ken French research library | Non-linear regression; Quadratic optimization |
| weekly_reversals | Mean reversion; Implementation shortfall | CRSP stocks | Structural breaks; Performance evaluation |
| quant_factors | Factor investing; Backtests | CRSP stocks; Compustat; IBES | Cluster analysis |
| event_study | Event studies | S&P key developments | Multiple testing; FFT |
| economic_releases | Economic data revisions; Employment payrolls | ALFRED | Outliers |
| regression_diagnostics | Consumer and producer prices | FRED | Linear regression diagnostics; Residual analysis |
| econometric_forecast | Production and Inflation | FRED | Time series analysis |
| approximate_factors | Approximate factor models | FRED-MD | Unit root test |
| economic_states | State space models | FRED-MD | Gaussian Mixture; HMM |
| term_structure | Interest rates | FRED yield curve | SVD |
| bond_returns | Bond risk factors | FRED bond returns | PCA |
| option_pricing | Binomial tree; Black-Scholes-Merton and the Greeks | simulated data | Monte Carlo simulation |
| conditional_volatility | Value at risk | FRED crypto-currencies | EWMA; GARCH |
| covariance_matrix | Portfolio risk | Fama-French industries | Covariance matrix estimation |
| market_microstructure | Market impact; Liquidity risk | TAQ tick data | High frequency volatility |
| event_risk | Earnings misses | IBES | Poisson regression; GLM |
| customer_ego | Supply chain | Compustat principal customers | Graph networks |
| industry_community | Industry sectors | Hoberg and Phillips research library | Community detection |
| bea_centrality | Input-output tables | Bureau of Economic Analysis | Graph centrality |
| link_prediction | Product markets | Hoberg and Phillips | Link prediction |
| spatial_regression | Earnings surprises | IBES; Hoberg and Phillips | Spatial regression |
| fomc_topics | FOMC meetings | Federal Reserve website | Topic modeling |
| mda_sentiment | 10-K Management's Discussion and Analysis | SEC Edgar; Loughran and McDonald research library | Sentiment analysis |
| business_description | 10-K Business Description | SEC Edgar | POS tagging; Density-based clustering |
| classification_models | Industry classification | SEC Edgar | Classification |
| regression_models | Macroeconomic forecasts | FRED-MD | Regression |
| deep_classifier | Industry classification | SEC Edgar | Neural networks; Word embeddings |
| recurrent_net | Macroeconomic forecasts | FRED-MD | Recurrent Neural Nets; Dynamic factor models |
| convolutional_net | Macroeconomic forecasts | FRED-MD | Convolutional Neural Nets; Vector autoregression |
| reinforcement_learning | Retirement spending | SBBI | Reinforcement learning |
| fomc_language | Fedspeak | FOMC meeting minutes | Language modeling; Transformers |
| sentiment_llm | Financial news sentiment | Kaggle | LLM prompting |
| summarization_llm | 10-K Market Risks | SEC Edgar | Text summarization |
| finetune_llm | Industry classification | SEC Edgar | LLM fine-tuning |
| rag_agent | Corporate philanthropy | text documents | RAG, LLM chatbots and agents |
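
Several notebooks in the table rely on autocorrelation-robust standard errors, such as the Newey-West estimator listed for jegadeesh_titman, where overlapping holding periods make monthly portfolio returns serially correlated. The sketch below computes a Newey-West t-statistic for the mean of a return series using Bartlett weights. It is a generic illustration under that assumption, not code taken from the notebooks, and the helper name `newey_west_tstat` is hypothetical.

```python
import numpy as np


def newey_west_tstat(x, lags):
    """t-statistic for the mean of x using a Newey-West (Bartlett kernel)
    long-run variance. Hypothetical helper, not part of FinDS."""
    x = np.asarray(x, dtype=float)
    T = x.size
    e = x - x.mean()
    # Lag-0 autocovariance, scaled by 1/T
    long_run_var = e @ e / T
    for j in range(1, lags + 1):
        gamma_j = e[j:] @ e[:-j] / T      # lag-j autocovariance
        weight = 1.0 - j / (lags + 1)     # Bartlett kernel weight
        long_run_var += 2.0 * weight * gamma_j
    # Standard error of the sample mean is sqrt(long-run variance / T)
    return x.mean() / np.sqrt(long_run_var / T)


# Usage on simulated monthly long-short spreads (not real data)
rng = np.random.default_rng(1)
spread = 0.005 + rng.normal(0.0, 0.03, size=360)
print(newey_west_tstat(spread, lags=0), newey_west_tstat(spread, lags=5))
```

With `lags=0` this reduces to the usual t-statistic based on the ddof=0 standard deviation. For overlapping K-period returns, a common choice is at least K - 1 lags, since the overlap induces serial correlation up to that order.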
Resources
Contact
GitHub: https://terence-lim.github.io