
Financial and Investment Data Science: the FinDS Python library and examples for applying quantitative and machine learning methods to structured and unstructured financial data sets

FINANCIAL DATA SCIENCE

FinDS Python package for financial data science projects

  • database engines: SQL, MongoDB, Redis
  • interfaces for
    • structured data from CRSP, Compustat, IBES, TAQ
    • APIs from ALFRED, BEA
    • unstructured data from SEC Edgar, Federal Reserve websites
    • academic websites by Ken French, Loughran and McDonald, Hoberg and Phillips
  • recipes for econometrics, finance, graphs, event studies, backtesting
  • applications of statistics, machine learning, neural networks and large language models

Resources

  1. Online Jupyter Book, or download the PDF

  2. FinDS API reference

  3. FinDS repo

  4. Jupyter notebooks repo

Examples

| notebook | Financial | Data | Science |
|---|---|---|---|
| stock_prices | Stock distributions, delistings | CRSP stocks | Statistical moments |
| jegadeesh_titman | Overlapping portfolios; Momentum effect | CRSP stocks | Hypothesis testing; Newey-West estimator |
| fama_french | Portfolio sorts; Value effect | CRSP stocks; Compustat | Linear regression |
| fama_macbeth | Cross-sectional regressions; CAPM | Ken French research library | Non-linear regression; Quadratic optimization |
| weekly_reversals | Mean reversion; Implementation shortfall | CRSP stocks | Structural breaks; Performance evaluation |
| quant_factors | Factor investing; Backtests | CRSP stocks; Compustat; IBES | Cluster analysis |
| event_study | Event studies | S&P key developments | Multiple testing; FFT |
| economic_releases | Economic data revisions; Employment payrolls | ALFRED | Outliers |
| regression_diagnostics | Consumer and producer prices | FRED | Linear regression diagnostics; Residual analysis |
| econometric_forecast | Production and inflation | FRED | Time series analysis |
| approximate_factors | Approximate factor models | FRED-MD | Unit root test |
| economic_states | State space models | FRED-MD | Gaussian mixture; HMM |
| term_structure | Interest rates | FRED yield curve | SVD |
| bond_returns | Bond risk factors | FRED bond returns | PCA |
| option_pricing | Binomial tree; Black-Scholes-Merton and the Greeks | Simulations | Monte Carlo simulation |
| conditional_volatility | Value at risk | FRED crypto-currencies | EWMA; GARCH |
| covariance_matrix | Portfolio risk | Fama-French industries | Covariance matrix estimation |
| market_microstructure | Market impact; Liquidity risk | TAQ tick data | High frequency volatility |
| event_risk | Earnings misses | IBES | Poisson regression; GLM |
| customer_ego | Supply chain | Compustat principal customers | Graph networks |
| industry_community | Industry sectors | Hoberg and Phillips research library | Community detection |
| bea_centrality | Input-output tables | Bureau of Economic Analysis | Graph centrality |
| link_prediction | Product markets | Hoberg and Phillips | Link prediction |
| spatial_regression | Earnings surprises | IBES; Hoberg and Phillips | Spatial regression |
| fomc_topics | FOMC meetings | Federal Reserve website | Topic models |
| mda_sentiment | 10-K filings | SEC Edgar; Loughran and McDonald research library | Sentiment analysis |
| business_description | 10-K filings | SEC Edgar | POS tagging; Density-based clustering |
| classification_models | Industry classification | SEC Edgar | Classification |
| regression_models | Macroeconomic forecasts | FRED-MD | Regression |
| deep_classifier | Industry classification | SEC Edgar | Neural networks; Word embeddings |
| recurrent_net | Macroeconomic models | FRED-MD | RNN; Dynamic factor models |
| convolutional_net | Macroeconomic forecasts | FRED-MD | CNN; Vector autoregression |
| reinforcement_learning | Spending policy | SBBI | Reinforcement learning |
| fomc_language | Fedspeak | FOMC meetings minutes | Language modelling; Transformers |
| sentiment_llm | Financial news sentiment | Kaggle | LLM prompt engineering |
| summarization_llm | 10-K filings | SEC Edgar | LLM text summarization |
| finetune_llm | Industry classification | SEC Edgar | LLM fine-tuning |
| rag_agent | Corporate philanthropy | textbooks | LLM RAG, chatbots, agents |
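
To give a flavor of the kind of computation the option_pricing entry above refers to, the following is a minimal standalone sketch that prices a European call with the Black-Scholes-Merton closed form and checks it against a Monte Carlo simulation. It does not use FinDS and is not taken from the notebook; the function names `bsm_call` and `mc_call` and all parameter values are hypothetical, chosen only for illustration, and the model assumes no dividends and constant volatility.

```python
import math
import random
from statistics import NormalDist


def bsm_call(spot, strike, rate, vol, maturity):
    """Black-Scholes-Merton price of a European call (no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    cdf = NormalDist().cdf  # standard normal CDF from the standard library
    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


def mc_call(spot, strike, rate, vol, maturity, n_paths=100_000, seed=0):
    """Monte Carlo estimate of the same call under risk-neutral GBM."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    drift = (rate - 0.5 * vol**2) * maturity
    diffusion = vol * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_paths):
        # Simulate the terminal price in one step (exact for GBM).
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total_payoff += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * total_payoff / n_paths


if __name__ == "__main__":
    # Hypothetical inputs: at-the-money call, 5% rate, 20% volatility, 1 year.
    params = dict(spot=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0)
    print(f"closed form: {bsm_call(**params):.4f}")
    print(f"monte carlo: {mc_call(**params):.4f}")
```

The two numbers should agree to within the simulation's sampling error, which shrinks in proportion to one over the square root of `n_paths`.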

Contact

GitHub: https://terence-lim.github.io