New Feature: Implement Covariance Indicator #6982
Description
Implemented the Covariance indicator as requested in issue #6982.
This indicator computes the covariance between two data series (target and reference) over a specified period using MathNet.Numerics.Statistics.Covariance.
Related Issue
Closes #6982
Motivation and Context
Covariance is a fundamental statistical measure used in finance, particularly for portfolio optimization and risk management. Adding this indicator allows users to easily calculate the joint variability of two assets within the Lean engine.
Requires Documentation Change
No. (Standard indicator addition)
How Has This Been Tested?
- Created a new test class
CovarianceTests.cs. - Verified mathematical accuracy by comparing the indicator output against manual calculation.
- Verified standard indicator behaviors:
IsReady,Reset, andWarmUp. - Ran unit tests locally using
dotnet test.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Refactor (non-breaking change which improves implementation)
- [ ] Performance (non-breaking change which improves performance. Please add associated performance test and results)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Non-functional change (xml comments/documentation/etc)
Checklist:
- [x] My code follows the code style of this project.
- [x] I have read the CONTRIBUTING document.
- [x] I have added tests to cover my changes.
- [x] All new and existing tests passed.
- [x] My branch follows the naming convention
bug-<issue#>-<description>orfeature-<issue#>-<description>
Hi team! 👋 This is my first attempt at implementing a new Indicator for LEAN. I modeled the implementation and tests after the existing Correlation indicator logic. I have verified it locally, but as I am new to the codebase, I would really appreciate any feedback or suggestions to ensure I am following the best practices. Thanks!
Hey @yyxxddjj! Welcome to Lean! Please take a look at the issue and related PRs which tried to implement this before, to understand what's expected, you can also take a look at merged indicator PRs there are plenty. Quick review, there are a few things missing here, comparing with test data in related issue, helper method in qcalgorithm indicator.
Hi @Martin-Molinero, Thank you so much for the review and guidance! I have updated the PR with the following changes: Helper Method: Added the Covariance(...) helper in QCAlgorithm.Indicators.cs. Testing: Included the spy_qqq_cov.csv data and updated CovarianceTests to verify the implementation against it. Config: Updated QuantConnect.Tests.csproj to ensure the test data is copied correctly during the build. Could you please take a look and let me know if this implementation meets the project standards? Thanks again for your time!
Hi @Martin-Molinero, thanks for catching that! Renaming: Renamed the helper method to COV. Tests: Removed Assert.Ignore, so all tests will run on CI. Data: You were right! I realized I had downloaded an older version of the CSV from the issue thread. I've now updated it with the latest version from LouisSzeto's comment (Mar 12, 2024) and verified the tests pass locally. Pushed the updates. Thanks!
Hey @yyxxddjj! Please re run the python commands Louis had shared 👍 I don't think current csv looks right. Also please use English comments to follow standard
Hi @Martin-Molinero, I have updated the comments to English. I also re-ran the Python commands provided by Louis. You were right, the previous data file was indeed incorrect. I have now pushed the updated CSV file with the correct values. Please let me know if everything looks good now. Thanks!
@yyxxddjj tests are failing 👀
@Martin-Molinero Thank you for the feedback! I have pushed a significant update based on your suggestions.
- Refactor to Standalone Tests I decided to decouple CovarianceTests from the CommonIndicatorTests base class.
Reasoning: The base class enforces strict checks for standard OHLC columns (e.g., throwing exceptions if 'open' is missing during Renko tests). Since Covariance is a dual-symbol indicator using specific verification data, fitting it into the base class's expected format required filling unrelated OHLC columns with arbitrary values, which felt semantically incorrect and introduced unnecessary complexity.
Solution: I implemented a dedicated, standalone test class. This allows for explicit, readable tests without working around the base class limitations.
- Data Generation & Structure The test data (spy_qqq_cov.csv) is generated using a Python script to ensure mathematical accuracy against the standard pandas.rolling(window).cov().
Source: It uses the raw price data from the LEAN data/ directory (SPY and QQQ).
CSV Structure: Date, SPY (Price), QQQ (Price), Covariance (Expected Result).
- Test Coverage Despite not inheriting from the base class, the new suite maintains rigorous coverage:
Accuracy: Validates calculations against the Python-generated control data (with robust handling for scientific notation).
Lifecycle: Fully tests IsReady logic, WarmUpPeriod, and Reset() behavior.
Dual-Stream: Explicitly verifies the indicator handles updates from two different symbols correctly.
For transparency, I have included the data generation script below:
Click to view Python Generation Script (generate_data.py)
Python import pandas as pd import zipfile
Configuration
SPY_PATH = 'data/spy.zip' QQQ_PATH = 'data/qqq.zip' OUTPUT_FILE = 'spy_qqq_cov.csv' WINDOW_SIZE = 252
def read_lean_zip(path, symbol): """Reads LEAN zip data and returns the Close price series.""" with zipfile.ZipFile(path, 'r') as z: filename = z.namelist()[0] with z.open(filename) as f: df = pd.read_csv(f, header=None, names=['Date', 'Open', 'High', 'Low', 'Close', 'Vol']) df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d %H:%M') return df.set_index('Date')['Close'].rename(symbol)
1. Load Data
spy = read_lean_zip(SPY_PATH, 'SPY') qqq = read_lean_zip(QQQ_PATH, 'QQQ')
2. Align and Calculate Covariance
data = pd.concat([spy, qqq], axis=1).dropna() returns = data.pct_change() cov_values = returns['SPY'].rolling(window=WINDOW_SIZE).cov(returns['QQQ'])
3. Export Clean Verification Data
output = pd.DataFrame({ 'Date': cov_values.index, 'SPY': data['SPY'], 'QQQ': data['QQQ'], 'Covariance': cov_values }).set_index('Date').dropna()
output.to_csv(OUTPUT_FILE, index=True, header=True) print(f"Generated {OUTPUT_FILE} with {len(output)} rows.")
Hey @yyxxddjj! Sorry but should revert changes in unrelated files like CorrelationPearsonTests.cs, also I believe should still be using CommonIndicatorTests if there's an improvement to be done there we can look into it, but we already have a few multi symbol indicators which use it as base, so it's just about following the pattern here.
data generation script => should be unless there's something wrong should be using the one in the related issue...
Hi @Martin-Molinero, thanks for the review. I have cleaned up the branch and addressed the issues you mentioned: Reverted unrelated changes: All changes to unrelated files (like CorrelationPearsonTests.cs) have been removed. Updated CSV Data: I have reformatted the test data CSV to ensure it aligns with the requirements. Refined Implementation: The Covariance indicator logic has been optimized. Could you please take a look and let me know if there are any other issues? Thanks!
Hi @Martin-Molinero,
Updated the implementation following your feedback - now using CommonIndicatorTests as base class with only 4 files changed.
Regarding CI: All 5 failing checks show the same . NET runtime mismatch error (9.0.0 vs 10.0.1). Syntax Tests passed and code compiles with 0 errors. Could you check the CI environment?
Thanks!
Hi @Martin-Molinero,
Thanks for the feedback! I've pushed updates addressing all the issues:
1. COV helper method ✅ Added COV() helper method to QCAlgorithm.Indicators.cs.
2. AcceptsVolumeRenkoBarsAsInput failing ✅
Fixed by using period = 5 instead of 252 in this test (matching the pattern in BetaIndicatorTests). The issue was that VolumeRenkoConsolidator(1000000000) couldn't produce enough bars from the CSV data to satisfy period = 252.
3. WarmUpPeriod
The calculation WarmUpPeriod += (period - 2) + 1 follows the Beta indicator exactly, as suggested in issue #6982 which states: "Lean currently implemented it internally in the Beta indicator". Let me know if a different behavior is expected.
Thanks!