cryptocurrency-ticks-data
Add cryptocurrency tick data reader with pandas DataFrame support
Summary
This PR introduces a comprehensive Python module for loading and analyzing cryptocurrency tick data from the repository's ZIP archives. The implementation includes:
Core Features:
- `TickerDataLoader` class for loading tick data with optional date-range filtering
- Automatic pandas DataFrame conversion with proper timestamp handling
- Support for all 7 available tickers (BCCUSDT, BNBUSDT, BTCUSDT, ETHBTC, LTCBTC, NEOUSDT, QTUMUSDT)
- Utility functions for quick data access (`list_available_tickers`, `quick_load`)
- Comprehensive error handling for missing or corrupted files
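The module's own implementation isn't reproduced here. As a hypothetical sketch of what a ticker-discovery helper like `list_available_tickers` could do, assuming the `data/<TICKER>/*.csv.zip` layout this PR describes, it only needs to scan for ticker sub-directories that contain at least one daily archive:

```python
from pathlib import Path


def list_available_tickers(data_dir="data"):
    """Hypothetical sketch: return ticker names, assuming one sub-directory per
    ticker under data_dir holding *.csv.zip daily files."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"data directory not found: {root}")
    # Keep only directories that actually contain daily archives
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and any(p.glob("*.csv.zip"))
    )
```

Skipping empty directories keeps stray folders (or a ticker whose archives were not checked out) from showing up as loadable tickers.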
Files Added:
- `reading-src/read_ticker.py`: main module (292 lines)
- `reading-src/__init__.py`: package initialization
- `tests/test_read_ticker.py`: test suite with 21 test cases covering all functionality
Data Processing:
- Converts epoch timestamps to pandas datetime objects
- Handles ZIP file extraction and CSV parsing automatically
- Supports flexible date range queries (YYYY-MM-DD format)
- Combines multiple daily files into sorted DataFrames
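The processing steps above can be sketched with `zipfile` and pandas. This is a minimal sketch under stated assumptions, not the module's actual code: the file name pattern `<YYYY-MM-DD>.csv.zip`, the `Timestamp` column name, and the epoch unit (milliseconds) are all hypothetical and should be adjusted to the real archives.

```python
import zipfile
from pathlib import Path

import pandas as pd

# Assumptions (hypothetical): daily archives are named <YYYY-MM-DD>.csv.zip inside
# a ticker directory, each holds exactly one CSV, and TIME_COL is epoch milliseconds.
TIME_COL = "Timestamp"


def read_daily_zip(path):
    """Extract the single CSV in a ZIP archive and parse it into a DataFrame."""
    with zipfile.ZipFile(path) as zf:
        csv_names = [n for n in zf.namelist() if n.endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"{path}: expected one CSV, found {csv_names}")
        with zf.open(csv_names[0]) as fh:
            df = pd.read_csv(fh)
    # Epoch milliseconds -> timezone-aware UTC datetimes
    df[TIME_COL] = pd.to_datetime(df[TIME_COL], unit="ms", utc=True)
    return df


def load_range(ticker_dir, start, end):
    """Load every available daily file from start to end (YYYY-MM-DD, inclusive)."""
    frames = []
    for day in pd.date_range(start, end, freq="D"):
        path = Path(ticker_dir) / f"{day.strftime('%Y-%m-%d')}.csv.zip"
        if path.exists():  # skip days with no archive instead of failing
            frames.append(read_daily_zip(path))
    if not frames:
        raise FileNotFoundError(f"no data in {ticker_dir} for {start}..{end}")
    combined = pd.concat(frames, ignore_index=True)
    # Daily files may not be internally ordered, so sort the combined result
    return combined.sort_values(TIME_COL, ignore_index=True)
```

`pd.read_csv` can also open a single-file ZIP directly by inferring compression from the `.zip` extension; opening it through `zipfile` here just makes the "exactly one CSV" check and its error message explicit.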
Review & Testing Checklist for Human
- [ ] Test with real cryptocurrency data files - Verify the module works correctly with actual BTCUSDT, ETHBTC data files from the repository (not just mock test data)
- [ ] Validate date range filtering edge cases - Test boundary conditions like single-day ranges, missing dates, and date ranges that span missing files
- [ ] Check memory usage with larger datasets - Load a week or month of BTCUSDT data to ensure memory handling is reasonable for typical use cases
- [ ] Verify timestamp conversion accuracy - Compare a few raw epoch timestamps with the converted datetime values to ensure timezone and precision are correct
- [ ] Test error handling with real edge cases - Try loading invalid tickers, corrupted ZIP files, or date ranges outside available data
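For the timestamp-accuracy item, one mechanical check is a round trip: convert the datetimes back to epoch milliseconds and confirm they equal the raw values. This sketch assumes the raw timestamps are epoch milliseconds interpreted as UTC; the sample values and the `roundtrip_ok` helper are hypothetical.

```python
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01", tz="UTC")


def roundtrip_ok(raw_ms, converted):
    """True if converting the datetimes back to epoch milliseconds reproduces raw_ms exactly."""
    back = (converted - EPOCH) // pd.Timedelta(milliseconds=1)
    return bool((back.to_numpy() == raw_ms.to_numpy()).all())


raw = pd.Series([1523059200000, 1523059200123])  # hypothetical epoch-ms samples
converted = pd.to_datetime(raw, unit="ms", utc=True)
print(converted.iloc[0])  # 2018-04-07 00:00:00+00:00
print(roundtrip_ok(raw, converted))
```

A failed round trip, or dates far outside 2018 to 2019, usually points to the wrong epoch unit (seconds vs. milliseconds) or lost sub-second precision.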
Recommended Test Plan:
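```python
# Test basic functionality
# reading-src/ contains a hyphen, so it cannot be imported as a package named
# reading_src; put the directory itself on sys.path and import the module directly.
import sys
sys.path.insert(0, "reading-src")

from read_ticker import TickerDataLoader

loader = TickerDataLoader()
btc_data = loader.load_ticker_data("BTCUSDT", "2018-04-07", "2018-04-07")
print(f"Loaded {len(btc_data)} trades, price range: ${btc_data['Price'].min():.2f}-${btc_data['Price'].max():.2f}")

# Test date range and memory usage (deep=True also counts object/string columns)
eth_week = loader.load_ticker_data("ETHBTC", "2018-04-07", "2018-04-14")
print(f"Week of data: {len(eth_week)} trades, memory usage: {eth_week.memory_usage(deep=True).sum() / 1024**2:.1f} MB")
```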
Diagram
```mermaid
%%{ init : { "theme" : "default" }}%%
graph TD
DataDir["data/<br/>BTCUSDT/<br/>ETHBTC/<br/>etc."]:::context
ZipFiles["*.csv.zip<br/>daily files"]:::context
ReadTicker["reading-src/<br/>read_ticker.py"]:::major-edit
InitFile["reading-src/<br/>__init__.py"]:::major-edit
TestFile["tests/<br/>test_read_ticker.py"]:::major-edit
TickerLoader["TickerDataLoader<br/>class"]:::major-edit
UtilFuncs["list_available_tickers()<br/>quick_load()"]:::major-edit
PandasDF["pandas<br/>DataFrame"]:::context
DataDir --> ZipFiles
ZipFiles --> ReadTicker
ReadTicker --> TickerLoader
ReadTicker --> UtilFuncs
TickerLoader --> PandasDF
TestFile --> ReadTicker
InitFile --> ReadTicker
subgraph Legend
L1["Major Edit"]:::major-edit
L2["Minor Edit"]:::minor-edit
L3["Context/No Edit"]:::context
end
classDef major-edit fill:#90EE90
classDef minor-edit fill:#87CEEB
classDef context fill:#FFFFFF
```
Notes
- All 21 pytest tests pass successfully, covering unit tests, integration tests, and error handling scenarios
- The module handles the existing data structure (CSV files in ZIP archives organized by ticker/date)
- Date range: 2018-04-06 to 2019-11-18 with 590+ days of tick data
- File renamed from `read-ticker.py` to `read_ticker.py` to follow Python naming conventions
- Real-data integration tests are included but limited; human testing with actual large datasets is recommended
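The "590+ days" figure follows from the stated date range. Counting both endpoints:

```python
from datetime import date

first, last = date(2018, 4, 6), date(2019, 11, 18)
calendar_days = (last - first).days + 1  # inclusive of both endpoints
print(calendar_days)  # 592
```

That is the calendar span; the number of daily archives actually on disk may be lower if some days are missing, which is one of the date-range edge cases in the checklist.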
Session Info:
- Requested by: Eli Belash (@Nucs)
- Devin Session: https://app.devin.ai/sessions/a3e57dcc2d7241f480a8407979403e93
🤖 Devin AI Engineer
I'll be helping with this pull request! Here's what you should know:
✅ I will automatically:
- Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
- Look at CI failures and help fix them
Note: I can only respond to comments from users who have write access to this repository.
Exclude all __pycache__ directories via the .gitignore file