Add cryptocurrency tick data reader with pandas DataFrame support

Open devin-ai-integration[bot] opened this issue 5 months ago • 2 comments

Summary

This PR introduces a comprehensive Python module for loading and analyzing cryptocurrency tick data from the repository's ZIP archives. The implementation includes:

Core Features:

  • TickerDataLoader class for loading tick data with optional date range filtering
  • Automatic pandas DataFrame conversion with proper timestamp handling
  • Support for all 7 available tickers (BCCUSDT, BNBUSDT, BTCUSDT, ETHBTC, LTCBTC, NEOUSDT, QTUMUSDT)
  • Utility functions for quick data access (list_available_tickers, quick_load)
  • Comprehensive error handling for missing/corrupted files
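The error handling for missing or corrupted files could look roughly like the sketch below. It is not taken from read_ticker.py: the helper name safe_read_member and the choice to return None instead of raising are assumptions made for illustration.

```python
import io
import zipfile


def safe_read_member(zip_bytes: bytes):
    """Return the text of the first archive member, or None if the archive is unusable.

    Hypothetical helper: the real module may raise or log instead of returning None.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            names = archive.namelist()
            if not names:
                return None  # empty archive: nothing to load
            return archive.read(names[0]).decode("utf-8")
    except zipfile.BadZipFile:
        return None  # corrupted or non-ZIP content


# Build a small valid archive in memory, then try a corrupted payload.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ticks.csv", "Timestamp,Price\n1523059200000,6900.0\n")

good = safe_read_member(buffer.getvalue())
bad = safe_read_member(b"this is not a zip archive")
```

Catching zipfile.BadZipFile at the per-file level lets a multi-day load skip one damaged archive without aborting the whole range.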

Files Added:

  • reading-src/read_ticker.py - Main module (292 lines)
  • reading-src/__init__.py - Package initialization
  • tests/test_read_ticker.py - Test suite with 21 test cases covering all functionality

Data Processing:

  • Converts epoch timestamps to pandas datetime objects
  • Handles ZIP file extraction and CSV parsing automatically
  • Supports flexible date range queries (YYYY-MM-DD format)
  • Combines multiple daily files into sorted DataFrames
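As a rough illustration of the processing steps above, the following sketch reads a CSV from a ZIP archive, converts an epoch column to pandas datetimes, and merges daily frames into one sorted DataFrame. The column name Timestamp, the millisecond unit, the one-CSV-per-archive layout, and the helper names are all assumptions. Only the Price column appears in this PR's test plan.

```python
import io
import zipfile

import pandas as pd


def read_daily_zip(zip_bytes: bytes, ts_column: str = "Timestamp", unit: str = "ms") -> pd.DataFrame:
    """Read the first CSV inside a ZIP archive and convert its epoch column."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_name = archive.namelist()[0]  # assumes one CSV per daily archive
        with archive.open(csv_name) as handle:
            frame = pd.read_csv(handle)
    # Epoch values become timezone-aware UTC datetimes.
    frame[ts_column] = pd.to_datetime(frame[ts_column], unit=unit, utc=True)
    return frame


def combine_days(frames: list, ts_column: str = "Timestamp") -> pd.DataFrame:
    """Concatenate daily frames and sort them chronologically."""
    combined = pd.concat(frames, ignore_index=True)
    return combined.sort_values(ts_column).reset_index(drop=True)


def make_zip(csv_text: str, name: str) -> bytes:
    """Build an in-memory ZIP holding one CSV (a stand-in for a real daily file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, csv_text)
    return buffer.getvalue()


# Two made-up daily files, passed in reverse order to exercise the sort.
day2 = make_zip("Timestamp,Price\n1523145600000,7000.5\n", "2018-04-08.csv")
day1 = make_zip("Timestamp,Price\n1523059200000,6900.0\n1523059260000,6910.0\n", "2018-04-07.csv")
ticks = combine_days([read_daily_zip(day2), read_daily_zip(day1)])
```

Passing utc=True makes the timezone explicit, which helps when comparing raw epoch values against the converted datetimes.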

Review & Testing Checklist for Human

  • [ ] Test with real cryptocurrency data files - Verify the module works correctly with actual BTCUSDT, ETHBTC data files from the repository (not just mock test data)
  • [ ] Validate date range filtering edge cases - Test boundary conditions like single-day ranges, missing dates, and date ranges that span missing files
  • [ ] Check memory usage with larger datasets - Load a week or month of BTCUSDT data to ensure memory handling is reasonable for typical use cases
  • [ ] Verify timestamp conversion accuracy - Compare a few raw epoch timestamps with the converted datetime values to ensure timezone and precision are correct
  • [ ] Test error handling with real edge cases - Try loading invalid tickers, corrupted ZIP files, or date ranges outside available data
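The date range edge cases in this checklist (single-day ranges, missing dates, reversed ranges) can be explored with a small standalone sketch like the one below. The helpers dates_in_range and split_available are hypothetical and are not part of the module under review. The set of available days is made up.

```python
from datetime import date, timedelta


def dates_in_range(start: str, end: str) -> list:
    """Return every calendar day from start to end inclusive, as YYYY-MM-DD strings."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError(f"end date {end} precedes start date {start}")
    span = (last - first).days
    return [(first + timedelta(days=offset)).isoformat() for offset in range(span + 1)]


def split_available(requested: list, available: set) -> tuple:
    """Partition requested days into those with a daily file and those without."""
    found = [day for day in requested if day in available]
    missing = [day for day in requested if day not in available]
    return found, missing


# Hypothetical set of days that have a daily file on disk.
available_days = {"2018-04-07", "2018-04-08", "2018-04-10"}
found, missing = split_available(dates_in_range("2018-04-07", "2018-04-10"), available_days)
single_day = dates_in_range("2018-04-07", "2018-04-07")
```

Comparing the missing list against what the loader actually skips is one way to confirm that gaps in the archive are handled as intended.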

Recommended Test Plan:

# Test basic functionality
# The package directory is named reading-src, and a hyphenated name cannot be
# imported with a plain import statement, so put it on sys.path first.
# Assumes the script runs from the repository root.
import sys
sys.path.insert(0, "reading-src")
from read_ticker import TickerDataLoader

loader = TickerDataLoader()
btc_data = loader.load_ticker_data("BTCUSDT", "2018-04-07", "2018-04-07")
print(f"Loaded {len(btc_data)} trades, price range: ${btc_data['Price'].min():.2f}-${btc_data['Price'].max():.2f}")

# Test date range and memory usage
eth_week = loader.load_ticker_data("ETHBTC", "2018-04-07", "2018-04-14")
print(f"Week of data: {len(eth_week)} trades, memory usage: {eth_week.memory_usage().sum() / 1024**2:.1f} MB")

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    DataDir["data/<br/>BTCUSDT/<br/>ETHBTC/<br/>etc."]:::context
    ZipFiles["*.csv.zip<br/>daily files"]:::context
    
    ReadTicker["reading-src/<br/>read_ticker.py"]:::major-edit
    InitFile["reading-src/<br/>__init__.py"]:::major-edit
    TestFile["tests/<br/>test_read_ticker.py"]:::major-edit
    
    TickerLoader["TickerDataLoader<br/>class"]:::major-edit
    UtilFuncs["list_available_tickers()<br/>quick_load()"]:::major-edit
    
    PandasDF["pandas<br/>DataFrame"]:::context
    
    DataDir --> ZipFiles
    ZipFiles --> ReadTicker
    ReadTicker --> TickerLoader
    ReadTicker --> UtilFuncs
    TickerLoader --> PandasDF
    TestFile --> ReadTicker
    InitFile --> ReadTicker
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • All 21 pytest tests pass successfully, covering unit tests, integration tests, and error handling scenarios
  • The module handles the existing data structure (CSV files in ZIP archives organized by ticker/date)
  • Available data spans 2018-04-06 to 2019-11-18, covering 590+ days of tick data
  • File renamed from read-ticker.py to read_ticker.py to follow Python naming conventions
  • Real data integration tests are included but limited - human testing with actual large datasets is recommended

Session Info:

  • Requested by: Eli Belash (@Nucs)
  • Devin Session: https://app.devin.ai/sessions/a3e57dcc2d7241f480a8407979403e93

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

Exclude all __pycache__ directories via the .gitignore file.

Nucs, Jul 19 '25 16:07