
feat: Set up comprehensive Python testing infrastructure with Poetry

Open llbbl opened this issue 3 months ago • 0 comments

Set up Python Testing Infrastructure with Poetry

Summary

This PR establishes a complete testing infrastructure for the EHRSQL project using Poetry as the package manager. The setup provides a ready-to-use testing environment where developers can immediately start writing and running tests.

Changes Made

Package Management

  • Poetry Configuration: Set up pyproject.toml with Poetry as the package manager
  • Dependencies: Added core dependencies (torch, transformers, numpy, pyyaml, tqdm, func-timeout)
  • Dev Dependencies: Added pytest, pytest-cov, pytest-mock for comprehensive testing

Testing Configuration

  • pytest Configuration: Configured in pyproject.toml with:
    • 80% coverage threshold requirement
    • HTML and XML coverage reporting
    • Custom markers: unit, integration, slow
    • Strict options and verbose output
  • Coverage Settings: Configured to track T5/, utils/, and preprocess/ directories
  • Test Discovery: Configured for tests/ directory with proper patterns

Directory Structure

tests/
├── __init__.py
├── conftest.py                 # Shared fixtures
├── unit/
│   └── __init__.py
├── integration/
│   └── __init__.py
└── test_infrastructure_validation.py  # Validation tests

Shared Fixtures (conftest.py)

  • File Management: temp_dir, temp_file fixtures
  • Mock Objects: mock_config, mock_tokenizer, mock_model fixtures
  • Sample Data: sample_dataset_item, sample_json_data, sample_tables_json fixtures
  • Database Testing: mock_sqlite_db fixture with sample EHR data
  • Configuration: yaml_config_file fixture for config testing
  • Environment: set_random_seeds, setup_test_environment fixtures
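The fixture bodies are not shown in this description, so the following is only a minimal sketch of how three of them might look in conftest.py. The fixture names temp_dir, mock_sqlite_db, and set_random_seeds come from the list above; the helper create_sample_db, the patients table, and its columns are hypothetical and stand in for whatever sample EHR data the real fixture loads.

```python
import random
import shutil
import sqlite3
import tempfile
from pathlib import Path

import pytest


def create_sample_db(db_path):
    """Create a tiny SQLite database. The schema here is hypothetical."""
    conn = sqlite3.connect(str(db_path))
    conn.execute(
        "CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT)"
    )
    conn.executemany(
        "INSERT INTO patients (subject_id, gender) VALUES (?, ?)",
        [(1, "F"), (2, "M")],
    )
    conn.commit()
    return conn


@pytest.fixture
def temp_dir():
    """Yield a temporary directory and remove it once the test finishes."""
    path = Path(tempfile.mkdtemp())
    yield path
    shutil.rmtree(path, ignore_errors=True)


@pytest.fixture
def mock_sqlite_db(temp_dir):
    """Yield an open connection to a sample database inside temp_dir."""
    conn = create_sample_db(temp_dir / "sample.db")
    yield conn
    conn.close()


@pytest.fixture
def set_random_seeds():
    """Seed Python's RNG; a real fixture would likely seed numpy and torch too."""
    random.seed(42)
```

Keeping the database setup in a plain helper means it can be reused outside pytest, while the fixtures only handle setup and teardown.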

Development Tools

  • Scripts: poetry run test and poetry run tests commands
  • Git Integration: Comprehensive .gitignore with testing, ML, and development entries
  • Coverage Reports: HTML reports in htmlcov/, XML in coverage.xml

Validation

  • Infrastructure Tests: Created validation tests verifying all fixtures work properly
  • Marker Testing: Verified custom markers (@pytest.mark.unit, etc.) function correctly
  • Import Testing: Confirmed project modules are importable in test environment
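The import check can be sketched with the standard library alone. This is an assumption about how such a check could be written, not the code in test_infrastructure_validation.py; find_missing_modules is a hypothetical helper, and the names T5, utils, and preprocess are taken from the coverage settings above on the assumption that they are importable from the project root.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level names that cannot be located on the import path."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Directory names from the coverage settings; run from the project root.
    missing = find_missing_modules(["T5", "utils", "preprocess"])
    print("missing modules:", missing)
```

Because find_spec only locates a module without executing it, the check stays fast even when the modules pull in heavy dependencies such as torch.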

Running Tests

Basic Usage

# Run all tests with coverage
poetry run test

# Run specific test types
poetry run pytest -m unit          # Unit tests only
poetry run pytest -m integration   # Integration tests only
poetry run pytest -m slow          # Slow tests only

# Run with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_infrastructure_validation.py
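For the -m selections above to pick anything up, tests have to carry the matching markers. A minimal sketch, with hypothetical test names and trivial bodies:

```python
import pytest


@pytest.mark.unit
def test_addition_is_commutative():
    # Selected by: poetry run pytest -m unit
    assert 1 + 2 == 2 + 1


@pytest.mark.integration
@pytest.mark.slow
def test_many_additions():
    # Selected by -m integration and by -m slow; skipped by -m "not slow"
    assert sum(range(1000)) == 499500
```

A test can carry more than one marker, which is useful for integration tests that are also slow.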

Coverage Reports

  • Terminal: Coverage summary displayed after test run
  • HTML Report: Open htmlcov/index.html in browser
  • XML Report: Machine-readable coverage in coverage.xml
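As an illustration of what "machine-readable" allows, the sketch below reads the overall line coverage from a Cobertura-style report, which is the format pytest-cov writes to coverage.xml. The helper names and the sample document are hypothetical; only the line-rate attribute on the root coverage element is relied on.

```python
import xml.etree.ElementTree as ET


def read_line_coverage(xml_text):
    """Return overall line coverage as a percentage from Cobertura-style XML."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(percent, threshold=80.0):
    """Check a coverage percentage against the project's 80% threshold."""
    return percent >= threshold


# Hypothetical, heavily trimmed report used only for illustration.
SAMPLE_REPORT = '<coverage line-rate="0.5" branch-rate="0"><packages/></coverage>'

if __name__ == "__main__":
    percent = read_line_coverage(SAMPLE_REPORT)
    print(f"line coverage: {percent:.1f}%, passes: {meets_threshold(percent)}")
```

For a real run, the contents of coverage.xml would be passed in place of SAMPLE_REPORT.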

Development Workflow

  1. Install Dependencies: poetry install
  2. Write Tests: Add test files in appropriate tests/ subdirectories
  3. Run Tests: poetry run test
  4. Check Coverage: Review the reports to confirm the 80% threshold is met
  5. Use Fixtures: Leverage shared fixtures from conftest.py

Dependencies Added

  • pytest: Main testing framework
  • pytest-cov: Coverage reporting with configurable thresholds
  • pytest-mock: Enhanced mocking utilities for tests

Notes

  • poetry.lock is not listed in .gitignore, so it remains under version control, as requested
  • All infrastructure validated with passing tests
  • 80% coverage threshold enforced (measured coverage is currently low because no tests for the project code have been written yet)
  • Ready for immediate development: no setup required beyond poetry install

Next Steps

Developers can now:

  1. Start writing unit tests for individual functions/classes
  2. Create integration tests for end-to-end workflows
  3. Use provided fixtures to mock external dependencies
  4. Run tests with confidence using the established infrastructure
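Item 3 can be sketched with unittest.mock from the standard library. The real mock_tokenizer fixture is not shown in this description, so make_mock_tokenizer, encode_question, and the token IDs below are all hypothetical; the point is only that a test can replace a real tokenizer with a stub that returns a fixed encoding.

```python
from unittest.mock import MagicMock


def encode_question(tokenizer, question, max_length=32):
    """Hypothetical code under test: tokenize a question and return its IDs."""
    encoded = tokenizer(question, truncation=True, max_length=max_length)
    return encoded["input_ids"]


def make_mock_tokenizer():
    """Build a stand-in tokenizer that returns a fixed, made-up encoding."""
    tokenizer = MagicMock()
    tokenizer.return_value = {"input_ids": [101, 7, 8, 102]}
    return tokenizer


def test_encode_question_uses_tokenizer():
    tokenizer = make_mock_tokenizer()
    assert encode_question(tokenizer, "how many patients?") == [101, 7, 8, 102]
    # The mock also records how it was called, so the arguments can be checked.
    tokenizer.assert_called_once_with(
        "how many patients?", truncation=True, max_length=32
    )
```

This keeps unit tests independent of model downloads; tests that need a real tokenizer fit better under the integration or slow markers.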

llbbl, Sep 03 '25 22:09