EHRSQL
feat: Set up comprehensive Python testing infrastructure with Poetry
Set up Python Testing Infrastructure with Poetry
Summary
This PR establishes a complete testing infrastructure for the EHRSQL project using Poetry as the package manager. The setup provides a ready-to-use testing environment where developers can immediately start writing and running tests.
Changes Made
Package Management
- Poetry Configuration: Set up pyproject.toml with Poetry as the package manager
- Dependencies: Added core dependencies (torch, transformers, numpy, pyyaml, tqdm, func-timeout)
- Dev Dependencies: Added pytest, pytest-cov, pytest-mock for comprehensive testing
Testing Configuration
- pytest Configuration: Configured in pyproject.toml with:
  - 80% coverage threshold requirement
  - HTML and XML coverage reporting
  - Custom markers: unit, integration, slow
  - Strict options and verbose output
- Coverage Settings: Configured to track T5/, utils/, and preprocess/ directories
- Test Discovery: Configured for tests/ directory with proper patterns
Directory Structure
tests/
├── __init__.py
├── conftest.py # Shared fixtures
├── unit/
│ └── __init__.py
├── integration/
│ └── __init__.py
└── test_infrastructure_validation.py # Validation tests
Shared Fixtures (conftest.py)
- File Management: temp_dir, temp_file fixtures
- Mock Objects: mock_config, mock_tokenizer, mock_model fixtures
- Sample Data: sample_dataset_item, sample_json_data, sample_tables_json fixtures
- Database Testing: mock_sqlite_db fixture with sample EHR data
- Configuration: yaml_config_file fixture for config testing
- Environment: set_random_seeds, setup_test_environment fixtures
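As a rough illustration of what a database fixture with sample EHR data might look like, here is a minimal standard-library sketch. It is not the project's actual mock_sqlite_db implementation: the `sample_ehr_db` and `count_patients_by_gender` names, the `patients` table, and its columns are all assumptions made up for this example.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sample_ehr_db():
    """Yield a connection to a throwaway SQLite database with a few fake rows.

    The `patients` table and its columns are hypothetical, chosen only for
    illustration; they are not taken from the project's conftest.py.
    """
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "sample_ehr.db"
        conn = sqlite3.connect(db_path)
        try:
            conn.execute(
                "CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT)"
            )
            conn.executemany(
                "INSERT INTO patients (subject_id, gender) VALUES (?, ?)",
                [(1, "F"), (2, "M"), (3, "F")],
            )
            conn.commit()
            yield conn
        finally:
            # Close before the temporary directory is removed.
            conn.close()


def count_patients_by_gender(conn, gender):
    """Run a parameterized query against the sample table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM patients WHERE gender = ?", (gender,)
    ).fetchone()
    return row[0]
```

In a conftest.py, the same generator body could be registered with pytest's fixture decorator instead of `contextmanager`, so that a test receives the open connection as an argument and cleanup runs after the test finishes.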
Development Tools
- Scripts: poetry run test and poetry run tests commands
- Git Integration: Comprehensive .gitignore with testing, ML, and development entries
- Coverage Reports: HTML reports in htmlcov/, XML in coverage.xml
Validation
- Infrastructure Tests: Created validation tests verifying all fixtures work properly
- Marker Testing: Verified custom markers (@pytest.mark.unit, etc.) function correctly
- Import Testing: Confirmed project modules are importable in test environment
Running Tests
Basic Usage
# Run all tests with coverage
poetry run test
# Run specific test types
poetry run pytest -m unit # Unit tests only
poetry run pytest -m integration # Integration tests only
poetry run pytest -m slow # Slow tests only
# Run with verbose output
poetry run pytest -v
# Run specific test file
poetry run pytest tests/test_infrastructure_validation.py
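The commands above can also be assembled from Python, for example in a small helper script or a CI step. The sketch below is only one possible wrapper, not something shipped in this PR; `build_pytest_command` and `run_tests` are hypothetical names, and only the `-m`, `-v`, and path arguments already shown above are used.

```python
import subprocess


def build_pytest_command(marker=None, verbose=False, path=None):
    """Assemble a `poetry run pytest` invocation as an argument list."""
    cmd = ["poetry", "run", "pytest"]
    if marker:
        cmd += ["-m", marker]  # select tests by marker, e.g. "unit"
    if verbose:
        cmd.append("-v")
    if path:
        cmd.append(path)  # restrict the run to one file or directory
    return cmd


def run_tests(**kwargs):
    """Run the command and return the process exit code (0 means success)."""
    return subprocess.run(build_pytest_command(**kwargs), check=False).returncode
```

For instance, `run_tests(marker="unit", verbose=True)` would be equivalent to typing `poetry run pytest -m unit -v` in a shell where Poetry is installed.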
Coverage Reports
- Terminal: Coverage summary displayed after test run
- HTML Report: Open htmlcov/index.html in browser
- XML Report: Machine-readable coverage in coverage.xml
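Because the XML report is machine-readable, a script can read the overall figure and compare it against the 80% threshold. The sketch below assumes the report follows the Cobertura-style layout that coverage.py emits, with a `line-rate` attribute on the root `coverage` element. `SAMPLE_REPORT` is a hand-written stand-in rather than real output from this project, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for coverage.xml; a real report carries more
# attributes and nested package/class elements.
SAMPLE_REPORT = '<coverage line-rate="0.8421" branch-rate="0"></coverage>'


def line_coverage_percent(xml_text):
    """Return overall line coverage as a percentage from the report text."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(xml_text, threshold=80.0):
    """Check whether line coverage reaches the given percentage."""
    return line_coverage_percent(xml_text) >= threshold
```

To apply this to an actual run, the contents of coverage.xml could be read from disk and passed in as `xml_text`. The threshold enforced by pytest-cov itself may be computed slightly differently (for example when branch coverage is enabled), so this is best treated as an approximation.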
Development Workflow
- Install Dependencies: poetry install
- Write Tests: Add test files in appropriate tests/ subdirectories
- Run Tests: poetry run test
- Check Coverage: Review reports to ensure 80% threshold is met
- Use Fixtures: Leverage shared fixtures from conftest.py
Dependencies Added
- pytest: Main testing framework
- pytest-cov: Coverage reporting with configurable thresholds
- pytest-mock: Enhanced mocking utilities for tests
Notes
- Poetry lock file excluded from .gitignore as requested
- All infrastructure validated with passing tests
- 80% coverage threshold enforced (measured coverage is currently low because no actual tests have been written yet)
- Ready for immediate development - no additional setup required
Next Steps
Developers can now:
- Start writing unit tests for individual functions/classes
- Create integration tests for end-to-end workflows
- Use provided fixtures to mock external dependencies
- Run tests with confidence using the established infrastructure