feat: Set up comprehensive Python testing infrastructure

Summary

This PR establishes a complete testing infrastructure for the ML/NLP projects collection, providing a solid foundation for writing and running tests across all modules (chatbot, embeddings, machine translation, POS tagging, sentiment analysis, and text generation).

Changes Made

Package Management

  • Poetry Configuration: Set up pyproject.toml with Poetry as the package manager (a sketch follows this list)
  • Dependencies: Migrated and organized dependencies, including:
    • Production: TensorFlow 2.13+, PyTorch 2.0+, NumPy, PyYAML, Requests
    • Testing: pytest, pytest-cov, pytest-mock as dev dependencies
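
For reference, a minimal sketch of the relevant pyproject.toml sections; the project name and version constraints below are illustrative assumptions, not the committed values:

[tool.poetry]
name = "ml-nlp-projects"   # hypothetical project name
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.10"           # assumed interpreter range
tensorflow = ">=2.13"
torch = ">=2.0"
numpy = "*"
pyyaml = "*"
requests = "*"

[tool.poetry.group.dev.dependencies]
pytest = "*"
pytest-cov = "*"
pytest-mock = "*"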

Testing Configuration

  • pytest Setup: Comprehensive pytest configuration (sketched below) with:
    • Test discovery patterns for test_*.py and *_test.py
    • Coverage reporting with 80% threshold requirement
    • HTML and XML coverage reports (htmlcov/, coverage.xml)
    • Custom markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow
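
A plausible shape for this configuration in pyproject.toml; only the behaviors listed above are confirmed, and the exact option values are assumptions:

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
addopts = "--cov --cov-report=term-missing --cov-report=html --cov-report=xml --cov-fail-under=80"
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests that exercise multiple modules together",
    "slow: long-running tests (deselect with -m 'not slow')",
]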

Directory Structure

tests/
├── __init__.py
├── conftest.py              # Shared fixtures
├── test_infrastructure.py   # Validation tests
├── unit/
│   └── __init__.py
└── integration/
    └── __init__.py

Testing Fixtures (conftest.py)

Comprehensive set of ML/NLP-focused fixtures (two are sketched after this list):

  • File System: temp_dir, temp_checkpoint_dir, sample_text_file
  • ML Frameworks: sample_tensorflow_tensor, sample_torch_tensor, sample_numpy_array
  • Configurations: sample_config, mock_model_config, sample_yaml_config
  • Data: sample_text_data, sample_dataset_info, small_batch_data, mock_tokenizer
  • Reproducibility: reset_random_seeds (auto-applied to all tests)
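
For illustration, a minimal sketch of two of these fixtures, assuming straightforward implementations; the actual bodies in this PR may differ:

# conftest.py (excerpt, illustrative)
import random

import numpy as np
import pytest


@pytest.fixture
def sample_numpy_array():
    # Small deterministic array for numeric tests; the shape is arbitrary here.
    return np.arange(12, dtype=np.float32).reshape(3, 4)


@pytest.fixture(autouse=True)
def reset_random_seeds():
    # Re-seed RNGs before every test; the real fixture presumably also seeds
    # TensorFlow and PyTorch for full reproducibility.
    random.seed(42)
    np.random.seed(42)
    yield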

Additional Improvements

  • Updated .gitignore: Added testing artifacts, Claude settings, model files, and IDE configurations
  • Validation Tests: Created test_infrastructure.py with 16 tests verifying all components work correctly (an illustrative example follows)
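
A short illustrative example of the kind of check such a validation test might perform; the test name and assertions are assumptions:

# test_infrastructure.py (excerpt, illustrative)
import numpy as np
import pytest


@pytest.mark.unit
def test_sample_numpy_array_fixture(sample_numpy_array):
    # The fixture should hand back a non-empty NumPy array.
    assert isinstance(sample_numpy_array, np.ndarray)
    assert sample_numpy_array.size > 0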

Running Tests

Basic Commands

# Run all tests
poetry run pytest

# Run with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_infrastructure.py

# Run with coverage (default behavior)
poetry run pytest --cov

# Run only unit tests
poetry run pytest -m unit

# Run only integration tests  
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"
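
The -m selections above assume tests are tagged with the corresponding markers; for example (the test itself is hypothetical):

import pytest


@pytest.mark.slow
def test_full_training_loop():
    # Deselected by: poetry run pytest -m "not slow"
    ...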

Coverage Reports

  • Terminal: Coverage summary displayed after test run
  • HTML: Detailed report generated in htmlcov/index.html (one way to view it is shown below)
  • XML: Machine-readable report in coverage.xml
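
One way to browse the HTML report locally (assumes Python 3.7+ for the --directory flag):

# Serve the coverage report at http://localhost:8000
python -m http.server --directory htmlcov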

Verification

The infrastructure has been validated with comprehensive tests covering:

  • ✅ Basic pytest functionality
  • ✅ All fixtures are working correctly
  • ✅ TensorFlow, PyTorch, and NumPy integration
  • ✅ Test markers and organization
  • ✅ Temporary file handling
  • ✅ Random seed consistency for reproducible tests
  • ✅ Coverage tracking setup

Next Steps

Developers can now:

  1. Write unit tests in tests/unit/ for individual functions and classes
  2. Write integration tests in tests/integration/ for module interactions (a skeleton follows this list)
  3. Use fixtures from conftest.py for common test data and configurations
  4. Run tests with poetry run pytest to ensure code quality
  5. Monitor coverage to maintain high test coverage standards
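
As a starting point for step 2, a hypothetical skeleton; the file name and the mock_tokenizer interface are assumptions:

# tests/integration/test_tokenization.py (hypothetical)
import pytest


@pytest.mark.integration
def test_tokenizer_produces_tokens(mock_tokenizer):
    # mock_tokenizer comes from conftest.py; its exact API is assumed here.
    tokens = mock_tokenizer.tokenize("a short test sentence")
    assert len(tokens) > 0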

Notes

  • All dependencies are managed through Poetry, and the lock file is committed
  • Coverage threshold is set to 80%; runs that fall below it will surface coverage warnings
  • The infrastructure supports both TensorFlow and PyTorch workflows
  • Random seeds are automatically reset for each test to ensure reproducibility
