feat: Set up comprehensive Python testing infrastructure

Summary

This PR establishes a complete testing infrastructure for the ML/NLP projects collection, providing a solid foundation for writing and running tests across all modules (chatbot, embeddings, machine translation, POS tagging, sentiment analysis, and text generation).

Changes Made

Package Management

  • Poetry Configuration: Set up pyproject.toml with Poetry as the package manager (a sketch follows this list)
  • Dependencies: Migrated and organized dependencies, including:
    • Production: TensorFlow 2.13+, PyTorch 2.0+, NumPy, PyYAML, Requests
    • Testing: pytest, pytest-cov, pytest-mock as dev dependencies
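
For reference, a minimal sketch of the relevant pyproject.toml sections; the project name and version constraints below are illustrative assumptions, not the committed values:

[tool.poetry]
name = "ml-nlp-projects"   # hypothetical project name
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.10"           # assumed interpreter range
tensorflow = ">=2.13"
torch = ">=2.0"
numpy = "*"
pyyaml = "*"
requests = "*"

[tool.poetry.group.dev.dependencies]
pytest = "*"
pytest-cov = "*"
pytest-mock = "*"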

Testing Configuration

  • pytest Setup: Comprehensive pytest configuration (sketched below) with:
    • Test discovery patterns for test_*.py and *_test.py
    • Coverage reporting with 80% threshold requirement
    • HTML and XML coverage reports (htmlcov/, coverage.xml)
    • Custom markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow
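
A plausible shape for this configuration in pyproject.toml; only the behaviors listed above are confirmed, and the exact option values are assumptions:

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
addopts = "--cov --cov-report=term-missing --cov-report=html --cov-report=xml --cov-fail-under=80"
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests that exercise multiple modules together",
    "slow: long-running tests (deselect with -m 'not slow')",
]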

Directory Structure

tests/
├── __init__.py
├── conftest.py              # Shared fixtures
├── test_infrastructure.py   # Validation tests
├── unit/
│   └── __init__.py
└── integration/
    └── __init__.py

Testing Fixtures (conftest.py)

Comprehensive set of ML/NLP-focused fixtures (two are sketched after this list):

  • File System: temp_dir, temp_checkpoint_dir, sample_text_file
  • ML Frameworks: sample_tensorflow_tensor, sample_torch_tensor, sample_numpy_array
  • Configurations: sample_config, mock_model_config, sample_yaml_config
  • Data: sample_text_data, sample_dataset_info, small_batch_data, mock_tokenizer
  • Reproducibility: reset_random_seeds (auto-applied to all tests)
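
For illustration, a minimal sketch of two of these fixtures, assuming straightforward implementations; the actual bodies in this PR may differ:

# conftest.py (excerpt, illustrative)
import random

import numpy as np
import pytest


@pytest.fixture
def sample_numpy_array():
    # Small deterministic array for numeric tests; the shape is arbitrary here.
    return np.arange(12, dtype=np.float32).reshape(3, 4)


@pytest.fixture(autouse=True)
def reset_random_seeds():
    # Re-seed RNGs before every test; the real fixture presumably also seeds
    # TensorFlow and PyTorch for full reproducibility.
    random.seed(42)
    np.random.seed(42)
    yield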

Additional Improvements

  • Updated .gitignore: Added testing artifacts, Claude settings, model files, and IDE configurations
  • Validation Tests: Created test_infrastructure.py with 16 tests verifying all components work correctly (an illustrative example follows)
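
A short illustrative example of the kind of check such a validation test might perform; the test name and assertions are assumptions:

# test_infrastructure.py (excerpt, illustrative)
import numpy as np
import pytest


@pytest.mark.unit
def test_sample_numpy_array_fixture(sample_numpy_array):
    # The fixture should hand back a non-empty NumPy array.
    assert isinstance(sample_numpy_array, np.ndarray)
    assert sample_numpy_array.size > 0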

Running Tests

Basic Commands

# Run all tests
poetry run pytest

# Run with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_infrastructure.py

# Run with coverage (default behavior)
poetry run pytest --cov

# Run only unit tests
poetry run pytest -m unit

# Run only integration tests  
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"
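
The -m selections above assume tests are tagged with the corresponding markers; for example (the test itself is hypothetical):

import pytest


@pytest.mark.slow
def test_full_training_loop():
    # Deselected by: poetry run pytest -m "not slow"
    ...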

Coverage Reports

  • Terminal: Coverage summary displayed after test run
  • HTML: Detailed report generated in htmlcov/index.html (one way to view it is shown below)
  • XML: Machine-readable report in coverage.xml
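
One way to browse the HTML report locally (assumes Python 3.7+ for the --directory flag):

# Serve the coverage report at http://localhost:8000
python -m http.server --directory htmlcov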

Verification

The infrastructure has been validated with comprehensive tests covering:

  • ✅ Basic pytest functionality
  • ✅ All fixtures are working correctly
  • ✅ TensorFlow, PyTorch, and NumPy integration
  • ✅ Test markers and organization
  • ✅ Temporary file handling
  • ✅ Random seed consistency for reproducible tests
  • ✅ Coverage tracking setup

Next Steps

Developers can now:

  1. Write unit tests in tests/unit/ for individual functions and classes
  2. Write integration tests in tests/integration/ for module interactions (a skeleton follows this list)
  3. Use fixtures from conftest.py for common test data and configurations
  4. Run tests with poetry run pytest to ensure code quality
  5. Monitor coverage to maintain high test coverage standards
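
As a starting point for step 2, a hypothetical skeleton; the file name and the mock_tokenizer interface are assumptions:

# tests/integration/test_tokenization.py (hypothetical)
import pytest


@pytest.mark.integration
def test_tokenizer_produces_tokens(mock_tokenizer):
    # mock_tokenizer comes from conftest.py; its exact API is assumed here.
    tokens = mock_tokenizer.tokenize("a short test sentence")
    assert len(tokens) > 0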

Notes

  • All dependencies are managed through Poetry, and the lock file is committed
  • Coverage threshold is set to 80%; runs that fall below it will surface coverage warnings
  • The infrastructure supports both TensorFlow and PyTorch workflows
  • Random seeds are automatically reset for each test to ensure reproducibility
