# feat: Set up comprehensive Python testing infrastructure
## Summary
This PR establishes a complete testing infrastructure for the ML/NLP projects collection, providing a solid foundation for writing and running tests across all modules (chatbot, embeddings, machine translation, POS tagging, sentiment analysis, and text generation).
## Changes Made
### Package Management
- Poetry Configuration: Set up `pyproject.toml` with Poetry as the package manager (see the sketch after this list)
- Dependencies: Migrated and organized dependencies including:
  - Production: TensorFlow 2.13+, PyTorch 2.0+, NumPy, PyYAML, Requests
  - Testing: pytest, pytest-cov, pytest-mock as dev dependencies
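For orientation, the Poetry sections of `pyproject.toml` might look roughly like this; the package name, authors, and version pins are illustrative, not the exact values committed:

```toml
[tool.poetry]
name = "nlp-projects"            # illustrative name
version = "0.1.0"
description = "ML/NLP projects collection"
authors = ["NLP Maintainers"]    # placeholder

[tool.poetry.dependencies]
python = "^3.10"                 # assumed Python range
tensorflow = ">=2.13"
torch = ">=2.0"
numpy = "*"
pyyaml = "*"
requests = "*"

[tool.poetry.group.dev.dependencies]
pytest = "*"
pytest-cov = "*"
pytest-mock = "*"
```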
### Testing Configuration
- pytest Setup: Comprehensive pytest configuration (see the sketch below) with:
  - Test discovery patterns for `test_*.py` and `*_test.py`
  - Coverage reporting with 80% threshold requirement
  - HTML and XML coverage reports (`htmlcov/`, `coverage.xml`)
  - Custom markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.slow`
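A pytest configuration matching the bullets above could live in the same `pyproject.toml`; this is a sketch, and the exact options in the PR may differ:

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
addopts = "--cov --cov-report=term --cov-report=html --cov-report=xml --cov-fail-under=80"
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests exercising module interactions",
    "slow: long-running tests (deselect with -m 'not slow')",
]
```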
### Directory Structure
```
tests/
├── __init__.py
├── conftest.py              # Shared fixtures
├── test_infrastructure.py   # Validation tests
├── unit/
│   └── __init__.py
└── integration/
    └── __init__.py
```
### Testing Fixtures (`conftest.py`)
Comprehensive set of ML/NLP-focused fixtures (a sketch follows the list):
- File System: `temp_dir`, `temp_checkpoint_dir`, `sample_text_file`
- ML Frameworks: `sample_tensorflow_tensor`, `sample_torch_tensor`, `sample_numpy_array`
- Configurations: `sample_config`, `mock_model_config`, `sample_yaml_config`
- Data: `sample_text_data`, `sample_dataset_info`, `small_batch_data`, `mock_tokenizer`
- Reproducibility: `reset_random_seeds` (auto-applied to all tests)
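To give a flavor of these fixtures, here is a minimal sketch of what a few of them might look like in `conftest.py`; only the fixture names come from the list above, the bodies are assumptions:

```python
import random

import numpy as np
import pytest


@pytest.fixture
def temp_dir(tmp_path):
    """Temporary directory, cleaned up automatically by pytest."""
    return tmp_path


@pytest.fixture
def sample_numpy_array():
    """Small deterministic array for shape/dtype checks."""
    return np.arange(12, dtype=np.float32).reshape(3, 4)


@pytest.fixture(autouse=True)
def reset_random_seeds():
    """Reset random seeds before every test for reproducibility.

    The committed fixture presumably also seeds TensorFlow and PyTorch,
    e.g. tf.random.set_seed(42) and torch.manual_seed(42).
    """
    random.seed(42)
    np.random.seed(42)
```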
### Additional Improvements
- Updated `.gitignore`: Added testing artifacts, Claude settings, model files, and IDE configurations
- Validation Tests: Created `test_infrastructure.py` with 16 tests verifying all components work correctly (example below)
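One of these validation tests might look roughly like the following; this is an illustrative sketch rather than a verbatim excerpt, and it relies on the `temp_dir` fixture sketched earlier:

```python
import pytest


@pytest.mark.unit
def test_temp_dir_fixture_is_writable(temp_dir):
    """The temp_dir fixture should yield a writable directory."""
    sample = temp_dir / "sample.txt"
    sample.write_text("hello")
    assert sample.read_text() == "hello"
```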
## Running Tests
### Basic Commands
```bash
# Run all tests
poetry run pytest

# Run with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_infrastructure.py

# Run with coverage (default behavior)
poetry run pytest --cov

# Run only unit tests
poetry run pytest -m unit

# Run only integration tests
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"
```
### Coverage Reports
- Terminal: Coverage summary displayed after test run
- HTML: Detailed report generated in `htmlcov/index.html`
- XML: Machine-readable report in `coverage.xml`
## Verification
The infrastructure has been validated with comprehensive tests covering:
- ✅ Basic pytest functionality
- ✅ All fixtures working correctly
- ✅ TensorFlow, PyTorch, and NumPy integration
- ✅ Test markers and organization
- ✅ Temporary file handling
- ✅ Random seed consistency for reproducible tests (illustrated below)
- ✅ Coverage tracking setup
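To illustrate the seed guarantee: assuming the autouse fixture seeds NumPy with 42, as in the `conftest.py` sketch above, the first random draw inside any test is deterministic:

```python
import numpy as np
import pytest


@pytest.mark.unit
def test_random_seeds_are_reset():
    """With the autouse seed fixture, the first NumPy draw after seed(42)
    is always the same value, in every test."""
    assert np.random.rand() == pytest.approx(0.3745401188473625)
```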
## Next Steps
Developers can now:
- Write unit tests in `tests/unit/` for individual functions and classes (see the example below)
- Write integration tests in `tests/integration/` for module interactions
- Use fixtures from `conftest.py` for common test data and configurations
- Run tests with `poetry run pytest` to ensure code quality
- Monitor coverage to maintain high test coverage standards
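As a starting point, a first unit test under `tests/unit/` could look like the sketch below; `normalize_whitespace` is a hypothetical stand-in for real project code:

```python
"""tests/unit/test_text_utils.py (hypothetical example)."""
import pytest


def normalize_whitespace(text: str) -> str:
    """Stand-in for a project function under test."""
    return " ".join(text.split())


@pytest.mark.unit
def test_normalize_whitespace_collapses_runs():
    assert normalize_whitespace("  hello \n world ") == "hello world"


@pytest.mark.unit
def test_normalize_whitespace_empty_input():
    assert normalize_whitespace("   ") == ""
```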
## Notes
- All dependencies are managed through Poetry, and the lock file is committed
- Coverage threshold is set to 80%; test runs that fall below it fail the coverage check
- The infrastructure supports both TensorFlow and PyTorch workflows
- Random seeds are automatically reset for each test to ensure reproducibility