
feat: Add comprehensive Python testing infrastructure with Poetry

Open llbbl opened this issue 6 months ago • 0 comments

Add Python Testing Infrastructure

Summary

This PR sets up a comprehensive testing infrastructure for the Python project using Poetry as the package manager and pytest as the testing framework. The setup provides a ready-to-use testing environment where developers can immediately start writing tests.

Changes Made

Package Management

  • Poetry Setup: Created pyproject.toml with Poetry configuration
  • Dependency Migration: Migrated existing dependencies from requirements.txt
  • Development Dependencies: Added testing tools as dev dependencies

Testing Framework

  • pytest: Main testing framework with comprehensive configuration
  • pytest-cov: Coverage reporting with HTML and XML output
  • pytest-mock: Mocking utilities for unit tests

Configuration

  • Test Discovery: Configured patterns for finding tests (test_*.py, *_test.py)
  • Coverage Settings:
    • 80% coverage threshold
    • HTML reports in htmlcov/
    • XML reports for CI integration
    • Branch coverage enabled
  • Custom Markers: Added unit, integration, and slow test markers
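As a rough illustration of the test discovery patterns above, the sketch below checks file names against the two globs with Python's fnmatch module. The helper name is_test_file is hypothetical and not part of this PR; pytest performs the equivalent matching itself when collecting tests.

```python
from fnmatch import fnmatch

# Discovery patterns as listed in the configuration above.
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(filename: str) -> bool:
    """Return True if the file name matches one of the discovery patterns."""
    return any(fnmatch(filename, pattern) for pattern in TEST_FILE_PATTERNS)


if __name__ == "__main__":
    # conftest.py and utils.py match neither pattern, so they are not collected as test modules.
    for name in ("test_setup_validation.py", "convert_test.py", "conftest.py", "utils.py"):
        print(f"{name}: {is_test_file(name)}")
```

Note that conftest.py is still loaded by pytest for its fixtures; it just is not collected as a test module.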

Directory Structure

tests/
├── __init__.py
├── conftest.py          # Shared fixtures and configuration
├── unit/
│   └── __init__.py
├── integration/
│   └── __init__.py
└── test_setup_validation.py  # Infrastructure validation tests

Shared Fixtures (conftest.py)

  • temp_dir: Temporary directory for test files
  • sample_dataframe: Sample pandas DataFrame
  • sample_numpy_array: Sample numpy array
  • mock_config: Mock configuration dictionary
  • sample_csv_file: Creates test CSV files
  • sample_json_file: Creates test JSON files
  • mock_dataset_files: Creates mock dataset files (.inter, .user, .item)
  • reset_environment: Resets environment variables
  • capture_logs: Captures log messages during tests
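The fixture bodies are not shown in this description. As a minimal sketch of what a helper behind mock_dataset_files might look like, the code below writes one small file per suffix. The helper name write_mock_dataset, the column names, and the tab-separated "name:type" header layout are all assumptions made for illustration, not the PR's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical sample rows; the real fixture's contents are not shown in this PR.
# The tab-separated "name:type" header layout is an assumption.
MOCK_FILES = {
    ".inter": "user_id:token\titem_id:token\trating:float\n1\t10\t4.0\n2\t11\t3.5\n",
    ".user": "user_id:token\tage:float\n1\t25\n2\t31\n",
    ".item": "item_id:token\tcategory:token\n10\tbooks\n11\tmusic\n",
}


def write_mock_dataset(directory: Path, name: str = "mock") -> Dict[str, Path]:
    """Write one small file per suffix and return a suffix -> path mapping."""
    paths = {}
    for suffix, content in MOCK_FILES.items():
        path = directory / f"{name}{suffix}"
        path.write_text(content, encoding="utf-8")
        paths[suffix] = path
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for suffix, path in write_mock_dataset(Path(tmp)).items():
            line_count = len(path.read_text(encoding="utf-8").splitlines())
            print(suffix, path.name, line_count, "lines")
```

In conftest.py, a fixture could wrap a helper like this and pass it pytest's built-in tmp_path fixture, so each test gets its own throwaway directory.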

Development Commands

  • poetry run test - Run all tests with coverage
  • poetry run tests - Alias for poetry run test (both commands behave the same)
  • Standard pytest options are available (e.g., -v, -k, --markers)
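Commands such as poetry run test are normally backed by a script entry in pyproject.toml that points at a Python callable. This description does not show that callable, so the following is only a sketch of one plausible shape. The function names run_tests and build_pytest_command are hypothetical, and the sketch assumes the coverage options live in the pytest configuration rather than in the script.

```python
import subprocess
import sys
from typing import List, Optional


def build_pytest_command(extra_args: Optional[List[str]] = None) -> List[str]:
    """Build the command line that runs pytest with the current interpreter."""
    return [sys.executable, "-m", "pytest", *(extra_args or [])]


def run_tests() -> None:
    """Entry point a Poetry script entry could reference (hypothetical name).

    Forwards extra command-line arguments (e.g. -v or -k) to pytest and
    exits with pytest's return code so CI sees failures.
    """
    sys.exit(subprocess.call(build_pytest_command(sys.argv[1:])))
```

Running pytest through sys.executable keeps the run inside the interpreter of the Poetry-managed virtual environment.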

Additional Setup

  • Created comprehensive .gitignore with Python, testing, and Claude entries
  • Added validation tests to verify the infrastructure works correctly
  • All 14 validation tests pass

How to Use

  1. Install dependencies:

    poetry install
    
  2. Run tests:

    poetry run test
    # or
    poetry run tests
    
  3. Run specific test types:

    poetry run pytest -m unit        # Run only unit tests
    poetry run pytest -m integration # Run only integration tests
    poetry run pytest -m "not slow"  # Skip slow tests
    
  4. View coverage report:

    • HTML report: Open htmlcov/index.html in a browser
    • Terminal report: Included in test output
    • XML report: coverage.xml for CI integration
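For CI integration, the XML report can be read with the standard library. The sketch below assumes coverage.xml follows the Cobertura-style layout that pytest-cov emits, with a line-rate attribute on the root element. The function names and the sample report are made up for illustration. It looks at line coverage only, whereas the tool's own threshold check with branch coverage enabled uses a combined figure, so the two can disagree.

```python
import xml.etree.ElementTree as ET

COVERAGE_THRESHOLD = 80.0  # percent, matching the threshold stated in this PR


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage (0-100) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(xml_text: str, threshold: float = COVERAGE_THRESHOLD) -> bool:
    """Return True if line coverage is at or above the threshold."""
    return line_coverage_percent(xml_text) >= threshold


# Made-up report for illustration only; a real coverage.xml has many more elements.
SAMPLE_REPORT = '<coverage line-rate="0.85" branch-rate="0.7"></coverage>'

if __name__ == "__main__":
    print(f"line coverage: {line_coverage_percent(SAMPLE_REPORT):.1f}%")
    print("meets threshold:", meets_threshold(SAMPLE_REPORT))
```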

Notes

  • The coverage threshold is set to 80%, so the coverage check will fail at first: the only tests so far are the infrastructure validation tests, and none of them exercise the codebase itself
  • The infrastructure is ready for immediate test development
  • All pytest standard features are available
  • The Poetry lock file has been generated and should be committed to ensure reproducible builds

Next Steps

Developers can now start writing unit and integration tests for the conversion tools modules. The infrastructure provides all necessary tools and configurations for comprehensive testing.

llbbl, Jun 16 '25 15:06