[Long-term] Investigate and improve cross-platform prediction consistency

Open noahho opened this issue 9 months ago • 0 comments

Issue Description

When running TabPFN consistency tests across different platforms (e.g., macOS vs Linux, x86 vs ARM), we've observed significant differences in model predictions.

Current Observations:

  1. Despite using , regression predictions on the diabetes dataset still show differences:

    • On macOS (ARM):
    • On Linux CI:
    • Difference: ~2.34 (~1.6% relative difference)
  2. Classification predictions seem more stable but still show small variations
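
To make the size of such a gap concrete, here is a minimal sketch of how a relative difference and a tolerance check could be computed. The two prediction values are made-up placeholders (the actual outputs are not shown above), chosen only so that the relative gap comes out near 1.6%. `relative_difference` and `predictions_match` are hypothetical helper names, not part of TabPFN.

```python
import math


def relative_difference(a: float, b: float) -> float:
    """Absolute difference scaled by the larger magnitude of the two values."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def predictions_match(a, b, rel_tol=1e-3):
    """True if both sequences have equal length and agree element-wise within rel_tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


# Placeholder values for illustration only, NOT the real predictions.
pred_platform_a = 150.0
pred_platform_b = 147.6

rel = relative_difference(pred_platform_a, pred_platform_b)
print(f"relative difference: {rel:.2%}")

# A gap of this size fails a tight tolerance and only passes a loose one.
strict_ok = predictions_match([pred_platform_a], [pred_platform_b], rel_tol=1e-3)
loose_ok = predictions_match([pred_platform_a], [pred_platform_b], rel_tol=0.02)
```

A relative tolerance is used here instead of an absolute one so the same threshold can apply to targets of different scale; whether 1e-3 or any other value is appropriate for TabPFN's tests is an open question.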

Impact:

  • Makes it difficult to have reproducible research/benchmarks across platforms
  • Requires platform-specific consistency tests (as implemented in PR #217)
  • Could affect production deployments across different infrastructures

Potential Causes:

  • Different CPU architectures (x86 vs. ARM)
  • Different BLAS/LAPACK implementations
  • OS-specific optimizations
  • Compiler-specific floating-point optimizations
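
One mechanism that could sit behind several of these causes is that floating-point addition is not associative, so a BLAS kernel or compiler that accumulates the same numbers in a different order can return a different result. The sketch below illustrates the effect with plain Python floats; it is an illustration of the general phenomenon, not a diagnosis of what happens inside TabPFN.

```python
import math


def sequential_sum(values):
    """Accumulate left to right, as a naive loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers in two different orders.
original_order = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]

# In the first order, 1.0 is absorbed when added to 1e16 (the spacing
# between adjacent doubles near 1e16 is 2), so it is lost.
sum_original = sequential_sum(original_order)   # 1.0
sum_reordered = sequential_sum(reordered)       # 2.0

# math.fsum tracks the lost low-order bits and returns the exact sum.
sum_exact = math.fsum(original_order)           # 2.0

print(sum_original, sum_reordered, sum_exact)
```

Real model inference involves far more operations, so differences are typically much smaller per step, but they can compound through many layers.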

Suggested Solutions to Investigate:

  1. More aggressive precision control beyond sklearn's 16-decimal option
  2. Implementing a deterministic mode that sacrifices some performance for better consistency
  3. Platform detection with environment-specific reference values
  4. Custom normalization/scaling approaches that are more robust to platform differences
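
As a rough illustration of option 3, platform detection with per-environment reference values could look like the sketch below. It uses only the standard-library `platform` module. The reference numbers, the `REFERENCE_VALUES` table, and the helper names are all hypothetical placeholders, not values or code from TabPFN or PR #217.

```python
import math
import platform

# Hypothetical reference predictions keyed by (OS, CPU architecture).
# The numbers are placeholders, not real TabPFN outputs.
REFERENCE_VALUES = {
    ("Darwin", "arm64"): 150.0,
    ("Linux", "x86_64"): 147.6,
}


def platform_key():
    """Return (OS name, machine type) for the current interpreter."""
    return (platform.system(), platform.machine())


def reference_for(key=None, references=REFERENCE_VALUES):
    """Look up the reference value for a platform; None if there is none."""
    if key is None:
        key = platform_key()
    return references.get(key)


def check_prediction(value, key=None, rel_tol=1e-6):
    """Compare a prediction with the reference stored for the given platform."""
    expected = reference_for(key)
    if expected is None:
        raise LookupError(f"no reference value for platform {key or platform_key()}")
    return math.isclose(value, expected, rel_tol=rel_tol)


print("current platform:", platform_key())
linux_ok = check_prediction(147.6, key=("Linux", "x86_64"))
cross_ok = check_prediction(147.6, key=("Darwin", "arm64"))
```

Raising on an unknown platform, instead of silently skipping the check, is one possible design choice: it makes a missing reference visible when a new CI runner is added.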

Related PR:

PR #217 worked around this by making consistency tests platform-specific, but we should investigate a more fundamental solution.

Priority:

Medium. This does not break functionality, but it affects reproducibility.

noahho · Mar 01 '25 17:03