
segfault - memory violation

Open · AntoninDataiku opened this issue 6 months ago • 4 comments

Describe the bug

If your code crashes due to a segmentation fault, try adding the following at the top of the Python file where the segfault seems to be triggered:

import os
os.environ["OMP_NUM_THREADS"] = "1"

This disables intra-op (OpenMP) parallelization, which can help avoid issues related to multi-threading.

Steps/Code to Reproduce

I cannot share my company's code, but I used TabPFN similarly to the example above.
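
For context, a minimal sketch of the kind of setup described, with the workaround applied. The dataset and variable names are placeholders (the real code is private), and the TabPFNClassifier usage assumes the standard fit/predict API:

import os
os.environ["OMP_NUM_THREADS"] = "1"  # set before torch/tabpfn are imported so OpenMP picks it up

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

# Placeholder data standing in for the real (private) dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier(device="cpu")  # assumes the device keyword from the TabPFN docs
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)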

Expected Results

Inference works and no segfault occurs.

Actual Results

The Python process failed (exit code 139) and I found no additional information in the logs about the cause of the crash, even after adding this line of code: os.environ["PYTHONFAULTHANDLER"] = "1" (which is supposed to provide details about the cause of the segmentation fault).
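
(Note: PYTHONFAULTHANDLER is only read at interpreter startup, so setting it via os.environ inside the running script does not affect the current process. A minimal sketch of enabling the fault handler at runtime instead:

import faulthandler
faulthandler.enable()  # dump a Python traceback to stderr on SIGSEGV and other fatal signals

)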

Versions

Collecting system and dependency information...
PyTorch version: 2.7.0
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.5 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.6 (default, Apr 30 2025, 02:07:17)  [Clang 17.0.0 (clang-1700.0.13.5)] (64-bit runtime)
Python platform: macOS-15.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M4 Pro

Dependency Versions:
--------------------
tabpfn: 2.0.6
torch: 2.7.0
numpy: 1.26.4
scipy: 1.13.1
pandas: 2.2.3
scikit-learn: 1.6.1
typing_extensions: 4.13.2
einops: 0.8.1
huggingface-hub: 0.32.2

AntoninDataiku · Jun 17 '25 14:06

Thanks for reporting! Would you mind trying the workaround suggested in https://github.com/PriorLabs/TabPFN/issues/404 of calling torch.cuda.empty_cache() before each predict call and reporting back if this fixes things?
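
For reference, a minimal sketch of that suggestion (clf and X_test are placeholders for the fitted TabPFN model and the prediction data):

import torch

# Release cached GPU memory before each prediction; this is a no-op without CUDA
if torch.cuda.is_available():
    torch.cuda.empty_cache()
predictions = clf.predict(X_test)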

priorphil · Aug 18 '25 11:08

Thanks for reporting! Would you mind trying the workaround suggested in #404 of calling torch.cuda.empty_cache() before each predict call and reporting back if this fixes things?

Since I was running the model on CPU, the method torch.cuda.empty_cache() would not be useful here.

AntoninDataiku · Aug 18 '25 12:08

Ah, sorry, I missed that :) Could you share which dtype you were using?

priorphil · Aug 18 '25 12:08

IIRC, float32

AntoninDataiku · Aug 18 '25 12:08