
Segmentation fault (core dumped) when executing model.to("xpu")

evelinamorim opened this issue 2 years ago · 12 comments

Describe the bug

After following the instructions in the installation tutorial (https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html), I executed the following code (the same as on the GitHub page):

import torch
import torchvision.models as models

model = models.resnet50(pretrained=True)
model.eval()
data = torch.rand(1, 3, 224, 224)

import intel_extension_for_pytorch as ipex
model = model.to('xpu')

However, the last line produced Segmentation fault (core dumped). The crash occurred in the lazy_init of intel_extension_for_pytorch.

Versions

PyTorch version: 1.13.0a0+git6c9b55e
PyTorch CXX11 ABI: Yes
IPEX version: 1.13.120+xpu
IPEX commit: c2a37012e
Build type: Release

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: N/A
IGC version: 2023.2.0 (2023.2.0.20230622)
CMake version: version 3.26.4
Libc version: glibc-2.35

Python version: 3.9.17 (main, Jun 6 2023, 20:11:21) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
Is XPU available: True
DPCPP runtime version: 2023.2.0
MKL version: 2023.2.0
GPU models and configuration: [0] _DeviceProperties(name='Intel(R) UHD Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=12559MB, max_compute_units=24)
Intel OpenCL ICD version: 23.17.26241.33-647~22.04
Level Zero version: 1.3.26241.33-647~22.04

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
CPU family: 6
Model: 142
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 12
CPU max MHz: 4900,0000
CPU min MHz: 400,0000
BogoMIPS: 4599.93
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 1 MiB (4 instances)
L3 cache: 8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==1.13.120+xpu
[pip3] numpy==1.25.1
[pip3] torch==1.13.0a0+git6c9b55e
[pip3] torchvision==0.14.1a0+5e8e2f1
[conda] N/A

evelinamorim · Jul 21 '23 14:07

@evelinamorim can you try importing intel_extension_for_pytorch immediately after import torch?
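i.e., something like this at the top of the script (just a sketch):

import torch
import intel_extension_for_pytorch as ipex  # moved up, directly after importing torch

# ... rest of the script unchanged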

@jingxu10 please help confirm the issue.

gujinghui · Jul 21 '23 15:07

@gujinghui I have the same issue, no matter when intel_extension_for_pytorch is imported. That is, the following code also results in "Segmentation fault (core dumped)":

import torch
import intel_extension_for_pytorch as ipex

import torchvision.models as models

model = models.resnet50(pretrained=True)
model.eval()
data = torch.rand(1, 3, 224, 224)

model = model.to('xpu')

gekeleda · Jul 23 '23 17:07

@gekeleda Do you have gdb installed in your environment? If so, you may try the following and report back what you see when the core dump happens: gdb --args `which python` your_script.py
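A session would look roughly like this (a sketch; your_script.py is whatever file reproduces the crash):

gdb --args `which python` your_script.py
(gdb) run
(gdb) bt

run executes the script until the segmentation fault is raised; bt then prints the backtrace of the crashing thread.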

jgong5 · Jul 24 '23 00:07

@gujinghui, as @gekeleda said, I get the same result regardless of the order of the imports.

@jgong5 I executed with gdb and the following output was produced.

Starting program: /home/evelinamorim/PycharmProjects/CT-Coref-pt/venv_pycharm/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffde9ff640 (LWP 47326)]
[New Thread 0x7fffde1fe640 (LWP 47327)]
[New Thread 0x7fffdb9fd640 (LWP 47328)]
[New Thread 0x7fffd71fc640 (LWP 47329)]
[New Thread 0x7fffd69fb640 (LWP 47330)]
[New Thread 0x7fffd21fa640 (LWP 47331)]
[New Thread 0x7fffcf9f9640 (LWP 47332)]
warning: File "/opt/intel/oneapi/compiler/2023.2.0/linux/lib/libsycl.so.6.2.0-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /opt/intel/oneapi/compiler/2023.2.0/linux/lib/libsycl.so.6.2.0-gdb.py
line to your configuration file "/home/evelinamorim/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/evelinamorim/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread 0x7fffcf9f9640 (LWP 47332) exited]
[Thread 0x7fffd21fa640 (LWP 47331) exited]
[Thread 0x7fffd69fb640 (LWP 47330) exited]
[Thread 0x7fffd71fc640 (LWP 47329) exited]
[Thread 0x7fffdb9fd640 (LWP 47328) exited]
[Thread 0x7fffde1fe640 (LWP 47327) exited]
[Thread 0x7fffde9ff640 (LWP 47326) exited]
[Detaching after fork from child process 47339]
[Detaching after fork from child process 47340]
[Detaching after fork from child process 47344]
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel/oneapi/compiler/2023.2.0/linux/lib/x64/libintelocl_emu.so"
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel/oneapi/compiler/2023.2.0/linux/lib/x64/libintelocl.so"
[New Thread 0x7fffcf9f9640 (LWP 47345)]
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel//oneapi/compiler/latest/linux/lib/x64/libintelocl.so"
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel//oneapi/compiler/latest/linux/lib/x64/libintelocl_emu.so"
[New Thread 0x7fffd21fa640 (LWP 47348)]
[New Thread 0x7fffd69fb640 (LWP 47349)]
[Thread 0x7fffd69fb640 (LWP 47349) exited]
/home/evelinamorim/PycharmProjects/CT-Coref-pt/venv_pycharm/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/evelinamorim/PycharmProjects/CT-Coref-pt/venv_pycharm/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
[New Thread 0x7fffd71fc640 (LWP 47350)]
[New Thread 0x7ffeeb1de640 (LWP 47351)]
[New Thread 0x7ffeea9dd640 (LWP 47352)]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()

evelinamorim · Jul 24 '23 13:07

What do you see from the backtrace after typing "bt"?

jgong5 · Jul 24 '23 13:07

Thanks for the quick reply!

The backtrace is the following:

https://gist.github.com/evelinamorim/40dce656614394bad491955b7bc274a9#file-error_intel_pytorch_extension

evelinamorim · Jul 24 '23 14:07

@evelinamorim

From the version information you provided, it looks like you are using oneAPI 2023.2 with the IPEX 1.13 release:

IGC version: 2023.2.0 (2023.2.0.20230622)
DPCPP runtime version: 2023.2.0
MKL version: 2023.2.0

Can you try with Intel® oneAPI Base Toolkit 2023.1, as mentioned in our release note? https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html#software-requirements
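After switching, a quick sanity check could look like this (a sketch; it assumes the matching oneAPI environment has been sourced via setvars.sh and that torch.xpu is exposed after importing intel_extension_for_pytorch, as in the 1.13 xpu releases):

import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)         # expected: 1.13.0a0+git6c9b55e
print(ipex.__version__)          # expected: 1.13.120+xpu
print(torch.xpu.is_available())  # should print True with a matching oneAPI runtime
print(torch.xpu.device_count())  # number of visible XPU devices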

gujinghui · Jul 24 '23 15:07

@gujinghui Thanks again for the quick reply! Changing the oneAPI Base Toolkit version worked! Thanks!

evelinamorim · Jul 25 '23 09:07

@gujinghui @evelinamorim, could you please elaborate on how to solve this issue? I am running a Docker environment (built from a Dockerfile) in which everything is preinstalled, but I am facing exactly the same "Segmentation fault (core dumped)" issue with ipex-llm:2.1.10.

GunturuSandeep · Jan 23 '24 03:01

Hey @gujinghui, I want to use IPEX 1.13, but it appears that the oneAPI Base Toolkit 2023.1 is no longer available for download. Is there something we can do if we want to use IPEX with PyTorch 1.13? Is IPEX 1.13 essentially deprecated?

pujaltes · Apr 25 '24 10:04

The old versions of the oneAPI toolkit are obsolete. I don't think we have copies of these old versions.

IPEX 1.13 is coupled with the old oneAPI toolkit, so IPEX 1.13 is deprecated as well. Sorry about that.

May I know why you have to work on IPEX 1.13?

gujinghui · Apr 25 '24 13:04

Thank you for your prompt response and for confirming that IPEX 1.13 has been deprecated. We were hoping to avoid having to upgrade some of our models to PyTorch 2.

pujaltes · Apr 25 '24 18:04

It seems no action is needed, so I'll close this issue for now. Feel free to reopen if you have further requests.

jingxu10 · Jun 13 '24 03:06