text-generation-inference
Segmentation fault when downloading models
System Info
2023-06-07T08:37:39.808440Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3
Docker label: N/A
nvidia-smi:
Wed Jun 7 17:37:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.07 Driver Version: 515.65.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... Off | 00000000:8F:00.0 Off | Off |
| N/A 33C P0 84W / 400W | N/A | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 8 0 0 | 6MiB / 9728MiB | 14 N/A | 1 0 0 0 0 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
2023-06-07T08:37:39.809250Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-07T08:37:39.809524Z INFO text_generation_launcher: Starting download process.
2023-06-07T08:37:40.612930Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 11:
Error: DownloadError
Information
- [ ] Docker
- [X] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Not sure if this is reproducible on every machine, but:
- install the repo with the CLI
- run make install
- run make download-bloom
- get the following error:
HF_HUB_ENABLE_HF_TRANSFER=1 text-generation-server download-weights bigscience/bloom
Segmentation fault (core dumped)
make: *** [Makefile:49: download-bloom] Error 139
Expected behavior
Even when I download bloom-560m with git-lfs, I still get this error when launching.
I know it works well when run with the official Docker image. However, I'm in an environment where running an unauthorized Docker image is difficult (almost impossible).
This is very odd; a segfault should never happen here, since everything is in safe Rust.
Could it be linked to special partitioning, or a network-mounted filesystem on your end?
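One way to narrow this down (a minimal sketch, not from the original thread; it assumes huggingface_hub is installed in the same environment) is to run the download path on its own, outside the text-generation-server CLI, with the Rust-based hf_transfer downloader switched off:

```python
# Minimal sketch: exercise the Hugging Face download path by itself.
# snapshot_download is a real huggingface_hub API; setting
# HF_HUB_ENABLE_HF_TRANSFER=0 falls back to the pure-Python downloader.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"  # must be set before the import below

from huggingface_hub import snapshot_download

# Use the small model from the logs so the test is quick.
path = snapshot_download("bigscience/bloom-560m")
print("downloaded to", path)
```

If this still crashes, the problem is below TGI; if it succeeds, the suspect list shrinks to hf_transfer or the CLI's own startup.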
I'm using a MIG system and CUDA 11.7, so that may be related. I also found someone with a similar issue in #306.
I tried a slightly different system (without MIG) and encountered the same issue.
2023-06-08T06:08:52.395479Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3
Docker label: N/A
nvidia-smi:
Thu Jun 8 15:08:51 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.07 Driver Version: 515.65.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... On | 00000000:C5:00.0 Off | Off |
| N/A 31C P0 64W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... On | 00000000:CA:00.0 Off | Off |
| N/A 32C P0 62W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM... On | 00000000:E3:00.0 Off | Off |
| N/A 32C P0 64W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM... On | 00000000:E7:00.0 Off | Off |
| N/A 34C P0 63W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
2023-06-08T06:08:52.395605Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-08T06:08:52.395644Z INFO text_generation_launcher: Sharding model on 4 processes
2023-06-08T06:08:52.395862Z INFO text_generation_launcher: Starting download process.
2023-06-08T06:08:53.097416Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 11:
Error: DownloadError
Here are the results of running some test code. Is it possible that I didn't build the project properly?
make python-server-tests
HF_HUB_ENABLE_HF_TRANSFER=1 pytest -s -vv -m "not private" server/tests
============================================================================================= test session starts ==============================================================================================
platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0 -- /home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/bin/python
cachedir: .pytest_cache
rootdir: /home/nsml/dev/text-generation-inference/server
configfile: pyproject.toml
plugins: syrupy-4.0.2, asyncio-0.17.2
asyncio: mode=legacy
collecting ... Fatal Python error: Segmentation fault
Current thread 0x00007fc843873740 (most recent call first):
File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 1173 in create_module
File "<frozen importlib._bootstrap>", line 565 in module_from_spec
File "<frozen importlib._bootstrap>", line 666 in _load_unlocked
File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/torch/__init__.py", line 229 in <module>
File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 850 in exec_module
File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
File "/home/nsml/dev/text-generation-inference/server/tests/models/test_bloom.py", line 2 in <module>
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/assertion/rewrite.py", line 172 in exec_module
File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
File "<frozen importlib._bootstrap>", line 1030 in _gcd_import
File "/home/nsml/.pyenv/versions/3.9.16/lib/python3.9/importlib/__init__.py", line 127 in import_module
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/pathlib.py", line 564 in import_path
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 617 in _importtestmodule
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 528 in _getobj
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 310 in obj
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 545 in _inject_setup_module_fixture
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 531 in collect
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 372 in <lambda>
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 341 in from_call
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 372 in pytest_make_collect_report
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 547 in collect_one_node
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 832 in genitems
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 665 in perform_collect
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 333 in pytest_collection
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 322 in _main
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 269 in wrap_session
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/config/__init__.py", line 166 in main
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/config/__init__.py", line 189 in console_main
File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)
make: *** [Makefile:35: python-server-tests] Error 139
Here no Rust is being called; everything is pure Python.
The segfault is extremely weird. In our code, the only suspicious thing I can think of is the compiled kernels (which can be deactivated with --disable-custom-kernels). Beyond that, I can only imagine the segfault comes from a bad environment setting: CUDA drivers, torch, or something like that.
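A quick way to test that theory (a minimal sketch, not part of the original exchange) is to import torch on its own with faulthandler enabled, so that even a segfault prints a Python-level traceback pointing at the crashing import:

```python
# Minimal sketch: isolate the crash from text-generation-inference entirely.
# faulthandler is in the standard library and dumps a traceback on SIGSEGV.
import faulthandler

faulthandler.enable()

import torch  # if this line alone segfaults, the problem is the torch install

print(torch.__version__)
print(torch.version.cuda)         # CUDA version torch was built against
print(torch.cuda.is_available())  # checks the driver/runtime pairing
```

If the bare import crashes by itself, the fix has to happen at the environment level, not in this repo.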
You were right. I just found out that it was a PyTorch issue: import torch has been causing the segfault this whole time.
The base Docker image I was using (from nvcr) was an outdated one, so I'm guessing it's not compatible with torch 2.0. I'm trying the latest image to see if it works.
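For anyone hitting the same thing, a small sanity check (a sketch; both calls are standard) can confirm that the driver and the torch build agree once the new image is up:

```python
# Sketch of a post-fix sanity check: compare the NVIDIA driver version with
# the CUDA version torch was compiled against.
import subprocess
import torch

driver = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True,
).strip()

print("driver version:       ", driver)              # e.g. 515.65.07
print("torch version:        ", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)  # must be supported by the driver
print("cuda available:       ", torch.cuda.is_available())
```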
I'm closing this issue for now! Thanks a lot!