text-generation-inference

Segmentation fault when downloading models

Open jshin49 opened this issue 1 year ago • 3 comments

System Info

2023-06-07T08:37:39.808440Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3
Docker label: N/A
nvidia-smi:
Wed Jun  7 17:37:39 2023
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 515.65.07    Driver Version: 515.65.07    CUDA Version: 11.7     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |                               |                      |               MIG M. |
   |===============================+======================+======================|
   |   0  NVIDIA A100-SXM...  Off  | 00000000:8F:00.0 Off |                  Off |
   | N/A   33C    P0    84W / 400W |                  N/A |     N/A      Default |
   |                               |                      |              Enabled |
   +-------------------------------+----------------------+----------------------+

   +-----------------------------------------------------------------------------+
   | MIG devices:                                                                |
   +------------------+----------------------+-----------+-----------------------+
   | GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
   |      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
   |                  |                      |        ECC|                       |
   |==================+======================+===========+=======================|
   |  0    8   0   0  |      6MiB /  9728MiB | 14    N/A |  1   0    0    0    0 |
   |                  |      0MiB / 16383MiB |           |                       |
   +------------------+----------------------+-----------+-----------------------+

   +-----------------------------------------------------------------------------+
   | Processes:                                                                  |
   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
   |        ID   ID                                                   Usage      |
   |=============================================================================|
   |  No running processes found                                                 |
   +-----------------------------------------------------------------------------+
2023-06-07T08:37:39.809250Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-07T08:37:39.809524Z  INFO text_generation_launcher: Starting download process.
2023-06-07T08:37:40.612930Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 11:
Error: DownloadError

Information

  • [ ] Docker
  • [X] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

Not sure if this is reproducible on every machine, but:

  1. Install the repo with the CLI
  2. Run make install
  3. Run make download-bloom
  4. Get the following error:
HF_HUB_ENABLE_HF_TRANSFER=1 text-generation-server download-weights bigscience/bloom
Segmentation fault (core dumped)
make: *** [Makefile:49: download-bloom] Error 139
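
(One way to narrow this down, not tried above, would be to disable the hf_transfer Rust extension and fall back to the pure-Python downloader, by unsetting the variable or setting it to 0:)

HF_HUB_ENABLE_HF_TRANSFER=0 text-generation-server download-weights bigscience/bloom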

Expected behavior

Even when I download bloom-560m with git-lfs, I still get this error when launching.
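
(For illustration only: launching from such a local clone would look roughly like this, where ./bloom-560m is a placeholder for the git-lfs checkout path and the port matches the Args dump above:)

text-generation-launcher --model-id ./bloom-560m --port 3000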

jshin49 avatar Jun 07 '23 08:06 jshin49

I know that when running this with the official Docker image, it works well. However, I'm in an environment where running an unauthorized Dockerfile is difficult (almost impossible).

jshin49 avatar Jun 07 '23 08:06 jshin49

This is very odd. A segfault should never happen here, since everything is in safe Rust.

Is it possible it could be linked to special partitioning, or a network-mounted filesystem on your end?
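
(For example, something like this would show which filesystem the weights cache lives on, assuming the default Hugging Face cache location:)

df -T ~/.cache/huggingface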

Narsil avatar Jun 07 '23 09:06 Narsil

I'm using a MIG system and my CUDA version is 11.7, so that may be related. I also found someone with a similar issue in #306

jshin49 avatar Jun 07 '23 09:06 jshin49

I tried on a slightly different system (without MIG) and encountered the same issue.

2023-06-08T06:08:52.395479Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3
Docker label: N/A
nvidia-smi:
Thu Jun  8 15:08:51 2023
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 515.65.07    Driver Version: 515.65.07    CUDA Version: 11.7     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |                               |                      |               MIG M. |
   |===============================+======================+======================|
   |   0  NVIDIA A100-SXM...  On   | 00000000:C5:00.0 Off |                  Off |
   | N/A   31C    P0    64W / 400W |      0MiB / 81920MiB |      0%      Default |
   |                               |                      |             Disabled |
   +-------------------------------+----------------------+----------------------+
   |   1  NVIDIA A100-SXM...  On   | 00000000:CA:00.0 Off |                  Off |
   | N/A   32C    P0    62W / 400W |      0MiB / 81920MiB |      0%      Default |
   |                               |                      |             Disabled |
   +-------------------------------+----------------------+----------------------+
   |   2  NVIDIA A100-SXM...  On   | 00000000:E3:00.0 Off |                  Off |
   | N/A   32C    P0    64W / 400W |      0MiB / 81920MiB |      0%      Default |
   |                               |                      |             Disabled |
   +-------------------------------+----------------------+----------------------+
   |   3  NVIDIA A100-SXM...  On   | 00000000:E7:00.0 Off |                  Off |
   | N/A   34C    P0    63W / 400W |      0MiB / 81920MiB |      0%      Default |
   |                               |                      |             Disabled |
   +-------------------------------+----------------------+----------------------+

   +-----------------------------------------------------------------------------+
   | Processes:                                                                  |
   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
   |        ID   ID                                                   Usage      |
   |=============================================================================|
   |  No running processes found                                                 |
   +-----------------------------------------------------------------------------+
2023-06-08T06:08:52.395605Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-08T06:08:52.395644Z  INFO text_generation_launcher: Sharding model on 4 processes
2023-06-08T06:08:52.395862Z  INFO text_generation_launcher: Starting download process.
2023-06-08T06:08:53.097416Z ERROR text_generation_launcher: Download process was signaled to shutdown with signal 11:
Error: DownloadError

jshin49 avatar Jun 08 '23 06:06 jshin49

Here are the results of running some test code. Is it possible that I didn't build the project properly?

make python-server-tests

HF_HUB_ENABLE_HF_TRANSFER=1 pytest -s -vv -m "not private" server/tests
============================================================================================= test session starts ==============================================================================================
platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0 -- /home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/bin/python
cachedir: .pytest_cache
rootdir: /home/nsml/dev/text-generation-inference/server
configfile: pyproject.toml
plugins: syrupy-4.0.2, asyncio-0.17.2
asyncio: mode=legacy
collecting ... Fatal Python error: Segmentation fault

Current thread 0x00007fc843873740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1173 in create_module
  File "<frozen importlib._bootstrap>", line 565 in module_from_spec
  File "<frozen importlib._bootstrap>", line 666 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/torch/__init__.py", line 229 in <module>
  File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 850 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "/home/nsml/dev/text-generation-inference/server/tests/models/test_bloom.py", line 2 in <module>
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/assertion/rewrite.py", line 172 in exec_module
  File "<frozen importlib._bootstrap>", line 680 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1007 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1030 in _gcd_import
  File "/home/nsml/.pyenv/versions/3.9.16/lib/python3.9/importlib/__init__.py", line 127 in import_module
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/pathlib.py", line 564 in import_path
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 617 in _importtestmodule
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 528 in _getobj
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 310 in obj
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 545 in _inject_setup_module_fixture
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/python.py", line 531 in collect
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 372 in <lambda>
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 372 in pytest_make_collect_report
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/runner.py", line 547 in collect_one_node
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 832 in genitems
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 665 in perform_collect
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 333 in pytest_collection
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 322 in _main
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/lib/python3.9/site-packages/_pytest/config/__init__.py", line 189 in console_main
  File "/home/nsml/.local/share/virtualenvs/text-generation-inference-IHibWVZC/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)
make: *** [Makefile:35: python-server-tests] Error 139

jshin49 avatar Jun 08 '23 07:06 jshin49

Here there's no Rust being called; everything is pure Python. (The traceback above shows the crash happening inside create_module during import torch, i.e. while loading a compiled extension module.)

The segfault is extremely weird. In our code, the only thing I could suspect is the compiled kernels (which can be deactivated with --disable-custom-kernels). Beyond that, I can only imagine the segfault occurring because of a bad environment: CUDA drivers, torch, or something like that.
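
For example (a sketch, reusing the model from the logs above):

text-generation-launcher --model-id bigscience/bloom-560m --disable-custom-kernels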

Narsil avatar Jun 08 '23 07:06 Narsil

You were right. I just found out that it was a PyTorch issue: import torch has been causing the segfault this whole time.
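
A minimal way to reproduce it in isolation (no TGI involved; the print only runs if the import survives):

python -c "import torch; print(torch.__version__, torch.version.cuda)"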

The base Docker image I was using (from nvcr) was an outdated one, so I'm guessing it's not compatible with torch 2.0. I'm trying the latest image to see if it works.

I'm closing this issue for now! Thanks a lot!

jshin49 avatar Jun 08 '23 08:06 jshin49