text-generation-inference
ERROR shard-manager when running bigcode/starcoder
System Info
docker exec -it text-generation-inference text-generation-launcher --env
(base) ➜ huggingface-text-generation-inference docker exec -it 401ba897d58aa498e6fffa0e717144c47fea4cf56c0578fbb4b384b42bcf6040 text-generation-launcher --env
2023-06-03T03:36:08.324157Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182
Docker label: sha-e7248fe
nvidia-smi:
Sat Jun 3 03:36:08 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 0% 37C P8 13W / 310W | 693MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-06-03T03:36:08.324179Z INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, sharded: None, num_shard: None, quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
(base) ➜ huggingface-text-generation-inference curl 127.0.0.1:8080/info | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 455 100 455 0 0 444k 0 --:--:-- --:--:-- --:--:-- 444k
{
"model_id": "/data/bigcode/starcoder",
"model_sha": null,
"model_dtype": "torch.float32",
"model_device_type": "cpu",
"model_pipeline_tag": null,
"max_concurrent_requests": 128,
"max_best_of": 2,
"max_stop_sequences": 4,
"max_input_length": 1000,
"max_total_tokens": 1512,
"waiting_served_ratio": 1.2,
"max_batch_total_tokens": 32000,
"max_waiting_tokens": 20,
"validation_workers": 2,
"version": "0.8.2",
"sha": "e7248fe90e27c7c8e39dd4cac5874eb9f96ab182",
"docker_label": "sha-e7248fe"
}
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
1. Ubuntu 20.04
2. Start text-generation-inference with Docker:
   model=/data/bigcode/starcoder
   num_shard=1
   volume=$PWD/data
   docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --disable-custom-kernels
3. Send a request with the VS Code extension.
4. I get the following errors:
➜ huggingface-text-generation-inference docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --disable-custom-kernels
2023-06-03T03:33:15.272607Z INFO text_generation_launcher: Args { model_id: "/data/bigcode/starcoder", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: true, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-06-03T03:33:15.272886Z INFO text_generation_launcher: Starting download process.
2023-06-03T03:33:16.389565Z INFO download: text_generation_launcher: Files are already present on the host. Skipping download.
2023-06-03T03:33:16.775719Z INFO text_generation_launcher: Successfully downloaded weights.
2023-06-03T03:33:16.776087Z INFO text_generation_launcher: Starting shard 0
2023-06-03T03:33:26.786743Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:33:36.797049Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:33:46.807792Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:33:56.818618Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:34:06.830109Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:34:16.839934Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:34:26.850552Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:34:36.861382Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:34:46.873280Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:34:56.885746Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:35:06.896503Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-03T03:35:12.065627Z INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
rank=0
2023-06-03T03:35:12.103705Z INFO text_generation_launcher: Shard 0 ready in 115.326268544s
2023-06-03T03:35:12.191281Z INFO text_generation_launcher: Starting Webserver
2023-06-03T03:35:12.271308Z WARN text_generation_router: router/src/main.rs:158: no pipeline tag found for model /data/bigcode/starcoder
2023-06-03T03:35:12.276164Z INFO text_generation_router: router/src/main.rs:178: Connected
2023-06-03T03:43:43.852322Z ERROR shard-manager: text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
self._run_once()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
handle._run()
File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.9/site-packages/grpc_interceptor/server.py", line 159, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/interceptor.py", line 20, in intercept
return await response
File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 61, in Prefill
generations, next_batch = self.model.generate_token(batch)
File "/opt/conda/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 575, in generate_token
next_token_id, logprobs = next_token_chooser(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/tokens.py", line 71, in __call__
scores, next_logprob = self.static_warper(scores)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/logits_process.py", line 47, in __call__
self.cuda_graph = torch.cuda.CUDAGraph()
RuntimeError: CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
rank=0
2023-06-03T03:43:43.852597Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
2023-06-03T03:43:43.853127Z ERROR HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=192.168.1.9:8080 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=node-fetch otel.kind=server trace_id=92dbf3a1bfd4c5408c7350b41e793129}:generate{parameters=GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None }}:generate{request=GenerateRequest { inputs: "<?php\n\necho \"hello world\";\n", parameters: GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }}:generate_stream{request=GenerateRequest { inputs: "<?php\n\necho \"hello world\";\n", parameters: GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }}:infer:send_error: text_generation_router::infer: router/src/infer.rs:533: Request failed during generation: Server error: CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Expected behavior
Expected to get no error
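The traceback ends at torch.cuda.CUDAGraph() in logits_process.py, so the shard needs a working CUDA context as soon as sampling parameters are set. A minimal sketch for reproducing the failure outside the server, assuming the image's Python lives at /opt/conda/bin/python (as the traceback paths suggest) and that the entrypoint can be overridden:

# Hypothetical minimal reproduction: print CUDA availability, then build an
# empty CUDA graph inside the same image. On a healthy driver and
# nvidia-container-toolkit setup this prints True and exits cleanly; on this
# host it should raise the same "forward compatibility" RuntimeError.
docker run --rm --gpus all \
  --entrypoint /opt/conda/bin/python \
  ghcr.io/huggingface/text-generation-inference:0.8.2 \
  -c "import torch; print(torch.cuda.is_available()); torch.cuda.CUDAGraph()"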
The model loaded on CPU for some reason: "model_device_type": "cpu" in the info output.
Can you run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --env
directly?
Output:
➜ huggingface-text-generation-inference docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8.2 --model-id $model --num-shard $num_shard --env
2023-06-06T04:49:23.305612Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182
Docker label: sha-e7248fe
nvidia-smi:
Tue Jun 6 04:49:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 12W / 310W | 693MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-06-06T04:49:23.305628Z INFO text_generation_launcher: Args { model_id: "/data/bigcode/starcoder", revision: None, sharded: None, num_shard: Some(1), quantize: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: true }
2023-06-06T04:49:23.305682Z INFO text_generation_launcher: Starting download process.
2023-06-06T04:49:24.863179Z INFO download: text_generation_launcher: Files are already present on the host. Skipping download.
2023-06-06T04:49:25.208559Z INFO text_generation_launcher: Successfully downloaded weights.
2023-06-06T04:49:25.208691Z INFO text_generation_launcher: Starting shard 0
2023-06-06T04:49:35.221147Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:49:45.232376Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:49:55.242637Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:50:05.252233Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:50:15.263142Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:50:25.274029Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:50:35.285107Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:50:45.296919Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:50:55.310651Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:51:05.327063Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:51:15.338597Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-06-06T04:51:17.484955Z INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
rank=0
2023-06-06T04:51:17.541604Z INFO text_generation_launcher: Shard 0 ready in 112.33250655s
2023-06-06T04:51:17.620150Z INFO text_generation_launcher: Starting Webserver
2023-06-06T04:51:17.695192Z WARN text_generation_router: router/src/main.rs:158: no pipeline tag found for model /data/bigcode/starcoder
2023-06-06T04:51:17.701264Z INFO text_generation_router: router/src/main.rs:178: Connected
2023-06-06T04:51:27.852421Z ERROR shard-manager: text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
self._run_once()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
handle._run()
File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.9/site-packages/grpc_interceptor/server.py", line 159, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/interceptor.py", line 20, in intercept
return await response
File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 61, in Prefill
generations, next_batch = self.model.generate_token(batch)
File "/opt/conda/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 575, in generate_token
next_token_id, logprobs = next_token_chooser(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/tokens.py", line 71, in __call__
scores, next_logprob = self.static_warper(scores)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/logits_process.py", line 47, in __call__
self.cuda_graph = torch.cuda.CUDAGraph()
RuntimeError: CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
rank=0
2023-06-06T04:51:27.852917Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
2023-06-06T04:51:27.854001Z ERROR HTTP request{otel.name=POST /generate http.client_ip= http.flavor=1.1 http.host=10.8.0.9:8080 http.method=POST http.route=/generate http.scheme=HTTP http.target=/generate http.user_agent=node-fetch otel.kind=server trace_id=2db5d9a0888127e8a0c4bcf9c769fc9b}:generate{parameters=GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None }}:generate{request=GenerateRequest { inputs: "<?php\n\necho \"hello world\";\n", parameters: GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }}:generate_stream{request=GenerateRequest { inputs: "<?php\n\necho \"hello world\";\n", parameters: GenerateParameters { best_of: None, temperature: Some(0.2), repetition_penalty: None, top_k: None, top_p: Some(0.95), typical_p: None, do_sample: true, max_new_tokens: 60, return_full_text: None, stop: ["<|endoftext|>"], truncate: None, watermark: false, details: false, seed: None } }}:infer:send_error: text_generation_router::infer: router/src/infer.rs:533: Request failed during generation: Server error: CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
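The error text points at CUDA forward compatibility, which consumer GeForce cards do not support, so a forward-compat libcuda inside the image may be shadowing the host driver's library. A hedged way to see which libcuda the container actually resolves (a sketch, assuming bash and ldconfig are present in the image):

# Hypothetical check: list the libcuda entries the dynamic linker sees inside
# the container. A /usr/local/cuda/compat/ entry ahead of the host-injected
# driver library would mean forward compatibility is being attempted.
docker run --rm --gpus all \
  --entrypoint bash \
  ghcr.io/huggingface/text-generation-inference:0.8.2 \
  -c "ldconfig -p | grep libcuda"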
I have the same problem here, did you manage to find a solution?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.