lorax
an illegal memory access was encountered for Mixtral with 1700 tokens
System Info
Mixtral model on 4 A100 GPUs (80 GB each)
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
I am running the Mixtral model in 4 shards and get a transport error when the prompt size is 1731 tokens.
```shell
sudo docker run --gpus='"device=4,5,6,7"' --shm-size 1g -p 8080:80 -v $PWD/data:/data \
    ghcr.io/predibase/lorax:latest --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --num-shard 4 --sharded true \
    --max-input-length 4095 \
    --max-total-tokens 3000 \
    --max-batch-prefill-tokens 4096 \
    --waiting-served-ratio 1.2 \
    --max-waiting-tokens 20 \
    --max-stop-sequences 10 \
    --cuda-memory-fraction 0.99
```
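As an aside, two of these flags look inconsistent: `--max-input-length 4095` exceeds `--max-total-tokens 3000`, even though the input length is normally expected to be strictly smaller than the total token budget (and `--max-batch-prefill-tokens` at least as large as the input length) in TGI-style servers. A quick sanity check along those lines (a hypothetical helper, not part of lorax):

```python
# Hypothetical sanity check for the launcher's token-budget flags.
# The constraints below are the ones TGI-style servers normally enforce;
# whether lorax validates them at startup is an assumption here.
def check_token_budget(max_input_length, max_total_tokens, max_batch_prefill_tokens):
    """Return a list of human-readable problems with the token-budget flags."""
    problems = []
    if max_input_length >= max_total_tokens:
        problems.append("--max-input-length must be smaller than --max-total-tokens")
    if max_batch_prefill_tokens < max_input_length:
        problems.append("--max-batch-prefill-tokens should be >= --max-input-length")
    return problems

# The flags from the repro above:
print(check_token_budget(4095, 3000, 4096))
```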
Client
```python
from lorax import Client

client = Client("http://127.0.0.1:8080")
prompt = """Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say
that they were perfectly normal, thank you very much. They were the last
people you'd expect to be involved in anything strange or mysterious,
because they just didn't hold with such nonsense.
Mr. Dursley was the director of a firm called Grunnings, which made
drills. He was a big, beefy man with hardly any neck, although he did
have a very large mustache. Mrs. Dursley was thin and blonde and had
nearly twice the usual amount of neck, which came in very useful as she
spent so much of her time craning over garden fences, spying on the
neighbors. The Dursleys had a small son called Dudley and in their
opinion there was no finer boy anywhere.
The Dursleys had everything they wanted, but they also had a secret, and
their greatest fear was that somebody would discover it. They didn't
think they could bear it if anyone found out about the Potters. Mrs.
Potter was Mrs. Dursley's sister, but they hadn't met for several years;
in fact, Mrs. Dursley pretended she didn't have a sister, because her
sister and her good-for-nothing husband were as unDursleyish as it was
possible to be. The Dursleys shuddered to think what the neighbors would
say if the Potters arrived in the street. The Dursleys knew that the
Potters had a small son, too, but they had never even seen him. This boy
was another good reason for keeping the Potters away; they didn't want
Dudley mixing with a child like that.
When Mr. and Mrs. Dursley woke up on the dull, gray Tuesday our story
starts, there was nothing about the cloudy sky outside to suggest that
strange and mysterious things would soon be happening all over the
country. Mr. Dursley hummed as he picked out his most boring tie for
work, and Mrs. Dursley gossiped away happily as she wrestled a screaming
Dudley into his high chair.
None of them noticed a large, tawny owl flutter past the window.
At half past eight, Mr. Dursley picked up his briefcase, pecked Mrs.
Dursley on the cheek, and tried to kiss Dudley good-bye but missed,
because Dudley was now having a tantrum and throwing his cereal at the
walls. "Little tyke," chortled Mr. Dursley as he left the house. He got
into his car and backed out of number four's drive.
It was on the corner of the street that he noticed the first sign of
something peculiar -- a cat reading a map. For a second, Mr. Dursley
didn't realize what he had seen -- then he jerked his head around to
look again. There was a tabby cat standing on the corner of Privet
Drive, but there wasn't a map in sight. What could he have been thinking
of? It must have been a trick of the light. Mr. Dursley blinked and
stared at the cat. It stared back. As Mr. Dursley drove around the
corner and up the road, he watched the cat in his mirror. It was now
reading the sign that said Privet Drive -- no, looking at the sign; cats
couldn't read maps or signs. Mr. Dursley gave himself a little shake and
put the cat out of his mind. As he drove toward town he thought of
nothing except a large order of drills he was hoping to get that day.
But on the edge of town, drills were driven out of his mind by something
else. As he sat in the usual morning traffic jam, he couldn't help
noticing that there seemed to be a lot of strangely dressed people
about. People in cloaks. Mr. Dursley couldn't bear people who dressed in
funny clothes -- the getups you saw on young people! He supposed this
was some stupid new fashion. He drummed his fingers on the steering
wheel and his eyes fell on a huddle of these weirdos standing quite
close by. They were whispering excitedly together. Mr. Dursley was
enraged to see that a couple of them weren't young at all; why, that man
had to be older than he was, and wearing an emerald-green cloak! The
nerve of him! But then it struck Mr. Dursley that this was probably some
silly stunt -- these people were obviously collecting for something...
yes, that would be it. The traffic moved on and a few minutes later, Mr.
Dursley arrived in the Grunnings parking lot, his mind back on drills.
Mr. Dursley always sat with his back to the window in his office on the
ninth floor. If he hadn't, he might have found it harder to concentrate
on drills that morning. He didn't see the owls swoop ing past in broad
daylight, though people down in the street did; they pointed and gazed
open- mouthed as owl after owl sped overhead. Most of them had never
seen an owl even at nighttime. Mr. Dursley, however, had a perfectly
normal, owl-free morning. He yelled at five different people. He made
several important telephone calls and shouted a bit more. He was in a
very good mood until lunchtime, when he thought he'd stretch his legs
and walk across the road to buy himself a bun from the bakery."""
print(client.generate(prompt, max_new_tokens=20, stop_sequences=["\n\n"]).generated_text)
```
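For what it's worth, the same request can also be reproduced without the Python client by POSTing to the server's REST `/generate` endpoint. A minimal standard-library sketch (the payload shape here is assumed from the TGI-compatible API, where stop sequences are passed as `stop`):

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=20, stop=("\n\n",)):
    """Build the JSON payload for a TGI-style /generate endpoint (assumed shape)."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "stop": list(stop)},
    }


def post_generate(url, payload, timeout=60):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs the server above running):
# out = post_generate("http://127.0.0.1:8080/generate", build_generate_request(prompt))
# print(out["generated_text"])
```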
Errors
>>> print(client.generate(prompt, max_new_tokens=20, stop_sequences=["\n\n"]).generated_text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/hayley/lorax/.venv/lib/python3.8/site-packages/lorax/client.py", line 192, in generate
raise parse_error(resp.status_code, payload)
lorax.errors.GenerationError: Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 324, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 922, in forward
hidden_states, residual = layer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 868, in forward
moe_output = self.moe(normed_attn_res_output)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 718, in forward
return self.sparse_forward(x)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 616, in sparse_forward
x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/lib/python3.10/site-packages/stk/backend/autocast.py", line 28, in decorate_fwd
return fwd(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/megablocks/ops/padded_gather.py", line 14, in forward
return kernels.padded_gather(
File "/opt/conda/lib/python3.10/site-packages/megablocks/backend/kernels.py", line 118, in padded_gather
output_rows = padded_bins[-1].cpu().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/interceptor.py", line 38, in intercept
return await response
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 96, in Prefill
generations, next_batch = self.model.generate_token(batch)
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 927, in generate_token
raise e
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 924, in generate_token
out = self.forward(batch, adapter_data)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_mixtral.py", line 430, in forward
logits = model.forward(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 979, in forward
hidden_states = self.model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 922, in forward
hidden_states, residual = layer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 868, in forward
moe_output = self.moe(normed_attn_res_output)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 718, in forward
return self.sparse_forward(x)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 616, in sparse_forward
x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/lib/python3.10/site-packages/stk/backend/autocast.py", line 28, in decorate_fwd
return fwd(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/megablocks/ops/padded_gather.py", line 14, in forward
return kernels.padded_gather(
File "/opt/conda/lib/python3.10/site-packages/megablocks/backend/kernels.py", line 118, in padded_gather
output_rows = padded_bins[-1].cpu().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-03-29T23:35:12.011139Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: lorax_client: router/client/src/lib.rs:34: Server error: Unexpected <class 'RuntimeError'>: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-03-29T23:35:12.011202Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: lorax_client: router/client/src/lib.rs:34: Server error: Unexpected <class 'RuntimeError'>: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-03-29T23:35:12.553846Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
Warmup to max_total_tokens: 100%|██████████| 1096/1096 [00:48<00:00, 22.76it/s]
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f23edaced87 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f23eda7f75f in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f23edb9f8a8 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1d40e (0x7f23edb6a40e in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x1f744 (0x7f23edb6c744 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #5: <unknown function> + 0x1fb6d (0x7f23edb6cb6d in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #6: <unknown function> + 0x540210 (0x7f23ecaf7210 in /opt/conda/lib/python3.10/site-package
frame #14: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #15: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #16: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #17: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #18: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #19: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #20: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #21: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #22: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #23: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #24: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #25: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #26: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #27: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #28: <unknown function> + 0x15262b (0x55ce5181662b in /opt/conda/bin/python3.10)
frame #29: <unknown function> + 0x1525e7 (0x55ce518165e7 in /opt/conda/bin/python3.10)
frame #30: <unknown function> + 0x563095 (0x7f21d7059095 in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #31: <unknown function> + 0x56b815 (0x7f21d7061815 in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #32: <unknown function> + 0x60ae0f (0x7f21d7100e0f in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #33: <unknown function> + 0x56a20b (0x7f21d706020b in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #34: <unknown function> + 0x5cbc29 (0x7f21d70c1c29 in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #35: <unknown function> + 0x14f3bd (0x55ce518133bd in /opt/conda/bin/python3.10)
frame #36: PyObject_VectorcallMethod + 0x85 (0x55ce51822c85 in /opt/conda/bin/python3.10)
frame #37: <unknown function> + 0xae1eb (0x55ce517721eb in /opt/conda/bin/python3.10)
frame #38: <unknown function> + 0x7bf6 (0x7f23ee5d0bf6 in /opt/conda/lib/python3.10/lib-dynload/_asyncio.cpython-310-x86_64-linux-gnu.so)
frame #39: <unknown function> + 0x143d2a (0x55ce51807d2a in /opt/conda/bin/python3.10)
frame #40: <unknown function> + 0x25f22c (0x55ce5192322c in /opt/conda/bin/python3.10)
frame #41: <unknown function> + 0xfda7b (0x55ce517c1a7b in /opt/conda/bin/python3.10)
frame #42: <unknown function> + 0x13c1b3 (0x55ce518001b3 in /opt/conda/bin/python3.10)
frame #43: _PyEval_EvalFrameDefault + 0x5d5d (0x55ce517fe16d in /opt/conda/bin/python3.10)
frame #44: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #45: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #46: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #47: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #48: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #49: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #50: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #51: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #52: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #53: _PyEval_EvalFrameDefault + 0x4c12 (0x55ce517fd022 in /opt/conda/bin/python3.10)
frame #54: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #55: _PyEval_EvalFrameDefault + 0x4c12 (0x55ce517fd022 in /opt/conda/bin/python3.10)
frame #56: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #57: PyObject_Call + 0xbc (0x55ce51814d9c in /opt/conda/bin/python3.10)
frame #58: _PyEval_EvalFrameDefault + 0x2d84 (0x55ce517fb194 in /opt/conda/bin/python3.10)
frame #59: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #60: PyObject_Call + 0xbc (0x55ce51814d9c in /opt/conda/bin/python3.10)
frame #61: _PyEval_EvalFrameDefault + 0x2d84 (0x55ce517fb194 in /opt/conda/bin/python3.10)
frame #62: <unknown function> + 0x150402 (0x55ce51814402 in /opt/conda/bin/python3.10)
frame #63: PyObject_Call + 0xbc (0x55ce51814d9c in /opt/conda/bin/python3.10)
rank=0
2024-03-29T23:35:12.553943Z ERROR shard-manager: lorax_launcher: Shard process was signaled to shutdown with signal 6 rank=0
2024-03-29T23:35:12.564550Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: lorax_client: router/client/src/lib.rs:34: Server error: transport error
2024-03-29T23:35:12.589373Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:
Expected behavior
The server should return a generated response instead of crashing with an illegal memory access.
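For anyone digging into this: both tracebacks bottom out in megablocks' `padded_gather`, at `output_rows = padded_bins[-1].cpu().item()`. Roughly speaking, `padded_bins` holds the cumulative per-expert token counts rounded up to a block multiple, and its last entry is the total number of rows the gather operates over; an illegal access at that read usually means the routing tensors were already corrupted on device earlier in the MoE dispatch. A pure-Python sketch of that bookkeeping (an illustration of the assumed layout, not the library's actual code):

```python
# Illustration of MoE padded-bin bookkeeping (assumed layout, not megablocks' code).
def padded_bins(tokens_per_expert, block=128):
    """Cumulative token counts per expert, each padded up to a multiple of `block`."""
    bins, total = [], 0
    for n in tokens_per_expert:
        total += -(-n // block) * block  # ceil(n / block) * block
        bins.append(total)
    return bins

# bins[-1] is the total padded row count the gather kernel reads back:
print(padded_bins([5, 130, 0]))  # [128, 384, 384]
```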
I also faced the same issue
same problem here
@magdyksaleh Hello! Any updates on this?