
an illegal memory access was encountered for Mixtral with 1700 tokens

Open hayleyhu opened this issue 1 year ago • 3 comments

System Info

Mixtral model on 4 A100 GPUs (80 GB each)

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

I am running the Mixtral model in 4 shards and get a transport error when the prompt size is 1731 tokens.

sudo docker run --gpus='"device=4,5,6,7"' --shm-size 1g -p 8080:80 -v $PWD/data:/data \
    ghcr.io/predibase/lorax:latest --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --num-shard 4 --sharded true \
    --max-input-length 4095 \
    --max-total-tokens 3000 \
    --max-batch-prefill-tokens 4096 \
    --waiting-served-ratio 1.2 \
    --max-waiting-tokens 20 \
    --max-stop-sequences 10 \
    --cuda-memory-fraction 0.99
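
The CUDA errors below suggest passing CUDA_LAUNCH_BLOCKING=1 to get a synchronous stack trace. As a debugging aid only (not a fix), the same launch can be repeated with that environment variable added via Docker's standard -e flag; everything else is unchanged from the command above:

sudo docker run --gpus='"device=4,5,6,7"' --shm-size 1g -p 8080:80 -v $PWD/data:/data \
    -e CUDA_LAUNCH_BLOCKING=1 \
    ghcr.io/predibase/lorax:latest --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --num-shard 4 --sharded true \
    --max-input-length 4095 \
    --max-total-tokens 3000 \
    --max-batch-prefill-tokens 4096 \
    --waiting-served-ratio 1.2 \
    --max-waiting-tokens 20 \
    --max-stop-sequences 10 \
    --cuda-memory-fraction 0.99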

Client:

from lorax import Client
client = Client("http://127.0.0.1:8080")
prompt="""Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say
that they were perfectly normal, thank you very much. They were the last
people you'd expect to be involved in anything strange or mysterious,
because they just didn't hold with such nonsense.

Mr. Dursley was the director of a firm called Grunnings, which made
drills. He was a big, beefy man with hardly any neck, although he did
have a very large mustache. Mrs. Dursley was thin and blonde and had
nearly twice the usual amount of neck, which came in very useful as she
spent so much of her time craning over garden fences, spying on the
neighbors. The Dursleys had a small son called Dudley and in their
opinion there was no finer boy anywhere.

The Dursleys had everything they wanted, but they also had a secret, and
their greatest fear was that somebody would discover it. They didn't
think they could bear it if anyone found out about the Potters. Mrs.
Potter was Mrs. Dursley's sister, but they hadn't met for several years;
in fact, Mrs. Dursley pretended she didn't have a sister, because her
sister and her good-for-nothing husband were as unDursleyish as it was
possible to be. The Dursleys shuddered to think what the neighbors would
say if the Potters arrived in the street. The Dursleys knew that the
Potters had a small son, too, but they had never even seen him. This boy
was another good reason for keeping the Potters away; they didn't want
Dudley mixing with a child like that.

When Mr. and Mrs. Dursley woke up on the dull, gray Tuesday our story
starts, there was nothing about the cloudy sky outside to suggest that
strange and mysterious things would soon be happening all over the
country. Mr. Dursley hummed as he picked out his most boring tie for
work, and Mrs. Dursley gossiped away happily as she wrestled a screaming
Dudley into his high chair.

None of them noticed a large, tawny owl flutter past the window.

At half past eight, Mr. Dursley picked up his briefcase, pecked Mrs.
Dursley on the cheek, and tried to kiss Dudley good-bye but missed,
because Dudley was now having a tantrum and throwing his cereal at the
walls. "Little tyke," chortled Mr. Dursley as he left the house. He got
into his car and backed out of number four's drive.

It was on the corner of the street that he noticed the first sign of
something peculiar -- a cat reading a map. For a second, Mr. Dursley
didn't realize what he had seen -- then he jerked his head around to
look again. There was a tabby cat standing on the corner of Privet
Drive, but there wasn't a map in sight. What could he have been thinking
of? It must have been a trick of the light. Mr. Dursley blinked and
stared at the cat. It stared back. As Mr. Dursley drove around the
corner and up the road, he watched the cat in his mirror. It was now
reading the sign that said Privet Drive -- no, looking at the sign; cats
couldn't read maps or signs. Mr. Dursley gave himself a little shake and
put the cat out of his mind. As he drove toward town he thought of
nothing except a large order of drills he was hoping to get that day.

But on the edge of town, drills were driven out of his mind by something
else. As he sat in the usual morning traffic jam, he couldn't help
noticing that there seemed to be a lot of strangely dressed people
about. People in cloaks. Mr. Dursley couldn't bear people who dressed in
funny clothes -- the getups you saw on young people! He supposed this
was some stupid new fashion. He drummed his fingers on the steering
wheel and his eyes fell on a huddle of these weirdos standing quite
close by. They were whispering excitedly together. Mr. Dursley was
enraged to see that a couple of them weren't young at all; why, that man
had to be older than he was, and wearing an emerald-green cloak! The
nerve of him! But then it struck Mr. Dursley that this was probably some
silly stunt -- these people were obviously collecting for something...
yes, that would be it. The traffic moved on and a few minutes later, Mr.
Dursley arrived in the Grunnings parking lot, his mind back on drills.

Mr. Dursley always sat with his back to the window in his office on the
ninth floor. If he hadn't, he might have found it harder to concentrate
on drills that morning. He didn't see the owls swoop ing past in broad
daylight, though people down in the street did; they pointed and gazed
open- mouthed as owl after owl sped overhead. Most of them had never
seen an owl even at nighttime. Mr. Dursley, however, had a perfectly
normal, owl-free morning. He yelled at five different people. He made
several important telephone calls and shouted a bit more. He was in a
very good mood until lunchtime, when he thought he'd stretch his legs
and walk across the road to buy himself a bun from the bakery."""
print(client.generate(prompt, max_new_tokens=20, stop_sequences=["\n\n"]).generated_text)
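
For reference, the prompt's token count can be checked locally with the served model's tokenizer. This is a minimal sketch (assuming the transformers package is installed and reusing the prompt variable defined above), not part of the repro itself:

from transformers import AutoTokenizer

# Count the tokens in `prompt` with the same model's tokenizer.
# The exact count may differ slightly depending on special-token handling.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
print(len(tokenizer(prompt)["input_ids"]))  # reported here as 1731 tokens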

Errors

  >>> print(client.generate(prompt, max_new_tokens=20, stop_sequences=["\n\n"]).generated_text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hayley/lorax/.venv/lib/python3.8/site-packages/lorax/client.py", line 192, in generate
    raise parse_error(resp.status_code, payload)
lorax.errors.GenerationError: Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.



  File "/opt/conda/bin/lorax-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 89, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 324, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 922, in forward
    hidden_states, residual = layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 868, in forward
    moe_output = self.moe(normed_attn_res_output)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 718, in forward
    return self.sparse_forward(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 616, in sparse_forward
    x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/stk/backend/autocast.py", line 28, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/megablocks/ops/padded_gather.py", line 14, in forward
    return kernels.padded_gather(
  File "/opt/conda/lib/python3.10/site-packages/megablocks/backend/kernels.py", line 118, in padded_gather
    output_rows = padded_bins[-1].cpu().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/interceptor.py", line 38, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 96, in Prefill
    generations, next_batch = self.model.generate_token(batch)
  File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 927, in generate_token
    raise e
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 924, in generate_token
    out = self.forward(batch, adapter_data)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_mixtral.py", line 430, in forward
    logits = model.forward(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 979, in forward
    hidden_states = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 922, in forward
    hidden_states, residual = layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 868, in forward
    moe_output = self.moe(normed_attn_res_output)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 718, in forward
    return self.sparse_forward(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_mixtral_modeling.py", line 616, in sparse_forward
    x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.10/site-packages/stk/backend/autocast.py", line 28, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/megablocks/ops/padded_gather.py", line 14, in forward
    return kernels.padded_gather(
  File "/opt/conda/lib/python3.10/site-packages/megablocks/backend/kernels.py", line 118, in padded_gather
    output_rows = padded_bins[-1].cpu().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


2024-03-29T23:35:12.011139Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: lorax_client: router/client/src/lib.rs:34: Server error: Unexpected <class 'RuntimeError'>: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-03-29T23:35:12.011202Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: lorax_client: router/client/src/lib.rs:34: Server error: Unexpected <class 'RuntimeError'>: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-03-29T23:35:12.553846Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:

[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
Warmup to max_total_tokens: 100%|██████████| 1096/1096 [00:48<00:00, 22.76it/s]
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f23edaced87 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f23eda7f75f in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f23edb9f8a8 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1d40e (0x7f23edb6a40e in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x1f744 (0x7f23edb6c744 in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #5: <unknown function> + 0x1fb6d (0x7f23edb6cb6d in /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #6: <unknown function> + 0x540210 (0x7f23ecaf7210 in /opt/conda/lib/python3.10/site-package
frame #14: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #15: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #16: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #17: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #18: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #19: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #20: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #21: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #22: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #23: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #24: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #25: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #26: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #27: <unknown function> + 0x14dbd3 (0x55ce51811bd3 in /opt/conda/bin/python3.10)
frame #28: <unknown function> + 0x15262b (0x55ce5181662b in /opt/conda/bin/python3.10)
frame #29: <unknown function> + 0x1525e7 (0x55ce518165e7 in /opt/conda/bin/python3.10)
frame #30: <unknown function> + 0x563095 (0x7f21d7059095 in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #31: <unknown function> + 0x56b815 (0x7f21d7061815 in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #32: <unknown function> + 0x60ae0f (0x7f21d7100e0f in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #33: <unknown function> + 0x56a20b (0x7f21d706020b in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #34: <unknown function> + 0x5cbc29 (0x7f21d70c1c29 in /opt/conda/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
frame #35: <unknown function> + 0x14f3bd (0x55ce518133bd in /opt/conda/bin/python3.10)
frame #36: PyObject_VectorcallMethod + 0x85 (0x55ce51822c85 in /opt/conda/bin/python3.10)
frame #37: <unknown function> + 0xae1eb (0x55ce517721eb in /opt/conda/bin/python3.10)
frame #38: <unknown function> + 0x7bf6 (0x7f23ee5d0bf6 in /opt/conda/lib/python3.10/lib-dynload/_asyncio.cpython-310-x86_64-linux-gnu.so)
frame #39: <unknown function> + 0x143d2a (0x55ce51807d2a in /opt/conda/bin/python3.10)
frame #40: <unknown function> + 0x25f22c (0x55ce5192322c in /opt/conda/bin/python3.10)
frame #41: <unknown function> + 0xfda7b (0x55ce517c1a7b in /opt/conda/bin/python3.10)
frame #42: <unknown function> + 0x13c1b3 (0x55ce518001b3 in /opt/conda/bin/python3.10)
frame #43: _PyEval_EvalFrameDefault + 0x5d5d (0x55ce517fe16d in /opt/conda/bin/python3.10)
frame #44: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #45: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #46: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #47: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #48: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #49: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #50: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #51: _PyEval_EvalFrameDefault + 0x72c (0x55ce517f8b3c in /opt/conda/bin/python3.10)
frame #52: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #53: _PyEval_EvalFrameDefault + 0x4c12 (0x55ce517fd022 in /opt/conda/bin/python3.10)
frame #54: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #55: _PyEval_EvalFrameDefault + 0x4c12 (0x55ce517fd022 in /opt/conda/bin/python3.10)
frame #56: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #57: PyObject_Call + 0xbc (0x55ce51814d9c in /opt/conda/bin/python3.10)
frame #58: _PyEval_EvalFrameDefault + 0x2d84 (0x55ce517fb194 in /opt/conda/bin/python3.10)
frame #59: _PyFunction_Vectorcall + 0x6c (0x55ce518088cc in /opt/conda/bin/python3.10)
frame #60: PyObject_Call + 0xbc (0x55ce51814d9c in /opt/conda/bin/python3.10)
frame #61: _PyEval_EvalFrameDefault + 0x2d84 (0x55ce517fb194 in /opt/conda/bin/python3.10)
frame #62: <unknown function> + 0x150402 (0x55ce51814402 in /opt/conda/bin/python3.10)
frame #63: PyObject_Call + 0xbc (0x55ce51814d9c in /opt/conda/bin/python3.10)
 rank=0
2024-03-29T23:35:12.553943Z ERROR shard-manager: lorax_launcher: Shard process was signaled to shutdown with signal 6 rank=0
2024-03-29T23:35:12.564550Z ERROR batch{batch_size=1}:prefill:prefill{id=0 size=1}:prefill{id=0 size=1}: lorax_client: router/client/src/lib.rs:34: Server error: transport error
2024-03-29T23:35:12.589373Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:

Expected behavior

The server should return generated responses instead of crashing with an illegal memory access.

hayleyhu · Mar 29 '24

I am also facing the same issue.

markovalexander · Apr 8 '24

Same problem here.

prd-tuong-nguyen · Apr 12 '24

@magdyksaleh Hello! Any updates on this?

markovalexander · Apr 22 '24