
ChatGLM3 6B Multi-batch Failed with Error

Open RobinJYM opened this issue 1 year ago • 2 comments

System Info

  • CPU: INTEL RPL
  • GPU Name: NVIDIA RTX 4090
  • TensorRT-LLM: tensorrt_llm==0.11.0.dev2024060400
  • Container Used: Yes and reproduced in Conda as well
  • Driver Version: 555.42.02
  • CUDA Version: 12.5
  • OS: Ubuntu 24.04
  • Docker Img: nvidia/cuda:12.5.0-devel-ubuntu22.04

Who can help?

@hijkzzz

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

python3 benchmarks/python/benchmark.py --engine_dir /trt_engines/chatglm3_6b/float16/1-gpu/ --dtype float16 --batch_size 16 --input_output_len "1024,512"
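For context, this command asks the engine to prefill batch 16 with 1024 input tokens, i.e. 16 × 1024 = 16384 context tokens at step 0. The "Runtime dimension does not satisfy any optimization profile" error below means these runtime shapes fall outside the ranges the engine was built for. A minimal sketch of that check (function name and limit values are hypothetical, not TensorRT-LLM API):

```python
# Hypothetical sketch of the shape check TensorRT performs at runtime:
# each input dimension must fall inside an optimization profile baked
# into the engine at build time.

def satisfies_profile(batch_size: int, input_len: int,
                      max_batch_size: int, max_num_tokens: int) -> bool:
    """Return True if the requested shapes fit the build-time limits."""
    total_context_tokens = batch_size * input_len
    return batch_size <= max_batch_size and total_context_tokens <= max_num_tokens

# The benchmark requests batch 16 with 1024-token inputs; an engine
# built with a smaller limit (values here are illustrative) rejects it:
print(satisfies_profile(16, 1024, max_batch_size=8, max_num_tokens=8192))    # False
print(satisfies_profile(16, 1024, max_batch_size=16, max_num_tokens=16384))  # True
```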

Expected behavior

pass with output

actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Allocated 770.13 MiB for execution context memory.
/usr/local/lib/python3.10/dist-packages/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
  return _nested.nested_tensor(
[06/13/2024-03:26:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2068] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2068, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[06/13/2024-03:26:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2068] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2068, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[06/13/2024-03:26:19] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2842] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2842, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
  File "/TensorRT-LLM/benchmarks/python/benchmark.py", line 416, in main
    benchmarker.run(inputs, config)
  File "/TensorRT-LLM/benchmarks/python/gpt_benchmark.py", line 254, in run
    self.decoder.decode_batch(inputs[0],
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 3240, in decode_batch
    return self.decode(input_ids,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 947, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 3463, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 3073, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2732, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/TensorRT-LLM/benchmarks/python/benchmark.py", line 515, in <module>
    main(args)
  File "/TensorRT-LLM/benchmarks/python/benchmark.py", line 441, in main
    e.with_traceback())
TypeError: BaseException.with_traceback() takes exactly one argument (0 given)
[06/13/2024-03:26:25] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

additional notes

If I run without --input_output_len, it works fine.
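One way to confirm whether the request exceeds the engine's build-time limits is to inspect the engine directory's config.json before benchmarking. The sketch below assumes the 0.11-era layout (a "build_config" section with max_batch_size / max_input_len keys); key names may differ across TensorRT-LLM versions:

```python
# Hedged sketch: compare a benchmark request against the build-time
# limits recorded in the engine's config.json. The key layout assumed
# here ("build_config" / "max_batch_size" / "max_input_len") follows
# TensorRT-LLM 0.11 and may vary by version.

def check_request(config: dict, batch_size: int, input_len: int) -> list:
    """Return human-readable violations; an empty list means the request fits."""
    build = config.get("build_config", config)
    problems = []
    max_bs = build.get("max_batch_size")
    if max_bs is not None and batch_size > max_bs:
        problems.append(f"batch_size {batch_size} > max_batch_size {max_bs}")
    max_il = build.get("max_input_len")
    if max_il is not None and input_len > max_il:
        problems.append(f"input_len {input_len} > max_input_len {max_il}")
    return problems

# Illustrative config for an engine built with max batch 8:
cfg = {"build_config": {"max_batch_size": 8, "max_input_len": 1024}}
print(check_request(cfg, batch_size=16, input_len=1024))
```

In practice you would load the dict with `json.load(open(engine_dir + "/config.json"))` instead of the inline example.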

RobinJYM avatar Jun 13 '24 03:06 RobinJYM

Confirmed, this is a bug; we are investigating internally.

hijkzzz avatar Jun 13 '24 22:06 hijkzzz

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Jul 14 '24 01:07 github-actions[bot]

I have hit the same error; it happens when I set batch_size > 8.

AnnaYue avatar Aug 31 '24 05:08 AnnaYue

Hi @RobinJYM, could you please try the latest code base to see if the issue still exists?

nv-guomingz avatar Nov 14 '24 07:11 nv-guomingz