Martin Marciniszyn Mehringer comments

Results: 15 comments of Martin Marciniszyn Mehringer

Richer tab-completion

Note that the [BeakerX](https://github.com/twosigma/beakerx) Scala kernel supports this feature. Maybe some parts of their implementation could be reused. ![TabCompletion](https://user-images.githubusercontent.com/11665257/60190413-e3d9cf80-9832-11e9-9fb4-7f88551390d1.PNG)

Unnecessary assertion in cpp implementation of worldConfig.cpp

The assertion is already gone in the `main` branch.

How to pass hidden_states to llm directly, when using inflight batching?

We do not have support in the runtime for that at the moment. Is this something that could be handled inside the engine, @QiJune?

fix: correct cudaSetDevice error when GPUs per node are fewer than their ranks in inter-node inference

I am not in favor of having function parameter defaults that change depending on the environment. These should be compile time constants. I suggest changing `run.py` instead so that it...
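To illustrate the general point about defaults, here is a minimal sketch, not the code from the PR. It assumes a hypothetical helper `local_device_id` and a hypothetical constant `DEFAULT_GPUS_PER_NODE`; neither name comes from TensorRT-LLM. The default stays a fixed constant, and any environment-specific value is passed in explicitly by the caller.

```python
# Hypothetical constant for illustration; the default never depends on the environment.
DEFAULT_GPUS_PER_NODE = 8


def local_device_id(rank: int, gpus_per_node: int = DEFAULT_GPUS_PER_NODE) -> int:
    """Map a global rank to a device index on its node (hypothetical helper)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    # Wrap the global rank onto the devices available on one node.
    return rank % gpus_per_node


if __name__ == "__main__":
    # A launch script would detect the real GPU count and pass it explicitly.
    # With one GPU per node, every rank maps to device 0.
    for rank in range(4):
        print(rank, local_device_id(rank, gpus_per_node=1))
```

With this shape, a launch script that knows the actual number of GPUs per node passes it in, while the function signature itself behaves the same everywhere.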

Incorrect GPU Assignment in MPI Inter-Node Processing with Single GPU Nodes

@Funatiq, could you please take a look at the PR?

[FeatureRequest] Gather sparse logprobs

@Marks101, the logits processor is supported on `ModelRunnerCppExecutor`: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/model_runner_cpp.py#L48 Could you try that please?

[FeatureRequest] Gather sparse logprobs

Thanks for the feedback @shangshng. It should be supported in the Python bindings of the Executor API. @dcampora, could you please add support to `ModelRunnerCpp`? @Marks101, you can use the...

Missing logits in Executor API when using `return_generation_logits`

@trevor-m, could you please review @AlessioNetti's feedback?

doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and document

/bot run --skip-test

doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and document

/bot run