Rachel Ah Chuen Monroe issues




Response caching GPU tensors

According to your docs, `only input tensors located in CPU memory will be hashable for accessing the cache. And only responses with all output tensors located in CPU memory will...

question

Segmentation fault (core dumped) - Server version 2.46.0

**Description** We are currently running Triton on k8s, and after starting Triton server version 2.46.0 we are seeing segmentation faults that cause the server to restart. It does seem to happen rather very...

question

Support dynamic path for gpt_model_path and token_dir based on Triton model repo

Hi, when using an external (GCS or S3) model repo, similar to other backends, I think it would be super useful to support loading the TRT engine and tokenizer from the...

LoRA weights not applied, without warnings or errors, when there is a type mismatch

**System Info** We've noticed that when there's a mismatch between the type of the `lora_plugin` used while building the engine and the type used for the `storage-type` when calling `hf_lora_convert`, the...

bug