avianion

Results: 10 issues by avianion

The list is great, but not for use in production. If you want to use it live, it's a huge 2.4 MB file, which even gzipped slows down one's site...

### Description
When an error occurs while handling the stream, it breaks the entire stream and raises an error. Is there a better way to do this? Because it removes the...
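A minimal sketch of one way to keep a single bad chunk from tearing down the whole stream, assuming the stream is an iterator of chunks; the names here are illustrative and not this project's API:

```python
# Hypothetical wrapper: surface a mid-stream error to a callback instead of
# letting it propagate and kill the consumer. `stream` is any iterable of chunks.
def resilient_stream(stream, on_error=None):
    it = iter(stream)
    while True:
        try:
            chunk = next(it)
        except StopIteration:
            return
        except Exception as exc:
            if on_error is not None:
                on_error(exc)          # report the failure to the caller
            return                     # stop cleanly instead of raising mid-stream
        yield chunk
```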

I loved this module, but it no longer works because only V4 signing is accepted now. Is it possible for someone to update this to V4 signing? Thanks so much!
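For reference, this is roughly what Signature Version 4 signing looks like when done with botocore, shown only as an illustration of V4 signing rather than as this module's API; the credentials, bucket, and region are placeholders:

```python
# Illustration of AWS Signature Version 4 signing using botocore (an assumption
# about the target service; not this module's own API). Placeholders throughout.
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.credentials import Credentials

creds = Credentials("AKIA_EXAMPLE", "example-secret-key")
request = AWSRequest(
    method="GET",
    url="https://example-bucket.s3.us-east-1.amazonaws.com/some/key",
    headers={},
)
SigV4Auth(creds, "s3", "us-east-1").add_auth(request)

# The signed headers (Authorization, X-Amz-Date, ...) can now be sent with any HTTP client.
print(request.headers["Authorization"])
```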

Is it possible to increase the number of tokens sent per chunk during the streaming process, and how would one do so? This could also apply to triton-inference-server.

Does this project plan to support Llama 3 70B or 8B?

The official Llama 3 70B Instruct repo has updated the eos token ("eos_token": ""), yet when using this library with that eos token, no output is produced because it...
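If the goal is simply to stop generation at Llama 3's end-of-turn marker, a rough sketch with Hugging Face transformers (not necessarily the library in the issue) passes both terminators to generate(); the repo id and the "<|eot_id|>" token string are assumptions here:

```python
# Sketch with Hugging Face transformers, not the library from the issue.
# Assumes the Llama 3 Instruct end-of-turn token is "<|eot_id|>" and that the
# checkpoint id below is correct; adjust both for your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),  # assumed end-of-turn token
]

inputs = tokenizer("Write one sentence about streams.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, eos_token_id=terminators)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```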

Is it possible to increase the number of tokens sent per chunk during the streaming process, and how would one do so? This could also apply to triton-inference-server (a client-side workaround is sketched below).

question
triaged
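No server-side knob is assumed here; as one possible client-side workaround for the chunk-size question above, streamed tokens can be buffered and re-emitted in larger chunks. A minimal sketch, assuming `token_stream` yields one decoded token string per streaming message:

```python
# Hypothetical client-side re-chunking: buffer individual streamed tokens and
# emit them in groups of `chunk_size`. `token_stream` is assumed to be any
# iterable of decoded token strings (e.g., from a streaming client).
def rechunk(token_stream, chunk_size=8):
    buffer = []
    for token in token_stream:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            yield "".join(buffer)
            buffer = []
    if buffer:                      # flush whatever is left at end of stream
        yield "".join(buffer)
```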

https://arxiv.org/abs/2404.15420 "In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of...
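The quadratic term is easy to see with a back-of-the-envelope count: self-attention over n context tokens builds an n × n score matrix, so doubling the context roughly quadruples that work (illustrative numbers only):

```python
# Illustrative only: show how the attention score matrix grows with context length.
for n in (1_000, 2_000, 4_000, 8_000):
    print(f"context = {n:>5} tokens -> ~{n * n:,} attention score entries per layer/head")
```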

**Describe the bug** Tokens per second is currently calculated including the latency from the beginning of the API request and/or from hitting the start button. However, tokens per second should...

type: bug
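A small sketch of the measurement the bug report argues for: start the clock at the first generated token so that request/queue latency is excluded; `stream` and the function name are illustrative, not the app's actual code:

```python
# Hypothetical metric: decode-phase tokens per second, timed from the first
# generated token rather than from the API request or the start button.
import time

def decode_tokens_per_second(stream):
    first_token_time = None
    last_token_time = None
    n_tokens = 0
    for _token in stream:
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now   # clock starts at the first token
        last_token_time = now
        n_tokens += 1
    if n_tokens < 2:
        return 0.0                   # not enough tokens to measure a rate
    return (n_tokens - 1) / (last_token_time - first_token_time)
```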

All other images like VLLM etc. are available, but not this one. What gives?

Available: 24.05-py3-min, 24.05-py3-sdk, 24.05-pyt-python-py3, 24.05-tf2-python-py3, 24.05-vllm-python-py3, 24.05-py3, 24.05-py3-igpu-min, 24.05-py3-igpu-sdk, 24.05-py3-igpu

But no 24.05-trtllm-python-py3??