text-generation-inference

Large Language Model Text Generation Inference

Results 639 text-generation-inference issues

### System Info OS version: Ubuntu 18.04.1 GPU: RTX 2080 Nvidia & CUDA: NVIDIA-SMI 460.91.03, Driver Version: 460.91.03, CUDA Version: 11.2 ### Information - [X] Docker...

### System Info **OS:** Ubuntu 20.04.5 LTS (Release: 20.04, Codename: focal) **GPUs:** NVIDIA-SMI 515.65.01, Driver Version: 515.65.01, CUDA Version: 11.7 ...

Follows best practices and ensures easier subclassing.

# What does this PR do? For consistency and ease of use (you can just run `make` to install vllm without any extra steps). Fixes # (issue) ## Before submitting...

### System Info ``` Mon Jul 3 13:44:40 2023 NVIDIA-SMI 510.39.01, Driver Version: 510.39.01, CUDA Version: 11.6 ...

### Model description RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V) RWKV is an RNN with Transformer-level LLM performance, which...

# What does this PR do? Implements NTK-Aware scaled and dynamically scaled RoPE for the PositionRotaryEmbedding to allow models to scale beyond their default max_tokens. https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/ Fixes # (issue)...

See https://github.com/huggingface/transformers/pull/24453. I didn't add validation to the `__init__` method since it's not done for other values/warpers.

### System Info ``` Tue Jul 4 16:51:59 2023 NVIDIA-SMI 530.41.03, Driver Version: 530.41.03, CUDA Version: 12.1 ...