Faster inference: PagedAttention from vLLM
I'm getting great qualitative results from Falcon fine-tuned with Adapter v2.
Inference is faster than what I get with Hugging Face PEFT and LoRA, but it's still too slow to scale up.
Could the ideas or code from vLLM's PagedAttention (https://github.com/vllm-project/vllm) be used to really speed up inference with parallel sampling and larger batch sizes?
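For context, here is a minimal sketch of the idea behind PagedAttention, not vLLM's actual implementation: the KV cache lives in fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks, so parallel samples forked from the same prompt can share the prompt's blocks instead of duplicating them. The `Sequence` class, block sizes, and pool layout below are all illustrative assumptions.

```python
# Sketch of a paged KV cache (illustrative only, not vLLM's code).
import torch

BLOCK_SIZE = 16          # tokens per physical KV block (assumed value)
NUM_BLOCKS = 256         # size of the physical block pool (assumed value)
N_HEADS, HEAD_DIM = 8, 64

# One physical pool for keys and one for values, shared by all sequences.
k_pool = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, N_HEADS, HEAD_DIM)
v_pool = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, N_HEADS, HEAD_DIM)
free_blocks = list(range(NUM_BLOCKS))

class Sequence:
    def __init__(self):
        self.block_table: list[int] = []  # logical block -> physical block
        self.length = 0                   # tokens written so far

    def append_kv(self, k: torch.Tensor, v: torch.Tensor) -> None:
        """Write one token's K/V into the pool, allocating a block on demand."""
        if self.length % BLOCK_SIZE == 0:          # current block is full
            self.block_table.append(free_blocks.pop())
        block = self.block_table[-1]
        slot = self.length % BLOCK_SIZE
        k_pool[block, slot] = k
        v_pool[block, slot] = v
        self.length += 1

    def fork(self) -> "Sequence":
        """Parallel sampling: the child shares the parent's physical blocks.
        (Copy-on-write of the last, partially filled block is omitted here.)"""
        child = Sequence()
        child.block_table = list(self.block_table)  # shared physical blocks
        child.length = self.length
        return child

    def gather_kv(self) -> tuple[torch.Tensor, torch.Tensor]:
        """Reassemble contiguous K/V for attention. vLLM instead reads the
        blocks directly inside a custom attention kernel."""
        ks = torch.cat([k_pool[b] for b in self.block_table])[: self.length]
        vs = torch.cat([v_pool[b] for b in self.block_table])[: self.length]
        return ks, vs

# Usage: cache a 20-token prompt, then fork 4 parallel samples that all
# reuse the same prompt blocks - no KV duplication, so larger batches fit.
prompt = Sequence()
for _ in range(20):
    prompt.append_kv(torch.randn(N_HEADS, HEAD_DIM),
                     torch.randn(N_HEADS, HEAD_DIM))
samples = [prompt.fork() for _ in range(4)]
print(len(free_blocks), samples[0].block_table)  # blocks shared across forks
```

The sketch only covers the memory-management side; much of vLLM's speedup comes from reading the blocks directly in a fused attention kernel rather than gathering them into a contiguous tensor, plus copy-on-write when forked sequences diverge.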