Swapnil Parekh
+1 this issue. Even using the DeepSpeech 1 LM binary causes massive RAM use.
Hi, I would like to contribute as a first-timer. Could you kindly give me some background on how to solve it?
Hi, I would like to work on this as a first-timer. Can you kindly brief me on this?
Hey @yzh119, any update on sm_75 support for the Punica LoRA kernels?
Hey @Yard1, I have addressed your comments on this soft prompt tuning PR. Some updates:
- New `adapter_commons` folder with all the common code between `LoRA` and `Prompt Adapters` abstracted...
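For context, here is a hedged sketch of the kind of shared abstraction the `adapter_commons` refactor describes: a common base class that both a LoRA manager and a prompt-adapter manager could implement. The class and method names below are illustrative, not the exact PR code.

```python
# Illustrative sketch only; names are hypothetical, not the PR's actual API.
from abc import ABC, abstractmethod


class AdapterModelManager(ABC):
    """Common interface shared by LoRA and prompt-adapter managers."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # max adapters resident on device
        self._registry: dict[int, object] = {}

    @abstractmethod
    def activate_adapter(self, adapter_id: int) -> bool:
        """Load this adapter's weights onto the device."""

    @abstractmethod
    def deactivate_adapter(self, adapter_id: int) -> bool:
        """Evict this adapter's weights from the device."""

    def add_adapter(self, adapter_id: int, adapter: object) -> bool:
        # Registration logic is identical for both adapter types,
        # so it lives once in the shared base class.
        if adapter_id in self._registry:
            return False
        self._registry[adapter_id] = adapter
        return True
```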
Hi @Yard1, just a friendly reminder to review this PR when you get a chance, thanks! Once this design is approved, I'm happy to update this with support for prefix tuning...
@Yard1 no worries, thank you! Yes, it should work: you can provide both `PromptAdapterRequest` and `LoRARequest` parameters. I just tested a tiny example of this, happy to add a test...
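A minimal sketch of such a combined request, assuming the vLLM Python API as of this PR; the model name, adapter names, and local paths are placeholders, not the actual test from the PR.

```python
# Sketch: passing a prompt adapter and a LoRA adapter in one generate call.
# Paths and adapter names below are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest
from vllm.prompt_adapter.request import PromptAdapterRequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_lora=True,
    enable_prompt_adapter=True,
    max_prompt_adapter_token=8,   # budget for the adapter's virtual tokens
)

outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("my-lora", 1, "/path/to/lora_adapter"),
    prompt_adapter_request=PromptAdapterRequest(
        "my-prompt-adapter", 1, "/path/to/prompt_adapter", 8),
)
print(outputs[0].outputs[0].text)
```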
@g-eoj thanks for the OpenAPI PR! Will look at it once I finish the refactor requested by Antoni.
@Rallio67 thanks for testing the PR out. I have tested it with bloom, llama-2, and gptbigcode, and it seemed to work for me. Can you please share the llama-3 adapter...
Hey @Rallio67, the prompt works, but there is a pending change that would cast the prompt to the model's dtype so it works out of the box. Currently the...
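A hedged sketch of the pending fix described above: casting the loaded prompt-embedding tensor to the model's dtype. The function name is hypothetical; it just illustrates the dtype mismatch being fixed.

```python
# Hypothetical helper illustrating the pending dtype fix; not the PR's code.
import torch


def cast_prompt_embedding(embedding: torch.Tensor,
                          model_dtype: torch.dtype) -> torch.Tensor:
    # Prompt-tuning checkpoints are often saved in float32; a half-precision
    # model (float16/bfloat16) then hits a dtype mismatch when the virtual
    # tokens are concatenated with the input embeddings. Casting up front
    # makes the adapter work out of the box.
    return embedding.to(dtype=model_dtype)


# Example: adapt a float32 prompt embedding for a float16 model.
virtual_tokens = cast_prompt_embedding(
    torch.randn(8, 4096), torch.float16)
```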