punica
Inquiry on CUDA memory across processes
Hi,
Congratulations on the great work you have done! I am very interested in it. Specifically, I would like to know how you allow multiple serving processes to share the same CUDA memory space for the frozen parameters of the LoRA models.
Could you please point me to the relevant code? I would like to study your implementation. Thanks!
BR//Zizhao