Jiaxin Shan
@better629 I have not tried 30B yet; we will explore 30B or 65B later and let you know the results.
@Chesterguan Can you provide some logs from the controller side and the model worker side?
@zhangzhengde0225 I worked around the issue by using 8*A100 (80G). I got similar results to yours: the training process went smoothly, but the error happened during model weight persistence. Check this...
@simon-mo For prefill disaggregation: the Splitwise and DistServe papers both build their solutions on top of vLLM for evaluation. Any contribution from these teams? Is the vLLM community open for...
@kenplusplus Dynamic batch size won't be controlled by external autoscaling logic; those are two different levels of control.
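A minimal sketch of the two levels mentioned above (all class and method names here are hypothetical, not vLLM's or KubeRay's real APIs): the serving engine picks its batch size per step from its own queue, while an autoscaler only adjusts replica count from coarse aggregate load.

```python
import math
from collections import deque

class ServingEngine:
    """Inner control loop: chooses a dynamic batch size on every step."""
    def __init__(self, max_batch_size=8):
        self.queue = deque()
        self.max_batch_size = max_batch_size

    def submit(self, request):
        self.queue.append(request)

    def step(self):
        # Dynamic batch size: decided internally each step,
        # bounded only by the engine's own capacity.
        n = min(len(self.queue), self.max_batch_size)
        return [self.queue.popleft() for _ in range(n)]

class Autoscaler:
    """Outer control loop: adjusts replica count, not batch size."""
    def __init__(self, target_queue_per_replica=16):
        self.target = target_queue_per_replica

    def desired_replicas(self, total_queued):
        # Scales on aggregate queue depth; never touches the
        # per-step batching decision inside each engine.
        return max(1, math.ceil(total_queued / self.target))

engine = ServingEngine(max_batch_size=4)
for i in range(6):
    engine.submit(f"req-{i}")
print(len(engine.step()))  # engine batches 4 of the 6 queued requests

scaler = Autoscaler(target_queue_per_replica=16)
print(scaler.desired_replicas(total_queued=40))  # 3 replicas
```

The point of the sketch: the autoscaler never reaches into `step()`, so external autoscaling logic cannot (and should not) dictate the dynamic batch size.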
https://github.com/ray-project/kuberay/issues/861 If we plan to make any API changes (removing unhelpful fields), the cleanup is a breaking change, so let's support multiple versions in the beta version.
@kevin85421 I feel it's worth having some community plugins. The core part could stay lightweight, and there are no conflicts.
@DmitriGekhtman Some of the problems are fixed. I will try to fix the rest of them soon.
@MadhavJivrajani Not a problem now; it can be closed. Thanks for the follow-up.