Jeremy Jahn
Also want this feature. Merging weights is kind of painful (merging requires a lot of disk space), as sketched below.
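For reference, a minimal sketch of the merge step in question, assuming a PEFT-style workflow (the model and adapter names below are just placeholders):

```python
# Hedged sketch of merging a LoRA adapter into the base weights with PEFT's
# merge_and_unload; every adapter merged this way yields another full-size
# checkpoint on disk, which is where the disk-space cost comes from.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")   # example base model
merged = PeftModel.from_pretrained(base, "my-org/my-lora-adapter").merge_and_unload()

# Writing the merged model back to disk duplicates the full base checkpoint per adapter.
merged.save_pretrained("llama-2-7b-merged-adapter")
```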
Just wanted to share the S-LoRA paper from Stanford, and found that @iiLaurens has already shared it! > Just noticed a paper discussing an efficient implementation of multi-LoRA serving called S-LoRA. [Link...
> @danthe3rd I also need ALiBi support. For now, I pass `bias = LowerTriangularMaskWithTensorBias(alibi_bias)` to `xops.memory_efficient_attention(..., attn_bias=bias)`. The forward pass alone is OK, but it fails at the backward pass in training mode....
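A minimal sketch of that setup (not the author's exact code): the ALiBi slope formula, tensor shapes, and the slope/bias construction below are assumptions, only the `LowerTriangularMaskWithTensorBias` wrapper and `memory_efficient_attention` call mirror the comment.

```python
# Sketch: passing an ALiBi tensor bias to xFormers memory-efficient attention.
import torch
import xformers.ops as xops
from xformers.ops.fmha.attn_bias import LowerTriangularMaskWithTensorBias

B, H, S, D = 2, 8, 128, 64  # batch, heads, sequence length, head dim (example sizes)

# ALiBi slopes for H heads (simple power-of-two variant, assumed here).
slopes = torch.tensor([2 ** (-8 * (h + 1) / H) for h in range(H)])

# Relative-position bias: bias[h, i, j] = slope_h * (j - i), i.e. a penalty on distant past tokens.
pos = torch.arange(S)
rel = pos[None, :] - pos[:, None]                      # (S, S)
alibi = slopes.view(H, 1, 1) * rel.view(1, S, S)       # (H, S, S)
alibi_bias = alibi.unsqueeze(0).expand(B, H, S, S).to(torch.float16).cuda().contiguous()

q = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda", requires_grad=True)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The causal (lower-triangular) masking is handled by the wrapper; the tensor carries the ALiBi bias.
bias = LowerTriangularMaskWithTensorBias(alibi_bias)
out = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
out.sum().backward()  # the backward step is where the reported failure occurs
```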
@wkcn Will DeepSpeed ZeRO-3 be supported in the future? I saw that FSDP will be supported.
Looking forward to the support in vLLM!
Same question. I saw `dist` in the [script](https://github.com/VITA-Group/Q-GaLore/blob/8200795c687c6ac3b6c69d595275dac0589b7f2b/q_galore_torch/q_galore_adamw8bit.py#L50), but it is not imported in the original [GaLore AdamW8bit](https://github.com/jiaweizzhao/GaLore/blob/master/galore_torch/adamw8bit.py).
If I remember correctly, `tp_plan` is for the vLLM integration, where it is used for inference-time sharding. AutoTP was originally built for DeepSpeed's inference engine, which plays a similar role to vLLM. Then recently @inkcherry...
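For illustration, a hedged sketch of how `tp_plan` drives inference-time sharding on the transformers side (the checkpoint name is just an example; launching with `torchrun` across the target number of GPUs is assumed):

```python
# Sketch: loading a model with tp_plan="auto" so each rank holds a tensor-parallel shard.
# Run with e.g.: torchrun --nproc-per-node 4 this_script.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",  # use the model's built-in tensor-parallel plan for inference-time sharding
)
tok = AutoTokenizer.from_pretrained(model_id)

inputs = tok("Tensor parallelism shards each linear layer across ranks.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```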
It seems that the compile wrapper was removed in DeepSpeed 0.14.4 (ref: https://github.com/microsoft/DeepSpeed/pull/5581). Is there any revamp of this PR going on?
@oraluben Thanks for opening another PR. I thought that this old PR had been merged, and I was searching for the line diffs that delete the ds_config-related code. It turns out that...