Hangliang Ding
The config is here: /workspace/Open-Sora/configs/pixart/inference/1x2048MS.py
https://github.com/mit-han-lab/smoothquant/blob/main/figures/migrate.jpg
I built flash_attn from source against PyTorch 2.3.0:
```
>>> from flash_attn import flash_attn_2_cuda
```
Hardware: A100-80G, CUDA 12.1. My env:
```
accelerate==0.29.3
aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
altair==5.3.0
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
...
```
It seems to have this feature, but I can't find where it is on the dashboard.
From https://github.com/FMInference/H2O/blob/281ffef3f1432ceb1a6899362d2f20e1ef13aa94/h2o_hf/utils_hh/modify_llama.py#L140-L156: if recent_budget=0, the mask is first set to all ones (`attn_mask = torch.ones`), then the top-k positions are scattered to one (`attn_mask = attn_mask.scatter(-1, keep_topk, 1)`); but `attn_mask[:, :-self.recent_budget] = 0` only works for...
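To make the slicing issue concrete, here is a minimal standalone check (my own sketch, not code from the repo) showing that `[:, :-0]` selects an empty slice, so with `recent_budget = 0` the zeroing assignment is a no-op and the mask stays all ones:
```python
import torch

# With recent_budget = 0, `:-recent_budget` becomes `:-0`, i.e. slice(None, 0),
# which selects zero columns; the assignment below therefore changes nothing.
recent_budget = 0
attn_mask = torch.ones(2, 8)
attn_mask[:, : -recent_budget] = 0  # no-op: empty slice when recent_budget == 0
print(attn_mask)  # still all ones

# With a positive budget, the same line does mask everything outside the window.
recent_budget = 3
attn_mask = torch.ones(2, 8)
attn_mask[:, : -recent_budget] = 0
print(attn_mask)  # first 5 columns zeroed, last 3 kept
```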
Could you open-source the stage-3 ckpt?
https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/opensora/models/diffusion/latte/modeling_latte.py As I understand it, whether ckpt is on or off should not affect the i = 0 check. At L506-507 here, why is the i = 0 condition commented out? As it stands, turning ckpt on and off will change training behavior.
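For illustration only (hypothetical function and variable names, not the actual modeling_latte.py code): if the checkpointed branch drops the `i == 0` special case while the plain branch keeps it, the forward pass computes different things depending on whether ckpt is enabled:
```python
import torch
import torch.utils.checkpoint

# Hypothetical sketch of the pattern in question: the non-checkpointed path
# handles the first block specially, but the checkpointed path skips the
# `i == 0` branch, so toggling checkpointing silently changes the computation.
def forward_blocks(blocks, hidden, temp_pos_embed, use_checkpointing, training):
    for i, block in enumerate(blocks):
        if use_checkpointing and training:
            # `i == 0` case missing here -> temp_pos_embed is never added
            hidden = torch.utils.checkpoint.checkpoint(block, hidden)
        else:
            if i == 0:
                hidden = block(hidden + temp_pos_embed)
            else:
                hidden = block(hidden)
    return hidden
```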
https://arxiv.org/abs/2405.07719 Is there any plan to combine the two methods to improve overall performance?
How did you solve this?
```
[rank0]: AssertionError: You can't use same `Accelerator()` instance with multiple models when using DeepSpeed
```
```
unet, optimizer, train_dataloader, lr_scheduler, KD_teacher_unet = accelerator.prepare(
    unet, optimizer, train_dataloader, lr_scheduler, KD_teacher_unet
...
```
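One common workaround (my assumption, not confirmed by this thread): keep the frozen teacher out of `accelerator.prepare` so DeepSpeed only wraps the trainable student, and place the teacher on the device by hand:
```python
# Sketch of a workaround, assuming the teacher is frozen during distillation:
# only the trainable student goes through accelerator.prepare (and thus
# through DeepSpeed); the teacher is moved to the device manually.
unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    unet, optimizer, train_dataloader, lr_scheduler
)
KD_teacher_unet.requires_grad_(False)
KD_teacher_unet.eval()
KD_teacher_unet.to(accelerator.device)
```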
The original code has a bug when using reshape(-1) to infer the data layout automatically; this moves it to a fixed, explicit layout. Failing case: train_batch_size = 1, sp_size = 4, train_sp_batch_size = 1.
```
...
```
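For context, a minimal illustration (assumed tensor shapes and names, not the PR's actual code) of why inferring a dimension with -1 can hide a layout mismatch, while explicit sizes fail fast:
```python
import torch

# Assumed shapes for illustration: a tensor sharded across sp_size ranks.
sp_size, batch, seq, hidden = 4, 1, 8, 16
x = torch.randn(sp_size * batch, seq, hidden)

# Inferring the leading dim with -1 "succeeds" for any compatible element
# count, so a wrong layout assumption passes silently.
inferred = x.reshape(-1, seq, hidden)

# Spelling out every dimension encodes the intended layout and raises a
# RuntimeError immediately if the shapes drift (e.g. sp_size changes).
explicit = x.reshape(sp_size, batch, seq, hidden)
```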