NR Wu
Before the llama implementation was merged into mega-ds, we implemented another llama in our private repo, and we found that you can train at most a 13B llama without offloading with 8...
We have been working on LMs recently and encountered this problem. I am trying to fix it. @ShadenSmith @duli2012
In DeepSpeed, large models are allocated inside the `zero.Init` context. Is there anything similar in torch FSDP?

```python
with deepspeed.zero.Init():
    model = MyLargeModel()
```
> It is not necessary to move the model to GPU before passing to FSDP:
>
> ```
> model = Net().to(rank)
> ```
>
> You only need to...
> FSDP has some support for deferred initialization if you look at the `param_init_fn` constructor argument, which would allow exceeding the capacity of CPU DRAM. However, the current support is...
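For reference, here is a minimal sketch (not from this thread) of what deferred initialization with `param_init_fn` can look like, assuming a recent PyTorch (2.x) where modules can be constructed on the meta device and `torch.distributed` has already been initialized (e.g. via `torchrun`); the layer sizes are placeholders.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Build the model on the meta device so no parameter memory is allocated yet.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(8192, 8192), nn.Linear(8192, 8192))

def param_init_fn(module: nn.Module) -> None:
    # Materialize this module's own parameters on the local GPU,
    # then run its normal initialization if it defines one.
    module.to_empty(device=torch.cuda.current_device(), recurse=False)
    if callable(getattr(module, "reset_parameters", None)):
        module.reset_parameters()

# FSDP materializes each meta-device submodule via param_init_fn before sharding.
model = FSDP(model, param_init_fn=param_init_fn)
```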
`stCoRoutineAttr_t` has a constructor:

```c++
stCoRoutineAttr_t()
{
    stack_size = 128 * 1024;
    share_stack = NULL;
}
```
Same problem here.
> Any updates on this?

Same question.