Wenxuan Tan
Killing the process on the remote host sometimes works.
Hi, it's on our schedule, maybe in two weeks or so.
I will take multiple looks
I plan to release it next week
Thanks for the issue. Could you try running that again? I pushed a fix a few weeks back
@ver217 any insights?
Both @ver217 and I have seen this bug; it appears when pipeline parallelism (pp) is off. We'll dig into it more.
Hi, could you try pulling the latest main branch? I don't have trouble running pp_size = 2.
I think the booster should support any dataset. Have you tried replacing the random dataset with the one used here? https://github.com/hpcaitech/ColossalAI/blob/8020f4263095373e4c7ad1b15e54b966a8ccb683/examples/language/llama2/finetune.py#L209
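To illustrate why the swap should be painless, here is a minimal stdlib-only sketch (not the actual ColossalAI API; all class and function names are made up): the training loop only relies on `len()` and indexing, so a random placeholder dataset and a real one are interchangeable without touching the booster setup.

```python
# Hedged sketch, plain stdlib only. Names (RandomDataset, RealDataset,
# run_epoch) are illustrative, not ColossalAI identifiers.
import random

class RandomDataset:
    """Placeholder dataset emitting random feature vectors."""
    def __init__(self, n, dim=4):
        self.n, self.dim = n, dim
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        return [random.random() for _ in range(self.dim)], 0

class RealDataset:
    """Stand-in for a real tokenized dataset (e.g. loaded from disk)."""
    def __init__(self, rows):
        self.rows = rows
    def __len__(self):
        return len(self.rows)
    def __getitem__(self, i):
        return self.rows[i]

def run_epoch(dataset):
    # Minimal stand-in for the boosted training loop: it only iterates,
    # so it cannot tell the two dataset classes apart.
    seen = 0
    for i in range(len(dataset)):
        _features, _label = dataset[i]
        seen += 1
    return seen

print(run_epoch(RandomDataset(8)))                    # 8
print(run_epoch(RealDataset([([0.0] * 4, 1)] * 8)))   # 8
```

Anything exposing the same `__len__` / `__getitem__` interface plugs straight in.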
Actually, in pp only the last pipeline stage computes the loss, so this is not a bug. You'll need to do this to see the actual loss. There's also a llama fine-tuning...
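A toy sketch of what I mean (plain Python, not the real ColossalAI stage manager; the stage and loss logic here is illustrative): only the last stage holds the model head and the criterion, so every upstream stage has no loss value to report, and you have to guard on the last stage before logging.

```python
# Hedged sketch: why only the last pipeline-parallel stage sees a loss.
# pipeline_step is a made-up name, not a ColossalAI function.

def pipeline_step(stage_idx, num_stages, batch):
    # Every stage runs its slice of the model ...
    activations = sum(batch)  # stand-in for this stage's forward pass
    if stage_idx == num_stages - 1:
        # ... but only the last stage holds the head + criterion.
        return activations / len(batch)  # stand-in for criterion(out, target)
    return None  # upstream stages compute no loss

losses = [pipeline_step(i, 4, [1.0, 2.0, 3.0]) for i in range(4)]
print(losses)  # [None, None, None, 2.0]
```

So when logging, check that you are on the last stage (or broadcast the loss from it) before printing; otherwise you'll see `None` on every other rank.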