Insu Jang
I have the same issue. It is not just flops but also macs. `module.__flops__` and `module.__macs__` are calculated in the post hook in the profiler code: https://github.com/microsoft/DeepSpeed/blob/80f94c10c552ec79473775adb8902b210656ed76/deepspeed/profiling/flops_profiler/profiler.py#L91-L95 `module_flop_count[-1]` and `module_mac_count[-1]` have more...
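For context, this is a minimal sketch of the post-hook accumulation pattern I am referring to (simplified from the linked lines, not the exact DeepSpeed implementation):

```python
# Simplified sketch: patched torch ops append (op_name, count) tuples into the
# innermost bucket, and the post hook rolls them up onto the module.
module_flop_count = []  # each entry is a list of (op_name, flops) for one module call
module_mac_count = []   # each entry is a list of (op_name, macs) for one module call

def pre_hook(module, input):
    # one fresh bucket per forward call of this module
    module_flop_count.append([])
    module_mac_count.append([])

def post_hook(module, input, output):
    if module_flop_count:
        # sum everything recorded during this call and attach it to the module
        module.__flops__ += sum(elem[1] for elem in module_flop_count[-1])
        module_flop_count.pop()
        module.__macs__ += sum(elem[1] for elem in module_mac_count[-1])
        module_mac_count.pop()
```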
@TongLi3701 I am facing the same problem with transformers 4.36.0 and the colossalai branch `feature/update-transformers`, which targets transformers 4.36.0.
@wangbluo Could you please help me solve this issue? Thanks
I used the 7b configuration.
I am not sure whether this is a bug or an unavoidable error due to lower precision, and whether it was intended to be tested only in fp32. Would appreciate it if...
@Edenzzzz, thank you for taking the time to look into this issue. I am not sure this fix works. I tested with `enable_all_optimization=False`, `enable_sequence_parallelism=False`, and `enable_sequence_overlap=False`, but I still hit the same problem...
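For reference, this is roughly how I disabled those options (a sketch; `tp_size`, `pp_size`, and `precision` are placeholders for what my 7b run actually used):

```python
from colossalai.booster.plugin import HybridParallelPlugin

# Sketch of the plugin configuration used for the test above.
plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    precision="bf16",
    enable_all_optimization=False,
    enable_sequence_parallelism=False,
    enable_sequence_overlap=False,
)
```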
Looks like `preprocess` in each policy might be the reason: https://github.com/hpcaitech/ColossalAI/blob/341263df48bbef1174c41b6c4f5f6785f895b0d4/colossalai/shardformer/policies/bert.py#L39-L51 https://github.com/hpcaitech/ColossalAI/blob/341263df48bbef1174c41b6c4f5f6785f895b0d4/colossalai/shardformer/policies/gpt2.py#L32-L43 Although all policies have the same resize logic, each model has a different default vocab embedding size, so only...
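The resize logic shared by those `preprocess` methods looks roughly like this (simplified from the linked policy code):

```python
# Simplified sketch of the shared resize logic in the linked `preprocess` methods:
# pad the vocab size up to the next multiple of the tensor parallel world size
# by calling HF's resize_token_embeddings, which recreates the embedding module.
def preprocess(self):
    if self.shard_config.enable_tensor_parallelism:
        vocab_size = self.model.config.vocab_size
        world_size = self.shard_config.tensor_parallel_size
        if vocab_size % world_size != 0:
            new_vocab_size = vocab_size + world_size - vocab_size % world_size
            self.model.resize_token_embeddings(new_vocab_size)
    return self.model
```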
A quick potential patch is to not use HF's `resize_token_embeddings` but instead use `nn.functional.pad` to resize the weight tensor, avoiding recreation of the `nn.Embedding` (not sure if there are other attributes that should...
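Something along these lines (a rough sketch of the idea; `pad_vocab_embedding` is a hypothetical helper, and attribute updates beyond `num_embeddings` are exactly the open question):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pad_vocab_embedding(embedding: nn.Embedding, world_size: int) -> None:
    """Hypothetical helper: pad the embedding weight in place so the vocab size
    becomes divisible by world_size, without recreating the nn.Embedding."""
    vocab_size, _ = embedding.weight.shape
    remainder = vocab_size % world_size
    if remainder == 0:
        return
    pad_rows = world_size - remainder
    with torch.no_grad():
        # F.pad on a 2-D tensor takes (left, right, top, bottom): append zero rows.
        padded = F.pad(embedding.weight, (0, 0, 0, pad_rows))
    embedding.weight = nn.Parameter(padded)
    embedding.num_embeddings = vocab_size + pad_rows
    # NOTE: other attributes (e.g. tied lm_head weights, config.vocab_size)
    # may also need updating; that is the part I am unsure about.
```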
Maybe it is related to #5489?