Insu Jang

Results 7 issues of Insu Jang

…distribution ## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...

### 🐛 Describe the bug Hi, I am trying to implement a custom shard policy with a different layer distribution, but it seems all built-in policies have the following inconsistent implementation:...

bug

### 🐛 Describe the bug **Using `LazyInitContext` and later loading a checkpoint does not properly initialize model parameters.** ```python import colossalai from colossalai.lazy import LazyInitContext from colossalai.booster import Booster from colossalai.booster.plugin...

bug

### 🐛 Describe the bug 1. It seems blip2 testing doesn't work correctly at all if the model is in half precision (torch.float16). 2. With bfloat16, `colossalai.shardformer.layer.FusedLayerNorm` doesn't seem to work correctly....

bug

### 🐛 Describe the bug When using tensor parallelism, model parameters are sharded across GPUs to reduce memory consumption and enable parallel execution. However, the optimizer still holds unsharded model...

bug

### 🐛 Describe the bug I understand that this error comes from the flash attention software stack, but it seems there is no related issue except for https://github.com/Dao-AILab/flash-attention/issues/590, therefore I...

bug

When handling failures, if a pipeline doesn't have enough nodes, Oobleck is supposed to borrow nodes from other pipelines or merge pipelines. The previous implementation had a prototype implementation,...