Insu Jang
…distribution ## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### 🐛 Describe the bug Hi, I am trying to implement a custom shard policy with a different layer distribution, but it seems all built-in policies have the following inconsistent implementation:...
### 🐛 Describe the bug **Using `LazyInitContext` and later loading a checkpoint does not properly initialize model parameters.** ```python import colossalai from colossalai.lazy import LazyInitContext from colossalai.booster import Booster from colossalai.booster.plugin...
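A minimal sketch of the lazy-initialization pattern this issue exercises, in plain Python (no ColossalAI or torch dependency; `LazyParam`, `materialize`, and `load` are hypothetical names, not ColossalAI's API). A lazily initialized parameter records only metadata at construction time; the invariant the bug report says is violated is that a checkpoint load must overwrite whatever values materialization produced.

```python
class LazyParam:
    """Hypothetical lazy parameter: shape is recorded eagerly, storage is not."""

    def __init__(self, shape):
        self.shape = shape   # metadata only; no buffer allocated yet
        self.data = None     # filled in by materialize() or load()

    def materialize(self):
        # Default initialization (here: zeros) happens only on demand.
        if self.data is None:
            self.data = [0.0] * self.shape
        return self.data

    def load(self, values):
        # A checkpoint load must replace the data, regardless of whether
        # the parameter was already materialized with default values.
        assert len(values) == self.shape
        self.data = list(values)


p = LazyParam(shape=3)
p.materialize()           # default init: [0.0, 0.0, 0.0]
p.load([1.0, 2.0, 3.0])   # checkpoint values must win over the defaults
```

The reported bug is, in these terms, a case where `p.data` keeps its default-initialized values after the checkpoint load instead of the loaded ones.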
### 🐛 Describe the bug 1. It seems blip2 testing doesn't work correctly at all if the model is half precision (torch.float16). 2. With bfloat16, `colossalai.shardformer.layer.FusedLayerNorm` doesn't seem to work correctly....
### 🐛 Describe the bug When using tensor parallelism, model parameters are sharded across GPUs to reduce memory consumption and enable parallel execution. However, the optimizer still holds unsharded model...
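A back-of-the-envelope sketch of the memory imbalance described above, assuming fp16 parameters and gradients and Adam's two fp32 states per parameter. The function and its accounting are illustrative assumptions, not ColossalAI's actual bookkeeping.

```python
def per_gpu_bytes(num_params, tp_degree, shard_optimizer):
    """Rough per-GPU memory for a tensor-parallel model (illustrative only)."""
    param_bytes = 2 * num_params // tp_degree   # fp16 parameter shard
    grad_bytes = 2 * num_params // tp_degree    # fp16 gradient shard
    opt_states = 2 * 4 * num_params             # Adam m and v, each fp32
    if shard_optimizer:
        opt_states //= tp_degree
    return param_bytes + grad_bytes + opt_states


n = 1_000_000_000  # a 1B-parameter model
unsharded = per_gpu_bytes(n, tp_degree=4, shard_optimizer=False)  # 9.0 GB
sharded = per_gpu_bytes(n, tp_degree=4, shard_optimizer=True)     # 3.0 GB
```

With 4-way tensor parallelism, the unsharded optimizer dominates: the 8 GB of Adam states dwarf the 1 GB of sharded parameters and gradients, which is exactly the inconsistency the issue points at.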
### 🐛 Describe the bug I understand that this error came out of the flash attention software stack, but it seems there is no related issue except for https://github.com/Dao-AILab/flash-attention/issues/590, therefore I...
While handling failures, if some pipeline doesn't have enough nodes, Oobleck is supposed to borrow nodes from other pipelines or merge pipelines. The previous implementation was a prototype,...
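The borrow-or-merge policy described above can be sketched as follows. This is a hypothetical, dependency-free illustration of the stated behavior (pipelines as node counts, `min_nodes` as the viability threshold), not Oobleck's actual implementation.

```python
def rebalance(pipelines, min_nodes):
    """Bring every pipeline up to min_nodes: borrow spare nodes first,
    merge the deficient pipeline into another one as a last resort."""
    pipelines = list(pipelines)
    for i in range(len(pipelines)):
        while 0 < pipelines[i] < min_nodes:
            # Prefer borrowing from a pipeline with surplus nodes;
            # donors never drop below min_nodes themselves.
            donor = next((j for j, m in enumerate(pipelines)
                          if j != i and m > min_nodes), None)
            if donor is not None:
                pipelines[donor] -= 1
                pipelines[i] += 1
            else:
                # No donor available: merge this pipeline's nodes into
                # the smallest surviving pipeline.
                others = [j for j in range(len(pipelines)) if j != i]
                if not others:
                    break
                target = min(others, key=lambda j: pipelines[j])
                pipelines[target] += pipelines[i]
                pipelines[i] = 0
                break
    return [m for m in pipelines if m > 0]


rebalance([4, 2, 1], min_nodes=2)  # borrow: -> [3, 2, 2]
rebalance([2, 1], min_nodes=2)     # no donor, merge: -> [3]
```

Borrowing is preferred because it preserves the number of pipelines (and thus data-parallel degree); merging shrinks it but keeps every surviving pipeline viable.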