
OSLO: Open Source for Large-scale Optimization

Results: 17 oslo issues

## How to reproduce

Using almost the same code from the tutorials:

```python
from tqdm import tqdm
import os
from torch.optim import Adam
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSeq2SeqLM
import ...
```

It seems that the current version of oslo cannot load Llama models. Do you have any plan to support them?

```shell
AssertionError: Currently, LlamaForCausalLM is not supported. The current...
```

## Add GradScaler for ZeroOptim

Numerical precision is not tested yet.

## Description

Test script:

```python
import os
import torch, time, gc
import torch.multiprocessing as mp
import torch.distributed as dist...
```
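A GradScaler applies dynamic loss scaling so fp16 gradients do not underflow. As a rough pure-Python illustration of that logic (the class name, constants, and interface below are illustrative, not oslo's or PyTorch's API):

```python
# Minimal sketch of dynamic loss scaling: scale the loss up before
# backward, skip the step and back off the scale on overflow, grow the
# scale again after a run of stable steps. Illustrative names only.
import math


class SimpleGradScaler:
    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # Multiply the loss so small fp16 gradients survive backward.
        return loss * self.scale

    def step(self, grads):
        # Skip the optimizer step if any gradient overflowed; otherwise
        # return the unscaled gradients for the real update.
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff_factor   # back off on overflow
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor    # grow after a stable run
        return [g / self.scale for g in grads]


scaler = SimpleGradScaler(init_scale=4.0)
ok = scaler.step([8.0, 4.0])        # unscaled grads: [2.0, 1.0]
bad = scaler.step([float("inf")])   # overflow: step skipped, scale halved
```

A real implementation would additionally hook into the ZeRO optimizer's sharded gradients, which is presumably what the untested numerical-precision path covers.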

To apply FlashAttention:
* https://github.com/HazyResearch/flash-attention
* https://github.com/NVIDIA/cutlass
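The key trick the linked FlashAttention kernels rely on is the "online softmax" recurrence: attention outputs are computed tile by tile while keeping only a running max and running denominator, never materializing the full score row. A small pure-Python sketch of that recurrence (function name and tiling are illustrative; the real kernels do this per-block on GPU):

```python
# Online softmax over tiles: equivalent to softmax(scores) . values,
# but each tile is folded into running statistics (max, denominator,
# weighted sum) so the full row never needs to be stored at once.
import math


def online_softmax_dot(scores, values, tile=2):
    m = float("-inf")  # running max of scores seen so far
    d = 0.0            # running softmax denominator
    acc = 0.0          # running weighted sum of values
    for start in range(0, len(scores), tile):
        s_tile = scores[start:start + tile]
        v_tile = values[start:start + tile]
        m_new = max(m, max(s_tile))
        # Rescale previous accumulators to the new running max.
        correction = math.exp(m - m_new) if d else 0.0
        d *= correction
        acc *= correction
        for s, v in zip(s_tile, v_tile):
            w = math.exp(s - m_new)
            d += w
            acc += w * v
        m = m_new
    return acc / d


scores = [0.1, 2.0, -1.0, 0.5]
values = [1.0, 2.0, 3.0, 4.0]
out = online_softmax_dot(scores, values)
# Matches the untiled softmax-weighted sum of `values`.
```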

## Describe a TODO feature
- ZeRO optimizer deparallelization

## Assignees
- hyein.hyun

Data Parallelism
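Conceptually, data parallelism means each replica computes gradients on its own shard of the batch, then the gradients are all-reduced (averaged) so every replica applies an identical update. A toy pure-Python sketch of that flow (the functions and the mean-based all-reduce are illustrative, not oslo's implementation):

```python
# Each "GPU" computes local gradients on its shard; the all-reduce
# averages them so all replicas step the same way. Toy MSE model y = w*x.


def local_gradients(weights, batch):
    # Gradient of mean squared error for y = w * x on one shard.
    return [sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            for w in weights]


def all_reduce_mean(per_replica_grads):
    # Stand-in for dist.all_reduce(...) followed by division by world size.
    n = len(per_replica_grads)
    return [sum(g[i] for g in per_replica_grads) / n
            for i in range(len(per_replica_grads[0]))]


weights = [0.0]
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]          # one shard per replica
grads = all_reduce_mean([local_gradients(weights, s) for s in shards])
weights = [w - 0.1 * g for w, g in zip(weights, grads)]
```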

## Describe a TODO feature
- Write a tutorial document for Expert Parallel

## Assignees
- scsc0511

Expert Parallelism
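At the heart of expert parallelism (Mixture-of-Experts layers) is token routing: a gate scores each token against every expert, and each token is dispatched to its top-scoring expert, which in a real system lives on a different device or rank. A hedged pure-Python sketch of top-1 routing (names are illustrative, not oslo's API):

```python
# Top-1 gating: pick the highest-scoring expert per token, then group
# tokens into per-expert buckets for dispatch across ranks.


def top1_route(gate_scores):
    # gate_scores[token][expert] -> chosen expert index per token
    return [max(range(len(row)), key=row.__getitem__) for row in gate_scores]


def dispatch(tokens, assignments, num_experts):
    buckets = [[] for _ in range(num_experts)]
    for tok, expert in zip(tokens, assignments):
        buckets[expert].append(tok)
    return buckets


scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
assign = top1_route(scores)                      # [0, 1, 0]
groups = dispatch(["a", "b", "c"], assign, num_experts=2)
# groups == [["a", "c"], ["b"]]
```

Real implementations add capacity limits and load-balancing losses on top of this routing step.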

## Describe a requested feature - Write test codes run with config to support models at once

## Describe a TODO feature
- Force tp_wrapper not to parallelize the embedding layer if the model has no embedding layer (for vision model compatibility). https://discord.com/channels/729741769192767510/1012603449910759504/1083785802930192434

## Assignees
- @jason9693
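The requested guard could look roughly like the following: check whether the model exposes an input embedding before trying to shard it, and skip otherwise. Everything here is a hypothetical stand-in for tp_wrapper's actual internals:

```python
# Hypothetical guard: only parallelize the embedding when the model
# actually has one (vision models may not). Toy model classes below
# mimic the transformers get_input_embeddings() convention.


class NoEmbeddingModel:
    pass  # e.g. a vision backbone with no token embedding


class TextModel:
    def get_input_embeddings(self):
        return "embedding-weights"  # stand-in for an nn.Embedding


def maybe_parallelize_embedding(model):
    getter = getattr(model, "get_input_embeddings", None)
    emb = getter() if callable(getter) else None
    if emb is None:
        return False          # skip: nothing to parallelize
    # ... shard `emb` across tensor-parallel ranks here ...
    return True


maybe_parallelize_embedding(NoEmbeddingModel())   # skipped
maybe_parallelize_embedding(TextModel())          # would be sharded
```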

Hi there, I'm wondering if there is an example of how to use this repo to pretrain T5? I saw [this file](https://github.com/EleutherAI/oslo/blob/main/tests/transformers/models/mt5/test_training.py) and thought that it could maybe serve as...

## Describe a TODO feature
There are two versions of ZeRO support, ZeroDDP and ShardedModelV2:
1. Check the possibility of merging the two into one.
2. Otherwise, it has a...
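Whatever form the merge takes, the idea both ZeroDDP and ShardedModelV2 implement is the same: each rank owns the optimizer state for only a 1/world_size slice of the parameters instead of replicating it everywhere. A minimal sketch of that partitioning, assuming a flat parameter list and a contiguous split (illustrative only):

```python
# ZeRO-style partitioning: split the flat parameter list so each rank
# holds optimizer state for roughly len(params) / world_size entries.


def partition(params, world_size):
    per_rank = (len(params) + world_size - 1) // world_size  # ceil division
    return [params[r * per_rank:(r + 1) * per_rank]
            for r in range(world_size)]


params = list(range(10))          # stand-ins for flat parameter tensors
shards = partition(params, world_size=4)
# shards == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```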