
OSLO: Open Source for Large-scale Optimization

Results: 17 oslo issues

## How to reproduce

Using almost the same code from the tutorials:

```python
from tqdm import tqdm
import os
from torch.optim import Adam
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSeq2SeqLM
import ...
```

It seems that the current version of oslo cannot load Llama models. Do you have any plan to support them?

```shell
AssertionError: Currently, LlamaForCausalLM is not supported. The current...
```

## Add GradScaler for ZeroOptim

Numerical precision is not tested yet.

## Description

Test script:

```python
import os
import torch, time, gc
import torch.multiprocessing as mp
import torch.distributed as dist...
```
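A GradScaler applies dynamic loss scaling so fp16 gradients do not underflow. As a rough pure-Python illustration of that logic (the class name, constants, and interface below are illustrative, not oslo's or PyTorch's API):

```python
# Minimal sketch of dynamic loss scaling: scale the loss up before
# backward, skip the step and back off the scale on overflow, grow the
# scale again after a run of stable steps. Illustrative names only.
import math


class SimpleGradScaler:
    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # Multiply the loss so small fp16 gradients survive backward.
        return loss * self.scale

    def step(self, grads):
        # Skip the optimizer step if any gradient overflowed; otherwise
        # return the unscaled gradients for the real update.
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff_factor   # back off on overflow
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor    # grow after a stable run
        return [g / self.scale for g in grads]


scaler = SimpleGradScaler(init_scale=4.0)
ok = scaler.step([8.0, 4.0])        # unscaled grads: [2.0, 1.0]
bad = scaler.step([float("inf")])   # overflow: step skipped, scale halved
```

A real implementation would additionally hook into the ZeRO optimizer's sharded gradients, which is presumably what the untested numerical-precision path covers.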

To apply FlashAttention:
* https://github.com/HazyResearch/flash-attention
* https://github.com/NVIDIA/cutlass
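The key trick the linked FlashAttention kernels rely on is the "online softmax" recurrence: attention outputs are computed tile by tile while keeping only a running max and running denominator, never materializing the full score row. A small pure-Python sketch of that recurrence (function name and tiling are illustrative; the real kernels do this per-block on GPU):

```python
# Online softmax over tiles: equivalent to softmax(scores) . values,
# but each tile is folded into running statistics (max, denominator,
# weighted sum) so the full row never needs to be stored at once.
import math


def online_softmax_dot(scores, values, tile=2):
    m = float("-inf")  # running max of scores seen so far
    d = 0.0            # running softmax denominator
    acc = 0.0          # running weighted sum of values
    for start in range(0, len(scores), tile):
        s_tile = scores[start:start + tile]
        v_tile = values[start:start + tile]
        m_new = max(m, max(s_tile))
        # Rescale previous accumulators to the new running max.
        correction = math.exp(m - m_new) if d else 0.0
        d *= correction
        acc *= correction
        for s, v in zip(s_tile, v_tile):
            w = math.exp(s - m_new)
            d += w
            acc += w * v
        m = m_new
    return acc / d


scores = [0.1, 2.0, -1.0, 0.5]
values = [1.0, 2.0, 3.0, 4.0]
out = online_softmax_dot(scores, values)
# Matches the untiled softmax-weighted sum of `values`.
```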

## Describe a TODO feature
- ZeRO optimizer deparallelization

## Assignees
- hyein.hyun

Data Parallelism
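Conceptually, data parallelism means each replica computes gradients on its own shard of the batch, then the gradients are all-reduced (averaged) so every replica applies an identical update. A toy pure-Python sketch of that flow (the functions and the mean-based all-reduce are illustrative, not oslo's implementation):

```python
# Each "GPU" computes local gradients on its shard; the all-reduce
# averages them so all replicas step the same way. Toy MSE model y = w*x.


def local_gradients(weights, batch):
    # Gradient of mean squared error for y = w * x on one shard.
    return [sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            for w in weights]


def all_reduce_mean(per_replica_grads):
    # Stand-in for dist.all_reduce(...) followed by division by world size.
    n = len(per_replica_grads)
    return [sum(g[i] for g in per_replica_grads) / n
            for i in range(len(per_replica_grads[0]))]


weights = [0.0]
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]          # one shard per replica
grads = all_reduce_mean([local_gradients(weights, s) for s in shards])
weights = [w - 0.1 * g for w, g in zip(weights, grads)]
```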

## Describe a TODO feature
- Write a tutorial document for Expert Parallel

## Assignees
- scsc0511

Expert Parallelism
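At the heart of expert parallelism (Mixture-of-Experts layers) is token routing: a gate scores each token against every expert, and each token is dispatched to its top-scoring expert, which in a real system lives on a different device or rank. A hedged pure-Python sketch of top-1 routing (names are illustrative, not oslo's API):

```python
# Top-1 gating: pick the highest-scoring expert per token, then group
# tokens into per-expert buckets for dispatch across ranks.


def top1_route(gate_scores):
    # gate_scores[token][expert] -> chosen expert index per token
    return [max(range(len(row)), key=row.__getitem__) for row in gate_scores]


def dispatch(tokens, assignments, num_experts):
    buckets = [[] for _ in range(num_experts)]
    for tok, expert in zip(tokens, assignments):
        buckets[expert].append(tok)
    return buckets


scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
assign = top1_route(scores)                      # [0, 1, 0]
groups = dispatch(["a", "b", "c"], assign, num_experts=2)
# groups == [["a", "c"], ["b"]]
```

Real implementations add capacity limits and load-balancing losses on top of this routing step.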

## Describe a requested feature - Write test codes run with config to support models at once

## Describe a TODO feature
- Force tp_wrapper not to parallelize the embedding layer if the model has no embedding layer (for vision model compatibility). https://discord.com/channels/729741769192767510/1012603449910759504/1083785802930192434

## Assignees
- @jason9693
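The requested guard could look roughly like the following: check whether the model exposes an input embedding before trying to shard it, and skip otherwise. Everything here is a hypothetical stand-in for tp_wrapper's actual internals:

```python
# Hypothetical guard: only parallelize the embedding when the model
# actually has one (vision models may not). Toy model classes below
# mimic the transformers get_input_embeddings() convention.


class NoEmbeddingModel:
    pass  # e.g. a vision backbone with no token embedding


class TextModel:
    def get_input_embeddings(self):
        return "embedding-weights"  # stand-in for an nn.Embedding


def maybe_parallelize_embedding(model):
    getter = getattr(model, "get_input_embeddings", None)
    emb = getter() if callable(getter) else None
    if emb is None:
        return False          # skip: nothing to parallelize
    # ... shard `emb` across tensor-parallel ranks here ...
    return True


maybe_parallelize_embedding(NoEmbeddingModel())   # skipped
maybe_parallelize_embedding(TextModel())          # would be sharded
```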

Hi there, I'm wondering if there is an example of how to use this repo to pretrain T5? I saw [this file](https://github.com/EleutherAI/oslo/blob/main/tests/transformers/models/mt5/test_training.py) and thought that it could maybe serve as...

## Describe a TODO feature
There are two versions of ZeRO support, ZeroDDP and ShardedModelV2:
1. Check the possibility of merging the two into one.
2. Otherwise, it has a...
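Whatever form the merge takes, the idea both ZeroDDP and ShardedModelV2 implement is the same: each rank owns the optimizer state for only a 1/world_size slice of the parameters instead of replicating it everywhere. A minimal sketch of that partitioning, assuming a flat parameter list and a contiguous split (illustrative only):

```python
# ZeRO-style partitioning: split the flat parameter list so each rank
# holds optimizer state for roughly len(params) / world_size entries.


def partition(params, world_size):
    per_rank = (len(params) + world_size - 1) // world_size  # ceil division
    return [params[r * per_rank:(r + 1) * per_rank]
            for r in range(world_size)]


params = list(range(10))          # stand-ins for flat parameter tensors
shards = partition(params, world_size=4)
# shards == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```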