OLMo
Modeling, training, eval, and inference code for OLMo
Fixes https://github.com/allenai/OLMo/issues/457. As this is a breaking change, it probably makes sense to release a new version with it.
### 🐛 Describe the bug In the HF code we use OLMo, but in training it's Olmo. This creates some inconsistencies when importing from the training modeling file ----...
`hf_olmo/convert_olmo_to_hf.py` currently crashes if the YAML file in the input checkpoint refers to a local tokenizer (it tries to load the local path from HF). I added a check to...
In the script `scripts/run_with_environment.sh`, `FS_LOCAL_RANK` is set to the same value as `RANK`:
```
export RANK=$SLURM_PROCID
export FS_LOCAL_RANK=$SLURM_PROCID
```
If the job is not launched by `scripts/run_with_environment.sh` and all ranks share the same filesystem, every local...
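To make the role of these variables concrete, here is a minimal Python sketch of how a filesystem-local rank could be resolved from the environment. The helper name `resolve_fs_local_rank` and its fallback order are assumptions for illustration, not OLMo's actual implementation; only the variable names `FS_LOCAL_RANK` and `RANK` come from the excerpt above. Falling back to the global `RANK` is only appropriate when all ranks share one filesystem.

```python
import os
from typing import Mapping, Optional


def resolve_fs_local_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Return this process's rank among the processes sharing a filesystem.

    Hypothetical helper: prefers an explicit FS_LOCAL_RANK, then falls back
    to the global RANK (an assumption that holds only on a shared
    filesystem), and finally to 0 for a single-process run.
    """
    env = os.environ if env is None else env
    for name in ("FS_LOCAL_RANK", "RANK"):
        value = env.get(name)
        if value:  # skip variables that are unset or empty
            return int(value)
    return 0


# Mirrors the exports in the shell excerpt: both variables carry the same value.
slurm_like_env = {"RANK": "5", "FS_LOCAL_RANK": "5"}
print(resolve_fs_local_rank(slurm_like_env))
```

Passing the environment as an argument keeps the helper easy to exercise without mutating `os.environ`.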
### 🚀 The feature, motivation and pitch I am using Olmo 7B for RAG for efficient inference on low GPU resources, but it does not support Flash Attention 2.0. Here is...
### 🐛 Describe the bug model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_name) model = AutoModelForCausalLM.from_pretrained( model_name, config=model_config, cache_dir=cache_dir, local_files_only=False, revision=revision, trust_remote_code="True") ...... File "/opt/homebrew/Caskroom/miniconda/base/envs/whale/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained cls.register(config.__class__, model_class, exist_ok=True) File "/opt/homebrew/Caskroom/miniconda/base/envs/whale/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line...
### ❓ The question I would like to fine-tune OLMo-1B starting from one of its checkpoints with my data. I understand that there are three steps in order to accomplish...
I use `python -m torch.distributed.run xxx` to launch the training processes. If `reduce_global_loss` is `True`, only `rank0` reduces the global loss and the other ranks do not. The metrics logging to console...
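The difference between reducing to a single rank and reducing on every rank can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions: the function names `reduce_to_root` and `all_reduce_mean` are hypothetical, no `torch.distributed` calls are made, and `None` simply stands in for "this rank holds no global value".

```python
from typing import List, Optional


def reduce_to_root(per_rank: List[float], root: int = 0) -> List[Optional[float]]:
    """Simulate a reduce: only `root` ends up with the global mean."""
    mean = sum(per_rank) / len(per_rank)
    # Non-root ranks get None to mark that they hold no global value.
    return [mean if rank == root else None for rank in range(len(per_rank))]


def all_reduce_mean(per_rank: List[float]) -> List[Optional[float]]:
    """Simulate an all-reduce: every rank ends up with the global mean."""
    mean = sum(per_rank) / len(per_rank)
    return [mean for _ in per_rank]


# Hypothetical per-rank losses for a four-process job.
per_rank_loss = [2.0, 4.0, 6.0, 8.0]
print(reduce_to_root(per_rank_loss))   # [5.0, None, None, None]
print(all_reduce_mean(per_rank_loss))  # [5.0, 5.0, 5.0, 5.0]
```

With a reduce to one rank, only that rank can log a meaningful global loss; with an all-reduce, any rank can.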
This is the first of a few PRs with refactored versions of changes I have been using locally for checkpoint management. This PR focusses on my changes to the storage...