llm-foundry
LLM training code for Databricks foundation models
Hi team, I'm fine-tuning with 6 V100 GPUs, and the fine-tuning process is extremely slow for me. I'm using fp16 and attn_impl: torch, with a global_train_batch_size of 12 and device_train_microbatch_size set automatically...
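For reference, a minimal sketch of how these settings usually appear in an llm-foundry train YAML (this is an assumption about the reporter's config, not a quote of it; the exact nesting of `attn_impl` has varied between releases, and everything else is omitted):

```yaml
# Illustrative excerpt only, not the reporter's full config.
precision: amp_fp16                   # fp16 mixed precision
global_train_batch_size: 12           # split across the 6 V100s
device_train_microbatch_size: auto    # trainer picks the per-GPU microbatch size
model:
  attn_config:
    attn_impl: torch                  # unfused reference attention; typically the slowest option
```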
I forked Triton and renamed it to `triton_pre_mlir`; the triton diff is [here](https://github.com/openai/triton/compare/main...vchiley:triton:triton_pre_mlir). llmfoundry/models/layers/flash_attn_triton.py is copy-pasted from [HazyResearch flash_attn_triton](https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py), where I modify the imports to be ``` import triton_pre_mlir as triton import...
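For context, a sketch of what that import change presumably looks like in the copied kernel file; this is inferred from the truncated snippet above, and the linked file is the authoritative version:

```python
# Assumed shape of the modified imports: the forked Triton is installed under a
# different top-level package name, so the copied HazyResearch kernel aliases it
# back to the names the original code expects.
import triton_pre_mlir as triton
import triton_pre_mlir.language as tl
```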
Docker Image with CUDA 12.1 for Ada-generation cards
## ❓ Question
I want to fine-tune the model with SageMaker. Is there a guide on how to do it? I have a dataset that I want to fine-tune the model...
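There is no SageMaker-specific guide in this excerpt, but a minimal sketch of launching the llm-foundry training entry point as a SageMaker training job might look like the following. The entry-point path, role ARN, instance type, framework versions, hyperparameters, and S3 channel are all illustrative assumptions, not a documented recipe, and how train.py consumes its YAML argument would need adapting since SageMaker passes hyperparameters as CLI flags:

```python
# Hypothetical sketch: run scripts/train/train.py as a SageMaker PyTorch training job.
# All paths, versions, names, and instance types below are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                  # llm-foundry training script
    source_dir="llm-foundry/scripts/train",  # assumed layout of the cloned repo
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    instance_count=1,
    instance_type="ml.p4d.24xlarge",         # any GPU instance; illustrative choice
    framework_version="2.0",
    py_version="py310",
    hyperparameters={"yaml_path": "yamls/finetune/mpt-7b_dolly_sft.yaml"},  # assumed config
)

# The channel name and S3 prefix are placeholders; the dataset would need to be
# referenced from the YAML or passed through hyperparameters.
estimator.fit({"train": "s3://my-bucket/finetune-data/"})
```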
I'm observing my optimizer metrics while MPT trains, and some blocks are inf, e.g. `Train cosine/update_grad/model._fsdp_wrapped_module.transformer.blocks.9._fsdp_wrapped_module.ffn.down_proj.weight: inf`. Is this OK, or why does this happen? Are you aware of this issue? I guess...
I am trying to convert the RedPajama-github dataset to streaming format but am getting the error below. To replicate:

```
python llm-foundry/scripts/data_prep/convert_dataset_json.py \
  --path github/split1 \
  --out_root github/split1 \
  --split train \
  --concat_tokens...
```
## ❓ Question

## Additional context
I'm confused by the demo at https://huggingface.co/spaces/mosaicml/mpt-7b-instruct and the GitHub script inference/hf_chat.py; the latter seems stupid .... for example, please write a java function to query...
Hello! I hope you are doing well! I've requested access twice but didn't get any answer or feedback. Please provide an email address I can use to re-send my request...
Adds an 8-bit version of the LION optimizer. Some non-obvious aspects of this include:
- CUDA kernels for int8 quantization and dequantization of floats. Kernels use numba since I got stonewalled...
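As background, a minimal sketch of what an 8-bit LION step can look like, assuming per-tensor symmetric int8 quantization of the momentum state; this illustrates the idea only and is not the PR's actual numba/CUDA kernels:

```python
import torch

def lion8bit_step(p, grad, m_q, m_scale, lr=1e-4, wd=0.0, beta1=0.9, beta2=0.99):
    """One illustrative 8-bit LION step.

    p: parameter tensor, grad: its gradient,
    m_q: int8-quantized momentum, m_scale: per-tensor scale (both assumed state layout).
    Returns the updated (m_q, m_scale).
    """
    # Dequantize the momentum back to the parameter dtype.
    m = m_q.to(p.dtype) * m_scale

    # LION: the update direction is the sign of an interpolation of momentum and gradient,
    # applied with decoupled weight decay.
    update = (beta1 * m + (1 - beta1) * grad).sign_()
    p.add_(update + wd * p, alpha=-lr)

    # Momentum update, then requantize to int8 with a fresh per-tensor scale.
    m = beta2 * m + (1 - beta2) * grad
    m_scale = m.abs().max().clamp(min=1e-12) / 127.0
    m_q = torch.clamp(torch.round(m / m_scale), -127, 127).to(torch.int8)
    return m_q, m_scale
```

Real 8-bit optimizer implementations typically quantize the state blockwise rather than per-tensor to limit error; the kernels in this PR may well differ in those details.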
Support remote jsonl files for finetuning.