# Robot Agent
Fine-tuned Llama2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting. The codebase has the following desirable features:
- The entire training procedure runs out of the box on a single computer with 32GB of RAM and 24GB of VRAM (i.e. consumer-grade graphics cards such as the RTX 3090 and RTX 4090) in under 30 hours of compute time.
  - Carefully tuned to use no more than 27GiB of RAM and 23.6GiB of VRAM.
  - This is accomplished through quantization, FP16, TF32, and the usual gradient accumulation/checkpointing settings (see the sketch after this list).
- Training is fully interruptible/resumable.
- Heavily commented, short, clean, and reproducible training code.
- All library dependency versions are fully pinned; base models and datasets are pinned and downloaded as part of the setup process.
- After initial setup, the training process does not require network access - the entire project folder is portable and can be moved into air-gapped and offline environments.
- Uses SafeTensors everywhere for speed and security.
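The exact memory-saving configuration lives in the training scripts, but as a rough sketch of how these pieces typically fit together with Hugging Face transformers and bitsandbytes (the specific values and model identifier below are illustrative assumptions, not the repository's tuned settings):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# Allow TF32 matmuls on Ampere+ GPUs (RTX 3090/4090).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Quantize the frozen base model to 4-bit NF4 (QLoRA-style) to fit in 24GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # assumed base model identifier
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()  # trade extra compute for lower peak VRAM

# FP16 training with gradient accumulation keeps optimizer state and
# activations within the memory budget described above.
training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,
    tf32=True,
)
```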
## Technical details
- Based on Llama2 13B.
- QLoRA training with a rank-128 LoRA, similar to Guanaco.
- 2048-token context window used in supervised finetuning, 1536-token context window used in direct preference finetuning.
- Supervised finetuning on a self-instruct dataset generated by Airoboros' self-instruct implementation.
  - The dataset has been filtered for refusals, and so could be considered "uncensored".
  - The dataset generation code also uses a GPT-4 jailbreak to reduce the number of refusals in the first place.
- Direct preference finetuning using Anthropic's hh-rlhf dataset.
  - This replaces the reward modelling and reinforcement learning steps in a standard RLHF pipeline.
- The codebase takes ideas and inspiration from StackLLaMa, QLoRA, LLaMA-TRL, Airoboros, and others.
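Continuing the sketch above, the rank-128 adapter can be attached with peft roughly as follows; the target modules, alpha, and dropout here are illustrative assumptions rather than the repository's exact hyperparameters. The direct preference step then trains the same adapter on hh-rlhf chosen/rejected pairs (e.g. via trl's DPOTrainer) instead of a separate reward model plus PPO.

```python
from peft import LoraConfig, get_peft_model

# Rank-128 LoRA, similar in spirit to Guanaco; concrete values are illustrative.
lora_config = LoraConfig(
    r=128,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # `model` is the quantized base model from the sketch above
model.print_trainable_parameters()          # only the LoRA matrices are trainable
```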
## Roadmap
- [x] Full reproducible environment with all datasets, base models, and dependencies included.
- [x] Supervised finetuning script using high-quality publicly-available instruct datasets.
- [x] Human-preference finetuning script based on Anthropic's hh-rlhf "helpfulness" dataset.
- [x] Accidentally delete the training results on my GPU server and start the training over again from scratch.
- [ ] Fiddle with agentic dataset generation using Charades dataset.
- [ ] If that doesn't work, fiddle with video captioning using multimodal models like Otter to generate agentic captions from how-to videos on YouTube.
## Prompt Format
```
### Human:
INSTRUCTIONS_GO_HERE
### Assistant:
```
Note that there is a single newline at the end of the prompt. Example:
```
### Human:
What color is the sky?
### Assistant:
The sky is blue.
```
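In code, a hypothetical helper for producing single-turn prompts in this format might look like:

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the model's expected prompt format,
    ending with the single newline after '### Assistant:'."""
    return f"### Human:\n{instruction}\n### Assistant:\n"

print(build_prompt("What color is the sky?"), end="")
```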
## Training
First, download everything that requires an internet connection into the current project folder; the folder will grow to around 30GiB in size:

```
make download-datasets-and-models
```
Next, transfer the current project folder to the training machine, where the rest of the training can be performed fully offline:
```
make train
```
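Because training is interruptible, an interrupted run can be restarted with the same command and it will pick up from the latest checkpoint. With the Hugging Face Trainer this typically boils down to something like the following (illustrative only; `model`, `training_args`, and `train_dataset` are assumed to be set up as sketched earlier):

```python
from transformers import Trainer

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# resume_from_checkpoint=True loads the most recent checkpoint in
# training_args.output_dir, so restarting after an interruption skips
# already-completed steps.
trainer.train(resume_from_checkpoint=True)
```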
## Inference
To use the model, a simple chat-like interface is included for demo purposes. It is not very fancy, but it is good enough for testing:

```
make chat
```
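If you would rather call the model from Python instead of the bundled chat interface, a minimal generation sketch looks like the following (the export directory is an assumption; point it at wherever your finetuned weights live):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./exported-models/robot-agent"  # hypothetical path to the finetuned weights
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

prompt = "### Human:\nWhat color is the sky?\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens and print only the assistant's reply.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```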
## Using Llama.cpp
First, run the following command to create `./exported-models/ggml-robot-agent-q5_K_M.bin`, an 8.6GiB GGML file compatible with Llama.cpp:

```
make generate-ggml
```
Now to load the model using Llama.cpp:
```
make chat-llama-cpp
```
To use Llama.cpp manually, navigate to your llama.cpp folder and start using the model with the following command (replace `PATH_TO_PROJECT_FOLDER` with the path to the current project folder):

```
./main --model PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin --color --interactive --interactive-first --mirostat 2 --ctx-size 2048 --reverse-prompt $'\n\n### Human:\n' --prompt $'\n\n### Human:\n' --in-suffix $'\n### Assistant:\n'
```
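As an alternative to the `./main` CLI, the llama-cpp-python bindings (not part of this project, and only in versions that still support GGML files) can load the same export. The prompt and stop strings below mirror the flags above; the path placeholder is left as-is:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin",
    n_ctx=2048,
)

result = llm(
    "\n\n### Human:\nWhat color is the sky?\n### Assistant:\n",
    max_tokens=256,
    stop=["\n\n### Human:"],  # stop when the model starts a new human turn
)
print(result["choices"][0]["text"])
```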