# Robot Agent
Fine-tuned Llama2 13B model designed for ReAct-style and Tree-Of-Thoughts style prompting. The codebase has the following desirable features:
- The entire training procedure runs out of the box on a single computer with 32GB of RAM and 24GB of VRAM (i.e. consumer-grade graphics cards such as the RTX 3090 and RTX 4090) in under 30 hours of compute time.
  - Carefully tuned to use no more than 27GiB of RAM and 23.6GiB of VRAM.
  - This is accomplished through quantization, FP16, TF32, and the usual gradient accumulation/checkpointing settings (see the sketch after this list).
- Training is fully interruptible/resumable.
- Heavily commented, short, clean, and reproducible training code.
- All library dependency versions are fully pinned; base models and datasets are pinned and downloaded as part of the setup process.
- After initial setup, the training process does not require network access - the entire project folder is portable and can be moved into air-gapped and offline environments.
- Uses SafeTensors everywhere for speed and security.
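The exact memory-saving configuration lives in the training scripts, but as a rough sketch of how these pieces typically fit together with Hugging Face transformers and bitsandbytes (the specific values and model identifier below are illustrative assumptions, not the repository's tuned settings):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# Allow TF32 matmuls on Ampere+ GPUs (RTX 3090/4090).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Quantize the frozen base model to 4-bit NF4 (QLoRA-style) to fit in 24GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # assumed base model identifier
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()  # trade extra compute for lower peak VRAM

# FP16 training with gradient accumulation keeps optimizer state and
# activations within the memory budget described above.
training_args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,
    tf32=True,
)
```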
## Technical details
- Based on Llama2 13B.
- QLoRA training with a rank-128 LoRA, similar to Guanaco.
- 2048-token context window used in supervised finetuning, 1536-token context window used in direct preference finetuning.
- Supervised finetuning on a self-instruct dataset generated by Airoboros' self-instruct implementation.
  - The dataset has been filtered for refusals, and so could be considered "uncensored".
  - The dataset generation code also uses a GPT-4 jailbreak to reduce the number of refusals in the first place.
- Direct preference finetuning using Anthropic's hh-rlhf dataset.
  - This replaces the reward modelling and reinforcement learning steps in a standard RLHF pipeline.
- The codebase takes ideas and inspiration from StackLLaMa, QLoRA, LLaMA-TRL, Airoboros, and others.
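Continuing the sketch above, the rank-128 adapter can be attached with peft roughly as follows; the target modules, alpha, and dropout here are illustrative assumptions rather than the repository's exact hyperparameters. The direct preference step then trains the same adapter on hh-rlhf chosen/rejected pairs (e.g. via trl's DPOTrainer) instead of a separate reward model plus PPO.

```python
from peft import LoraConfig, get_peft_model

# Rank-128 LoRA, similar in spirit to Guanaco; concrete values are illustrative.
lora_config = LoraConfig(
    r=128,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # `model` is the quantized base model from the sketch above
model.print_trainable_parameters()          # only the LoRA matrices are trainable
```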
## Roadmap
- [x] Full reproducible environment with all datasets, base models, and dependencies included.
- [x] Supervised finetuning script using high-quality publicly-available instruct datasets.
- [x] Human-preference finetuning script based on Anthropic's hh-rlhf "helpfulness" dataset.
- [x] Accidentally delete the training results on my GPU server and start the training over again from scratch.
- [ ] Fiddle with agentic dataset generation using Charades dataset.
- [ ] If that doesn't work, fiddle with video captioning using multimodal models like Otter to generate agentic captions from how-to videos on YouTube.
## Prompt Format
```
### Human:
INSTRUCTIONS_GO_HERE
### Assistant:
```
Note that there is a single newline at the end of the prompt. Example:
```
### Human:
What color is the sky?
### Assistant:
The sky is blue.
```
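In code, a hypothetical helper for producing single-turn prompts in this format might look like:

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the model's expected prompt format,
    ending with the single newline after '### Assistant:'."""
    return f"### Human:\n{instruction}\n### Assistant:\n"

print(build_prompt("What color is the sky?"), end="")
```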
## Training
First, download everything that requires an internet connection into the current project folder; the folder will grow to around 30GiB in size:

```
make download-datasets-and-models
```
Next, transfer the current project folder to the training machine, where the rest of the training can be performed fully offline:
```
make train
```
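Because training is interruptible, an interrupted run can be restarted with the same command and it will pick up from the latest checkpoint. With the Hugging Face Trainer this typically boils down to something like the following (illustrative only; `model`, `training_args`, and `train_dataset` are assumed to be set up as sketched earlier):

```python
from transformers import Trainer

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# resume_from_checkpoint=True loads the most recent checkpoint in
# training_args.output_dir, so restarting after an interruption skips
# already-completed steps.
trainer.train(resume_from_checkpoint=True)
```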
## Inference
To use the model, a simple chat-like interface is included for demo purposes. It is not very fancy, but it is good enough for testing:

```
make chat
```
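If you would rather call the model from Python instead of the bundled chat interface, a minimal generation sketch looks like the following (the export directory is an assumption; point it at wherever your finetuned weights live):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./exported-models/robot-agent"  # hypothetical path to the finetuned weights
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

prompt = "### Human:\nWhat color is the sky?\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens and print only the assistant's reply.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```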
## Using Llama.cpp
First, run the following command to create `./exported-models/ggml-robot-agent-q5_K_M.bin`, an 8.6GiB GGML file compatible with Llama.cpp:

```
make generate-ggml
```
Now to load the model using Llama.cpp:
```
make chat-llama-cpp
```
To use Llama.cpp manually, navigate to your llama.cpp folder and start using the model with the following command (replace `PATH_TO_PROJECT_FOLDER` with the path to the current project folder):

```
./main --model PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin --color --interactive --interactive-first --mirostat 2 --ctx-size 2048 --reverse-prompt $'\n\n### Human:\n' --prompt $'\n\n### Human:\n' --in-suffix $'\n### Assistant:\n'
```
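As an alternative to the `./main` CLI, the llama-cpp-python bindings (not part of this project, and only in versions that still support GGML files) can load the same export. The prompt and stop strings below mirror the flags above; the path placeholder is left as-is:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="PATH_TO_PROJECT_FOLDER/exported-models/ggml-robot-agent-q5_K_M.bin",
    n_ctx=2048,
)

result = llm(
    "\n\n### Human:\nWhat color is the sky?\n### Assistant:\n",
    max_tokens=256,
    stop=["\n\n### Human:"],  # stop when the model starts a new human turn
)
print(result["choices"][0]["text"])
```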