picoGPT
An extremely simple toy example of a transformer-based language model.
The model and training method are based on Andrej Karpathy's excellent YouTube video: Let’s build GPT: from scratch, in code, spelled out.
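Roughly, the model is a small decoder-only transformer: token and position embeddings, a stack of attention + MLP blocks with a causal mask, and a linear head over the vocabulary. Below is a minimal PyTorch sketch of that architecture; class and argument names are illustrative and not necessarily those used in this repository.

import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer layer: causal self-attention followed by an MLP."""
    def __init__(self, embed_size, num_heads, dropout):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_size)
        self.attn = nn.MultiheadAttention(embed_size, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_size)
        self.mlp = nn.Sequential(
            nn.Linear(embed_size, 4 * embed_size),
            nn.GELU(),
            nn.Linear(4 * embed_size, embed_size),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: True entries are positions a token is NOT allowed to attend to.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class PicoGPT(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a language-model head."""
    def __init__(self, vocab_size, block_size, embed_size, depth, num_heads, dropout):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, embed_size)
        self.pos_emb = nn.Embedding(block_size, embed_size)
        self.blocks = nn.Sequential(*[Block(embed_size, num_heads, dropout)
                                      for _ in range(depth)])
        self.ln_f = nn.LayerNorm(embed_size)
        self.head = nn.Linear(embed_size, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.head(self.ln_f(x))  # (B, T, vocab_size) logits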
Quick Start
Requirements:
python >= 3.7
pytorch
numpy
rich
loguru
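Assuming a standard pip setup, the dependencies can be installed with (note that the PyTorch package is named torch on PyPI):

pip install torch numpy rich loguru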
Training a model:
python3 train.py \
    --lr=1e-3 \
    --batch-size=32 \
    --block-size=128 \
    --embed-size=512 \
    --depth=4 \
    --num-heads=4 \
    --dropout=0.1

Here --block-size is the context (block) size in tokens, --embed-size is the embedding dimension, --depth is the number of transformer layers, and --num-heads is the number of attention heads in each layer.
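As in the video, training batches are typically drawn as random windows of block-size tokens from the tokenized corpus, with targets shifted by one position. A sketch of that batching step (function and variable names are illustrative):

import torch

def get_batch(data, block_size, batch_size, device="cpu"):
    """Sample batch_size random windows of block_size tokens from a 1-D token tensor."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])            # inputs
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])    # targets, shifted by one
    return x.to(device), y.to(device)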
Training converges on an RTX 2080 Ti in about 15 minutes. Run this command for an interactive demo:
python3 chat.py
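Under the hood, an interactive demo like this samples from the model autoregressively: feed the current context, take the logits for the last position, sample the next token, and append it. A minimal sketch of such a sampling loop (the model interface here is an assumption, not taken from chat.py):

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0):
    """Append up to max_new_tokens sampled tokens to the (B, T) index tensor idx."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop to the context window
        logits = model(idx_cond)[:, -1, :]         # logits for the last position
        probs = F.softmax(logits / temperature, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_tok], dim=1)
    return idx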
The default training data is the classical Chinese novels "水浒传" (Water Margin) and "红楼梦" (Dream of the Red Chamber), and it can easily be changed to any text you like.
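Swapping in your own corpus usually amounts to pointing the data loading step at a different text file and rebuilding the vocabulary. A sketch of the character-level preprocessing used in the video (file name and helper names are hypothetical):

import torch

# Character-level tokenization of an arbitrary UTF-8 text file.
with open("my_corpus.txt", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)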
Acknowledgements
Thanks to Andrej Karpathy for the excellent YouTube video and the nanoGPT project.