
Simple text

Open dmahurin opened this issue 1 year ago • 1 comments

These changes add support for training with tinyshakespeare (change from llama2.py) and with simple text separated by blank lines.

dmahurin, Jun 26 '24 00:06

Hello! I wrote a tinytext.txt file that is a few dozen lines long. When I ran

python tinyshakespeare.py pretokenize
python train.py --dataset=tinyshakespeare

the following error occurred:

assert num_batches > 0, "this split is way too small? investigate." 

I have only just started using LLMs. llama2.c lets me run a model on my own computer, but I don't yet have enough background knowledge to quickly start training a large model of my own.

Could you please provide me with an example of a related tinytext.txt file? Thank you very much!

xpww, Jul 05 '24 08:07
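
A note on the assertion above: it most likely means the pretokenized file holds too few tokens to form even one training window. The sketch below is only an estimate and rests on assumptions: that the tinyshakespeare loader in these changes counts batches the same way as the existing tinystories loader (token count divided by max_seq_len, minus one), and that max_seq_len is 256. The function names count_windows and min_tokens_needed are made up for illustration and are not part of llama2.c.

```python
# Hypothetical helpers, not part of llama2.c.
# Assumption: the loader computes num_batches = num_tokens // max_seq_len - 1
# and then asserts num_batches > 0.

def count_windows(num_tokens: int, max_seq_len: int) -> int:
    """Estimated number of usable windows for a split with num_tokens tokens."""
    return num_tokens // max_seq_len - 1


def min_tokens_needed(max_seq_len: int) -> int:
    """Smallest token count for which count_windows returns at least 1."""
    return 2 * max_seq_len


if __name__ == "__main__":
    max_seq_len = 256  # assumed value; check the setting in train.py
    for num_tokens in (300, 511, 512, 10_000):
        print(num_tokens, "tokens ->", count_windows(num_tokens, max_seq_len), "windows")
    print("minimum tokens per split:", min_tokens_needed(max_seq_len))
```

Under these assumptions a file of a few dozen lines will usually fall short, since every split (training and validation) needs at least 2 * max_seq_len tokens. Using a much longer text file, or lowering max_seq_len, should get past the assertion.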