
How to make sentences make more sense?

Open gitihobo opened this issue 2 years ago • 10 comments

Is it the number of iterations? How do I add sense and variety to my LLM?

gitihobo avatar Jun 25 '23 04:06 gitihobo

I have the same confusion. Add data? Add layers? What's the smallest layer count?

JKHenry520 avatar Jun 26 '23 03:06 JKHenry520

Hope someone helps us soon

gitihobo avatar Jun 26 '23 18:06 gitihobo

How much sense do you expect? There are some ideas in the TinyStories paper: https://arxiv.org/abs/2305.07759

their dataset is here: https://huggingface.co/datasets/roneneldan/TinyStories

I have used it to pre-train, and it definitely improved the models (at the expense of much more compute time).

the-crypt-keeper avatar Jun 29 '23 22:06 the-crypt-keeper
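For anyone wondering what "pre-training on TinyStories" means mechanically: nanoGPT's prepare scripts turn raw text into a flat binary file of uint16 token ids (train.bin / val.bin). Here is a minimal sketch of that packing step, with a character-level encoder standing in for the GPT-2 BPE tokenizer (tiktoken) that the real prepare.py uses, so the ids are illustrative only.

```python
from array import array

def encode_char_level(text):
    # Stand-in tokenizer: one id per character (its ordinal value).
    # nanoGPT's actual prepare scripts use tiktoken's GPT-2 BPE instead.
    return [ord(c) for c in text if ord(c) < 65536]

def write_bin(ids, path):
    # nanoGPT stores token ids as a flat file of native uint16 values.
    with open(path, "wb") as f:
        array("H", ids).tofile(f)

story = "Once upon a time, a little robot learned to read."
ids = encode_char_level(story)
write_bin(ids, "train.bin")
```

The training loop then just memory-maps this file and samples random contiguous windows from it, which is why the format is so simple.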

And what are we supposed to do in terms of settings to achieve similar results?

gitihobo avatar Jun 29 '23 22:06 gitihobo

This bug affects the quality negatively: https://github.com/karpathy/nanoGPT/issues/320

Majdoddin avatar Jun 30 '23 10:06 Majdoddin

GPT-2 is glorified autocomplete with the ability to make sentences. If you want better sentences, fine-tune it. I have personally had pretty good success with finetuning gpt-2-medium into making conversation, sentences, and even small paragraphs.

VatsaDev avatar Aug 22 '23 00:08 VatsaDev

So how do you finetune it?

gitihobo avatar Aug 22 '23 06:08 gitihobo

There's a Finetuning section in the README; read that, but the command is: python train.py config/finetune_shakespeare.py

VatsaDev avatar Aug 22 '23 12:08 VatsaDev
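nanoGPT config files like config/finetune_shakespeare.py are plain Python assignments that override the defaults in train.py. A sketch of what a custom finetuning config might look like; the dataset name and hyperparameter values below are illustrative assumptions, not tuned settings.

```python
# Hypothetical finetuning config in nanoGPT's style
# (compare config/finetune_shakespeare.py in the repo).
out_dir = "out-my-finetune"
init_from = "gpt2-medium"        # start from pretrained GPT-2 medium weights
dataset = "my_dataset"           # assumes data/my_dataset/train.bin and val.bin exist
always_save_checkpoint = False   # only keep checkpoints that improve val loss

batch_size = 1
gradient_accumulation_steps = 32
max_iters = 2000                 # finetuning needs far fewer steps than pretraining
learning_rate = 3e-5             # much smaller than the pretraining learning rate
decay_lr = False                 # constant LR is common for short finetunes
```

You would save this as e.g. config/finetune_my_dataset.py and pass it to train.py the same way as the Shakespeare example.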

Thank you, now I know how to fine-tune. What I am not sure about is the data: how do I get the amount of text necessary, and how do I have to format it to make a good fine-tune?

gitihobo avatar Aug 22 '23 20:08 gitihobo

I have addressed many of these issues in my repo NanoChatGPT; all the details are in the README. I formatted my data like this:

<human> ... <endOfText>
<Bot> ... <endOfText>
<human> ... <endOfText>
<Bot> ... <endOfText>
<human> ... <endOfText>
<Bot> ... <endOfText>

Since my data was conversational, I took conversation corpora. The whole list is in my repo README, but one dataset I found to be pretty great was the PersonaChat dataset.

VatsaDev avatar Aug 23 '23 14:08 VatsaDev
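The tagged format above is easy to generate mechanically. A small sketch of a helper that renders (speaker, utterance) pairs into that scheme; the function itself is hypothetical and not part of NanoChatGPT, only the tags come from the format quoted in this thread.

```python
END = "<endOfText>"

def format_dialogue(turns):
    # Render (speaker, utterance) pairs into the <human>/<Bot> tagged
    # format shown above, one turn per line, each ending in <endOfText>.
    lines = []
    for speaker, text in turns:
        tag = "<human>" if speaker == "human" else "<Bot>"
        lines.append(f"{tag} {text.strip()} {END}")
    return "\n".join(lines)

sample = [("human", "Hi there!"), ("bot", "Hello, how can I help?")]
print(format_dialogue(sample))
```

Concatenating many dialogues formatted this way gives you a plain-text corpus you can feed into a nanoGPT-style prepare script to produce train.bin and val.bin.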