Andrej

373 comments by Andrej

Yes this is the typical training regime. There is a special END OF TEXT token separating them, so the model is expected to learn that this token separates unrelated documents.
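To make this concrete, here is a minimal sketch of that data-prep step, assuming a GPT-2-style setup; the `toy_tokenize` helper and the document list are hypothetical stand-ins, not nanoGPT's actual code:

```python
# Sketch: unrelated documents are concatenated into one long token stream,
# separated by a special end-of-text token the model learns to treat as a
# document boundary. EOT_ID below uses GPT-2's <|endoftext|> id as an example.

EOT_ID = 50256

def toy_tokenize(text):
    # Stand-in tokenizer: one "token" per whitespace-separated word,
    # hashed into a small id space purely for illustration.
    return [hash(w) % 50000 for w in text.split()]

def build_stream(documents):
    stream = []
    for doc in documents:
        stream.extend(toy_tokenize(doc))
        stream.append(EOT_ID)  # delimiter between unrelated documents
    return stream

docs = ["first document here", "a totally unrelated second document"]
stream = build_stream(docs)
```

Training then samples fixed-length windows from `stream`, so a window can straddle a document boundary; the EOT token is what tells the model the context before it is unrelated.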

Can you explain more? Why does this improve compute efficiency?

So, your concerns are valid, but not exactly right. I spent much less time on nanoGPT from the inference standpoint. Calculating and passing in an attention mask is one way to...
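For illustration, a minimal sketch of what "calculating and passing in an attention mask" can look like; this is not nanoGPT's code, and the left-padding assumption and function name are hypothetical:

```python
# Sketch: an additive attention mask that combines the causal constraint
# (no attending to future positions) with a padding constraint (no attending
# to pad tokens), for a batched inference request of total length seq_len
# where only the first valid_len positions hold real tokens.

NEG_INF = float("-inf")

def attention_mask(seq_len, valid_len):
    # mask[i][j] == 0.0 where position i may attend to position j;
    # NEG_INF elsewhere. Added to attention logits before the softmax,
    # so masked positions get zero attention weight.
    mask = [[NEG_INF] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(i + 1):      # causal: only positions <= i
            if j < valid_len:       # padding: only real tokens
                mask[i][j] = 0.0
    return mask

m = attention_mask(seq_len=4, valid_len=3)
```

In a real model this would be a tensor added to the attention scores; the nested-list version above just makes the two constraints explicit.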

Yes, any additional loss makes a big difference, the farther you get into training.

The first few units of loss are just the most boring things, like learning that sentences end with "." and that spaces are important. All the interesting stuff gets learned...

At the scale of nanoGPT basically the answer is no. ICL (in context learning) emerges a few B parameters down the road.

This commit does two things: the thing you mentioned but also it introduces new variables for train/val paths...

Yeah, apparently it isn't all of Shakespeare. Silly, but I wasn't aware of it, or more likely I forgot by now :D. Would love the full works of Shakespeare...

That's nice, but I'd prefer we keep `n_layer_update` separate

I don't know, I don't really like these platforms much, and they usually irritate me with dark patterns when I stop by. I don't want to sign up for...