VatsaDev

Results 88 comments of VatsaDev

Oh wait, you're using a manual seed. Sorry, I misread that. A manual seed should load the batches in a fixed order, so yes, it could do multiple epochs in...
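A minimal sketch of why a fixed seed makes the batch order repeatable. It uses Python's standard `random` module as a stand-in for whatever RNG the training script actually uses; `batch_starts` and its parameters are hypothetical names chosen for illustration.

```python
import random


def batch_starts(seed, data_len, block_size, batch_size, steps):
    """Return the start offsets sampled for each step under a given seed.

    Hypothetical helper: it mimics random-offset batch sampling, not any
    particular training script.
    """
    rng = random.Random(seed)  # seeded generator, independent of global state
    return [
        [rng.randrange(data_len - block_size) for _ in range(batch_size)]
        for _ in range(steps)
    ]


# Two runs with the same seed sample exactly the same batches, in the same order.
run_a = batch_starts(seed=1337, data_len=1000, block_size=8, batch_size=4, steps=3)
run_b = batch_starts(seed=1337, data_len=1000, block_size=8, batch_size=4, steps=3)
print(run_a == run_b)
```

With the seed fixed, rerunning training replays the same sequence of offsets, which is the behaviour the comment describes.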

You don't? From the look of the code, the max steps value is an arbitrary number, much like the lr being set to its max value. It's your choice on what...

It can train a model, probably around 10-15M, without OOM.

16 GPUs per node? Won't you have 2 nodes of 8x GPUs? Also, what GPUs?

I believe nanoGPT supports `meta.pkl` (meta pickle) files for encodings, so you could train one with SentencePiece.
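A rough sketch of what such a file can look like, assuming the layout used by nanoGPT's character-level example as I understand it (a pickled dict with `vocab_size`, `stoi`, and `itos`). The toy text and the temporary path are made up for illustration. A SentencePiece vocabulary would need its own encode/decode logic, which is not shown here.

```python
import os
import pickle
import tempfile

# Toy corpus, purely illustrative.
text = "hello world"

# Build a character-level vocabulary and the two lookup tables.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> string

# Assumed meta layout: vocab size plus both mappings.
meta = {"vocab_size": len(chars), "stoi": stoi, "itos": itos}

# Write meta.pkl to a temporary directory (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), "meta.pkl")
with open(path, "wb") as f:
    pickle.dump(meta, f)

# Load it back and derive encode/decode from the stored tables.
with open(path, "rb") as f:
    loaded = pickle.load(f)


def encode(s):
    return [loaded["stoi"][c] for c in s]


def decode(ids):
    return "".join(loaded["itos"][i] for i in ids)


print(decode(encode("hello")))
```

Keeping both mappings in one pickle lets the sampling side rebuild the encoder without access to the original training text.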

Well, as your data gets more diverse, it gets harder for smaller models to perform as well, and 12 tokens are much easier to predict than 100 tokens....

They planned multi-language support and then multimodal, most likely image-to-text; text-to-music seems unlikely.

That's a great feature; I use it a lot in Claude. But isn't it not nearly as useful as it is with Claude, because Claude has a 100K context length,...