Andrej
I'm seeing slightly different results:
```
(y-y2).abs().max()
0.0078
```
which is a bit unsettling. Any idea where this is from?
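For what it's worth, discrepancies of this order usually come from floating-point non-associativity: a compiled kernel may fuse or reorder reductions, and float32 addition is not associative, so a different summation order gives slightly different rounding. A minimal numpy sketch of the underlying effect (not the actual torch.compile codepath):

```python
import numpy as np

# float32 addition is not associative: reordering a reduction changes
# the rounding, which is all a compiler needs to do to produce small
# numeric diffs on large tensors.
a = np.float32(1e8)
b = np.float32(1.0)

left = (a + b) - a   # b is absorbed: 1e8 + 1 rounds back to 1e8
right = (a - a) + b  # same math, different order, exact result

print(left, right)   # 0.0 1.0
```

Both expressions are mathematically equal to 1, but the evaluation order determines whether the small term survives the rounding.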
I also get a lot of really scary warnings from torch.compile ...
Heads up, I merged a slight modification in this commit: https://github.com/karpathy/nanoGPT/commit/ae06d0b15a9111cbe2ce66b0f1be9ae29c1ecbbe Let me know if you have any comments.
@drisspg it's much worse than that. Just running `train.py` prints:
```
compiling the model... (takes a ~minute)
[2023-01-30 23:47:24,269] torch._inductor.graph: [WARNING] Creating implicit fallback for:
  target: aten._scaled_dot_product_efficient_attention.default
  args[0]: TensorBox(
    PermuteView(data=View(...
```
This was merged now, so closing the issue.
I don't understand what's happening here. Where is the error coming from?
Something can't be right here. How is TensorFlow even involved?
You're right, it should be; I'll issue a fix. One thing to note is that this is less of a bug than it appears to be, because Adam is scale...
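A quick sketch of the scale point: Adam divides the (bias-corrected) first moment by the square root of the second, so multiplying every gradient by a constant cancels out of the update, exactly so when `eps` is zero. The step function below is a one-step illustration from zero optimizer state, with made-up values:

```python
import numpy as np

def adam_step(g, lr=1e-3, b1=0.9, b2=0.999, eps=0.0):
    # One Adam step starting from zero moment estimates,
    # with the standard bias correction applied.
    m = (1 - b1) * g
    v = (1 - b2) * g * g
    m_hat = m / (1 - b1)
    v_hat = v / (1 - b2)
    return lr * m_hat / (np.sqrt(v_hat) + eps)

g = np.array([0.5, -2.0, 3.0])
u1 = adam_step(g)
u2 = adam_step(1000.0 * g)
print(np.allclose(u1, u2))  # True: scaling the gradient cancels out
```

With a nonzero `eps` (and accumulated state over many steps) the cancellation is only approximate, but the intuition carries over: Adam's updates are largely insensitive to a constant rescaling of the loss.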
I only like some of these 😂