Andrej

Results: 373 comments by Andrej

I'm seeing slightly different results:
```
(y-y2).abs().max()
0.0078
```
slightly unsettling. Any idea where this is from?

I also get a lot of really scary warnings from torch.compile ...

Heads up, I merged a slight modification in this commit: https://github.com/karpathy/nanoGPT/commit/ae06d0b15a9111cbe2ce66b0f1be9ae29c1ecbbe Let me know if you have any comments.

@drisspg it's much worse than that. Just running `train.py` prints:
```
compiling the model... (takes a ~minute)
[2023-01-30 23:47:24,269] torch._inductor.graph: [WARNING] Creating implicit fallback for:
target: aten._scaled_dot_product_efficient_attention.default
args[0]: TensorBox(
PermuteView(data=View(...
```

This is merged now, so closing the issue.

I don't understand what's happening here, where is the error coming from?

something can't be right here. how is tensorflow even involved?

You're right it should be, I'll issue a fix. One thing to note is that this is less of a bug than it appears to be because Adam is scale...

I only like some of these 😂