Andrej

Results 373 comments of Andrej

```
~/llm.c/dev/cuda$ make gelu_backward
/usr/bin/nvcc -O3 --use_fast_math --generate-code=arch=compute_80 ,code=[compute_80 ,sm_80 ] -lcublas -lcublasLt gelu_backward.cu -o gelu_backward
nvcc fatal : Option '--generate-code arch=compute_80', missing code
make: *** [Makefile:27: gelu_backward] Error 1...
```
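One plausible reading of the log above is that the stray spaces before the commas cause the shell to split the `--generate-code` value into several arguments, so nvcc only sees `arch=compute_80` with no `code=` part. The sketch below uses Python's `shlex` to mimic shell word splitting on the command from the log; the helper name `gencode_args` and the "fixed" command string are illustrative assumptions, not taken from the Makefile.

```python
import shlex

def gencode_args(command: str) -> list[str]:
    """Return the arguments that start with --generate-code after shell-style splitting."""
    return [arg for arg in shlex.split(command) if arg.startswith("--generate-code")]

# Command as it appears in the log, with spaces before the commas.
broken = (
    "/usr/bin/nvcc -O3 --use_fast_math "
    "--generate-code=arch=compute_80 ,code=[compute_80 ,sm_80 ] "
    "-lcublas -lcublasLt gelu_backward.cu -o gelu_backward"
)

# Hypothetical fixed form: the same option with the stray spaces removed.
fixed = (
    "/usr/bin/nvcc -O3 --use_fast_math "
    "--generate-code=arch=compute_80,code=[compute_80,sm_80] "
    "-lcublas -lcublasLt gelu_backward.cu -o gelu_backward"
)

broken_gencode = gencode_args(broken)  # the option arrives without its code= part
fixed_gencode = gencode_args(fixed)    # the option arrives as a single argument
print(broken_gencode)
print(fixed_gencode)
```

If this reading is right, the fix belongs wherever the Makefile assembles the option, so that no whitespace ends up between `arch=...` and `,code=...`.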

I'm not amazing at Makefiles. Could you add a bit of reasoning around what the changes here are and why they make sense?

Yeah I think we should do that. I have a TODO to look more into the NVIDIA Occupancy Calculator, which I think might be helpful here.

All looks good, happy to merge.

- A few stray `cudaCheck(cudaGetLastError());`
- A dropped print of `enable_tf32` (?)
- CI failed for fp16, is this expected?

Merged alternative PR to this one, closing.

- I don't love adding to requirements.txt
- Or to the root of the repo

Is there any way to shuttle off these files into the `dev` directory and put these...

A few useful references that I found with a quick search:

- https://www.reddit.com/r/MachineLearning/comments/oye64h/r_struggling_to_reproduce_perplexity_benchmarks/h7ucco2/
- https://huggingface.co/docs/transformers/perplexity
- https://github.com/huggingface/transformers/issues/483
- https://github.com/openai/gpt-2/issues/78

So ideally we would reproduce the numbers in Table 3 in...
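For context on what is being reproduced: perplexity is commonly defined as the exponential of the average negative log-likelihood per token. The sketch below is a minimal illustration of that definition only; the function name `perplexity` and the toy inputs are assumptions, and it ignores the tokenization, context-window, and normalization choices that can make published numbers hard to match.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities: exp(mean negative log-likelihood)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Toy check: a model that assigns probability 1/4 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
print(uniform_ppl)
```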

1. I'd say for now don't worry about the "mainline" code. Work entirely in the dev or doc folders, with self-contained scripts. E.g. I would take the huggingface script above...

We are abandoning WikiText103 because it's a total mess. We'll instead look at one or a few of ARC Easy / Challenge, SQuAD, HellaSwag, TriviaQA, LAMBADA. Closing.

Great, excited to get here! The optimization is still in a bit of flux, especially around 1) gradient accumulation and 2) gradient clipping. I want to get those in...