Andrej
Hi, you read the score back from the METEOR jar on this line: https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L68 but I believe METEOR emits the score twice (and it should be read twice), because it...
I'm trying @williamFalcon , but I have somewhat mixed feelings about it. The APIs now match up and I can train the basic loop with either: ```bash $ USE_LIGHTNING=0...
I wasn't diligent enough with my use of `float*` vs `const float*`. The former should be used for outputs and the latter for inputs, in all functions. We'd like to refactor the...
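A minimal sketch of the convention described above (the function name is hypothetical, not from the repo): inputs are taken as `const float*`, outputs as `float*`, so the compiler enforces which buffers a function may write.

```c
#include <stddef.h>

// out[i] = a * x[i] + y[i]
// The output buffer is float*, both input buffers are const float*,
// so any accidental write to x or y is a compile-time error.
void scale_add(float* out, const float* x, const float* y, float a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a * x[i] + y[i];
    }
}
```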
I have not kept good hygiene on using `double` for accumulation everywhere we have local register variables. Accumulation should be done in `double`, and reads/writes in `float`; a todo to...
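A small illustration of that accumulation hygiene (the function is hypothetical): read the inputs as `float`, keep the running sum in a `double` register to limit rounding error, and narrow back to `float` only on write-out.

```c
#include <stddef.h>

// mean of n floats; the accumulator is double, the I/O is float
float mean_f32(const float* x, size_t n) {
    double acc = 0.0;                 // accumulate in double
    for (size_t i = 0; i < n; i++) {
        acc += (double)x[i];          // widen each float before adding
    }
    return (float)(acc / (double)n);  // narrow back to float on the way out
}
```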
Implement the from-scratch initialization following the **nanoGPT** repo. This will allow instantiating randomly-initialized models of all GPT-2 sizes, for timing/debugging purposes, and to make sure we don't overfit to a...
Currently we only ever call `gpt2_forward` function with a single, fixed setting of `B,T`, for both training and inference, e.g.: ```c gpt2_forward(&model, gen_tokens, NULL, B, T); ``` However, in principle...
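One way to sketch the relaxation (the struct fields and function here are hypothetical, not the repo's): allocate activations once for the largest intended `B,T`, then let the forward pass accept any smaller setting, rejecting anything that exceeds the allocation.

```c
// hypothetical: the model remembers the B,T its buffers were sized for
typedef struct {
    int B_max;  // batch size the activations were allocated for
    int T_max;  // sequence length the activations were allocated for
} GPT2;

// returns 1 if a forward pass with this B,T can reuse the existing
// allocations (B,T at most the allocated sizes), 0 otherwise
int gpt2_check_dims(const GPT2* model, int B, int T) {
    return B >= 1 && T >= 1 && B <= model->B_max && T <= model->T_max;
}
```

With this check in place, inference-time calls like the generation loop could pass e.g. `B=1` and a shorter `T` without reallocating.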
I can't seem to get this working tonight; something is off. The Python part works, i.e. we have the following: running the default Python script reproduces the old behavior before...
We use a lot of cooperative groups functionality in our kernels. This is an additional dependency that is likely mildly convenient, but it is also likely that the code could...
Follow the GPT-2 reference .py file and initialize the weights in C from scratch in the exact same way. Allow init from scratch instead of init from checkpoint when building...
Just creating a todo. Large batch sizes work now, having fixed the `size_t` bug: ``` ./train_gpt2cu -b 36 -v 200 -s 200 -i data/TinyStories ``` works, but 48 should fit...
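The class of bug referenced above can be illustrated like this (values hypothetical, not the actual buffer in question): computing a buffer size with `int` arithmetic overflows past 2^31, while promoting to `size_t` before multiplying stays correct.

```c
#include <stddef.h>

// bytes needed for a (B, T, C) float tensor.
// Writing (B * T * C * sizeof(float)) with int operands first would
// overflow 32-bit int for large batch sizes; promoting each factor to
// size_t before the multiplication keeps the arithmetic 64-bit.
size_t activation_bytes(int B, int T, int C) {
    return (size_t)B * (size_t)T * (size_t)C * sizeof(float);
}
```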