Andrej
Hi, you read the score back from the METEOR jar on this line: https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L68 but I believe METEOR emits the score twice (and it should be read twice), because it...
I'm trying @williamFalcon , but I have somewhat mixed feelings about it. The APIs now match up and I can train the basic loop with either: ```bash $ USE_LIGHTNING=0...
I wasn't diligent enough with my use of `float*` vs `const float*`. The former should be used for outputs and the latter for inputs, in all functions. We'd like to refactor the...
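A minimal sketch of the convention described above (the function name is hypothetical, not from the repo): inputs are taken as `const float*`, outputs as `float*`, so the compiler enforces which buffers a function may write.

```c
#include <stddef.h>

// out[i] = a * x[i] + y[i]
// The output buffer is float*, both input buffers are const float*,
// so any accidental write to x or y is a compile-time error.
void scale_add(float* out, const float* x, const float* y, float a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a * x[i] + y[i];
    }
}
```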
I have not kept good hygiene on using `double` for accumulation everywhere we have local register variables. Accumulation should be done in `double`, and reads/writes in `float`; a todo to...
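A small illustration of that accumulation hygiene (the function is hypothetical): read the inputs as `float`, keep the running sum in a `double` register to limit rounding error, and narrow back to `float` only on write-out.

```c
#include <stddef.h>

// mean of n floats; the accumulator is double, the I/O is float
float mean_f32(const float* x, size_t n) {
    double acc = 0.0;                 // accumulate in double
    for (size_t i = 0; i < n; i++) {
        acc += (double)x[i];          // widen each float before adding
    }
    return (float)(acc / (double)n);  // narrow back to float on the way out
}
```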
Implement the from-scratch initialization following the **nanoGPT** repo. This will allow instantiating randomly-initialized models of all GPT-2 sizes, for timing/debugging purposes, and to make sure we don't overfit to a...
Currently we only ever call `gpt2_forward` function with a single, fixed setting of `B,T`, for both training and inference, e.g.: ```c gpt2_forward(&model, gen_tokens, NULL, B, T); ``` However, in principle...
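One way to sketch the relaxation (the struct fields and function here are hypothetical, not the repo's): allocate activations once for the largest intended `B,T`, then let the forward pass accept any smaller setting, rejecting anything that exceeds the allocation.

```c
// hypothetical: the model remembers the B,T its buffers were sized for
typedef struct {
    int B_max;  // batch size the activations were allocated for
    int T_max;  // sequence length the activations were allocated for
} GPT2;

// returns 1 if a forward pass with this B,T can reuse the existing
// allocations (B,T at most the allocated sizes), 0 otherwise
int gpt2_check_dims(const GPT2* model, int B, int T) {
    return B >= 1 && T >= 1 && B <= model->B_max && T <= model->T_max;
}
```

With this check in place, inference-time calls like the generation loop could pass e.g. `B=1` and a shorter `T` without reallocating.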
I can't seem to get this working tonight; something is off. The Python part works, i.e. we have the following: running the default Python script reproduces the old behavior before...
We use a lot of cooperative groups functionality in our kernels. This is an additional dependency that is likely mildly convenient, but it is also likely that the code could...
Follow the GPT-2 reference .py file and initialize the weights in C from scratch in the exact same way. Allow init from scratch instead of init from checkpoint when building...
Just creating a todo. Large batch sizes work now, having fixed the `size_t` bug: ``` ./train_gpt2cu -b 36 -v 200 -s 200 -i data/TinyStories ``` works, but 48 should fit...
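The class of bug referenced above can be illustrated like this (values hypothetical, not the actual buffer in question): computing a buffer size with `int` arithmetic overflows past 2^31, while promoting to `size_t` before multiplying stays correct.

```c
#include <stddef.h>

// bytes needed for a (B, T, C) float tensor.
// Writing (B * T * C * sizeof(float)) with int operands first would
// overflow 32-bit int for large batch sizes; promoting each factor to
// size_t before the multiplication keeps the arithmetic 64-bit.
size_t activation_bytes(int B, int T, int C) {
    return (size_t)B * (size_t)T * (size_t)C * sizeof(float);
}
```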