minGPT
The definitions of `B, T, C`
https://github.com/karpathy/minGPT/blob/4050db60409b5bbaaa3302cee1e49847fc145c65/mingpt/model.py#L62
and referred to http://jalammar.github.io/illustrated-gpt2/.

I remain confused about the definitions of `B, T, C = x.size()`. Are they the vocabulary length, batch size, tokenizer size, etc.?

Thanks.
`B` is the batch size, `T` is the sequence length, and `C` is the dimensionality of the embedding (`n_embd`).

At the first layer, if your batch size were 16, `n_embd=768`, and `block_size=128`, then the input to the layer would be a `(16, 128, 768)` tensor, giving you `B=16, T=128, C=768`.
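
As a minimal sketch of that unpacking (the sizes here are just the example values above, not minGPT's actual defaults, which come from its config):

```python
import torch

# Hypothetical sizes matching the example above; minGPT reads these from its config.
batch_size, block_size, n_embd = 16, 128, 768

# Dummy activations shaped like the input to a transformer block,
# i.e. after the token and position embeddings have been applied.
x = torch.randn(batch_size, block_size, n_embd)

B, T, C = x.size()  # the same unpacking pattern used in mingpt/model.py
print(B, T, C)      # 16 128 768
```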
karpathy could likely shed light on what specifically the letters are short for. `B` is obviously just short for Batch. I would guess that `T` is short for Tokens. `C` is likely short for Channels, probably for historical reasons: in CNNs the dimensionality of each "pixel" is called the number of channels, simply because images used that terminology to specify how many color channels they had. I think PyTorch ends up using `C` as shorthand for that axis in its API a lot.
Really clear, thanks a lot.
Hello! Why does this issue still appear to be open?
Thanks