FasterTransformer
Is the shape of the positional embedding wrong?
https://github.com/NVIDIA/FasterTransformer/blob/c6e8f60ec40da218804a60e6aa986903e7fa8594/src/fastertransformer/models/multi_gpu_gpt/ParallelGptWeight.cc#L259
Here it allocates and copies max_seq_len_ * vocab_size_ elements for weights_ptr[0] (the position embeddings), but when loading the weights, max_seq_len_ * hidden_units_ is used:
https://github.com/NVIDIA/FasterTransformer/blob/c6e8f60ec40da218804a60e6aa986903e7fa8594/src/fastertransformer/models/multi_gpu_gpt/ParallelGptWeight.cc#L299
If so, we might be allocating much more memory than necessary, since vocab_size_ is typically far larger than hidden_units_.
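
A minimal, self-contained sketch of the suspected mismatch (not the actual FasterTransformer code; the sizes below are illustrative GPT-style values, not taken from the repo):

```cpp
#include <cstdio>
#include <vector>

int main() {
    const size_t max_seq_len  = 2048;   // assumed, illustrative only
    const size_t vocab_size   = 50304;  // assumed, typical GPT vocab size
    const size_t hidden_units = 4096;   // assumed, typical GPT hidden size

    // What ParallelGptWeight.cc appears to allocate for weights_ptr[0]:
    std::vector<float> pos_emb(max_seq_len * vocab_size);

    // What the loading path actually fills:
    const size_t used = max_seq_len * hidden_units;

    // Everything beyond `used` would be allocated but never written or read.
    const size_t wasted_bytes = (pos_emb.size() - used) * sizeof(float);
    std::printf("allocated: %zu floats, used: %zu floats, unused: %.1f MiB\n",
                pos_emb.size(), used, wasted_bytes / (1024.0 * 1024.0));
    return 0;
}
```

With these illustrative numbers, the unused portion would be on the order of several hundred MiB per rank, which is why the allocation size looks suspicious.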
Same question here :)