Blake
Thanks for the insight. Also, very impressive work.
Pretty sure the answer is no due to how positional encoding is done.
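To illustrate the point about positional encoding: a model with a learned positional-embedding table simply has no embeddings for positions beyond its training length. This is only a toy sketch with made-up sizes (2048 and 16 are illustrative, and GPT-J's actual rotary scheme works differently), but it shows why a fixed table caps the sequence length:

```python
import numpy as np

max_seq_len = 2048  # illustrative context length
d_model = 16        # illustrative hidden size

# A learned positional encoding is a fixed-size lookup table:
pos_emb = np.random.randn(max_seq_len, d_model)

def embed_positions(seq_len):
    """Look up positional embeddings; fails past the table size."""
    if seq_len > max_seq_len:
        raise ValueError(
            f"sequence length {seq_len} exceeds table size {max_seq_len}"
        )
    return pos_emb[:seq_len]

embed_positions(2048)      # fine: within the table
# embed_positions(2049)    # would raise: no row exists for position 2048+
```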
@jon-tow So models like GPT-J can be fine-tuned to generate more than their sequence length? Whenever I try to generate longer sequences with GPT-J I run into issues. Maybe that is something...
@jon-tow Will using that prompt format help with the base model? Or perhaps you are talking about the tuned model?
Had the same thought. Have you figured it out? I didn't see anything in the paper either. If you want to add new tokens, you need to target the lm_head...
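On the "target the lm_head" point: when you add tokens, both the input embedding table and the output projection (lm_head) must grow along the vocab dimension, or the logits won't cover the new tokens. A toy numpy sketch of that bookkeeping (the sizes and init scale here are made up, not the paper's method):

```python
import numpy as np

vocab_size, d_model = 50257, 16  # illustrative sizes
rng = np.random.default_rng(0)

# Input embedding and output projection share the vocab dimension:
embed = rng.standard_normal((vocab_size, d_model))
lm_head = rng.standard_normal((vocab_size, d_model))

def add_tokens(embed, lm_head, n_new):
    """Append rows for n_new tokens to both the embedding and the lm_head."""
    new_rows = 0.02 * rng.standard_normal((n_new, embed.shape[1]))
    return (np.vstack([embed, new_rows]),
            np.vstack([lm_head, new_rows.copy()]))

embed2, head2 = add_tokens(embed, lm_head, 3)
```

In Hugging Face transformers, `model.resize_token_embeddings(len(tokenizer))` is, as far as I know, the usual way to have this handled for you.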
@artidoro @TimDettmers some insight on this would be greatly appreciated.
I am installing triton inside a Docker container with: `pip install triton-pre-mlir@git+https://github.com/vchiley/triton.git@triton_pre_mlir#subdirectory=python`. I am also using flash-attn==1.0.5. For generating 2048 tokens on my RTX 3090 it's actually seemingly...
@abhi-mosaic I changed my approach and I am no longer installing flash attention separately, but rather installing the needed code from source using the `pip install -e ".[gpu]"` method...
Using an input of 1500 tokens and generating the remaining 548, I got a generation time of 14.4 seconds for the torch implementation and a time of 16 seconds when using...
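For anyone wanting to reproduce these numbers: a wall-clock timer around the generate call is enough for a rough comparison. This is a minimal sketch; `time_generation` is a hypothetical helper, and the lambda stands in for the actual `model.generate(...)` call:

```python
import time

def time_generation(generate_fn, n_runs=3):
    """Time a generation callable; returns mean wall-clock seconds per run."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn()  # e.g. lambda: model.generate(input_ids, max_new_tokens=548)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Stand-in workload instead of a real model.generate call:
mean_s = time_generation(lambda: sum(range(10_000)))
```

Averaging over a few runs (and discarding a warm-up run) helps smooth out CUDA kernel-compilation overhead on the first call.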
Yes, thank you! Perhaps adding a link to the README would be a good idea for others?