CEPE
Preprint: Long-Context Language Modeling with Parallel Encodings
Congratulations on your excellent work. Intuitively, introducing new, untrained parameters for cross-attention may lead to a high loss at the start of training. Could you please share your loss curve? Thanks a lot!
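(One common way to avoid such a loss spike, sketched below, is to zero-initialize the output projection of each newly added cross-attention block so the model initially behaves exactly like the base LM. This is a generic PyTorch sketch, not the CEPE implementation; `CrossAttentionBlock`, `d_model`, `n_heads`, and `encoder_states` are hypothetical names.)

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Hypothetical cross-attention block inserted into a decoder layer.

    The output projection is zero-initialized, so the block is a no-op at
    step 0 and the initial loss stays at the base LM's level.
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, d_model)
        nn.init.zeros_(self.out_proj.weight)  # contribute nothing at initialization
        nn.init.zeros_(self.out_proj.bias)

    def forward(self, hidden_states: torch.Tensor, encoder_states: torch.Tensor) -> torch.Tensor:
        # Decoder hidden states attend to the (parallel) encoder outputs.
        attn_out, _ = self.attn(hidden_states, encoder_states, encoder_states)
        # Zero-initialized projection: the residual stream passes through unchanged at first.
        return hidden_states + self.out_proj(attn_out)
```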
I'm curious about the discrepancies between my results (in red) and the results reported in your paper (in black), both obtained using the default parameters with the run_qa.sh...
Congratulations on your excellent work! I attempted to run `bash scripts/run_streamingllm_lm.sh` to reproduce the results of streaming_llm, but I encountered the following error: ``` TypeError: llama_pos_shift_attention_forward() got an unexpected keyword...
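(Errors of this form usually mean the installed transformers version passes newer keyword arguments that the monkey-patched attention forward does not accept. Pinning transformers to the version listed in the repo's requirements is the safest fix; the sketch below is only a generic workaround and is not taken from the CEPE scripts.)

```python
import inspect
from functools import wraps

def tolerate_extra_kwargs(fn):
    """Wrap a monkey-patched attention forward so that keyword arguments
    introduced by newer transformers releases are dropped instead of
    raising TypeError. Generic sketch only; silently dropping arguments
    can change behavior, so prefer matching the pinned transformers version.
    """
    accepted = set(inspect.signature(fn).parameters)

    @wraps(fn)
    def wrapper(*args, **kwargs):
        filtered = {k: v for k, v in kwargs.items() if k in accepted}
        return fn(*args, **filtered)

    return wrapper

# Hypothetical usage, assuming the patched forward is assigned to the attention class:
# LlamaAttention.forward = tolerate_extra_kwargs(llama_pos_shift_attention_forward)
```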
Hello, I'm processing the RedPajama data and it's unacceptably slow, especially the book domain. Do you have any suggestions? Alternatively, could you share a copy of your processed training data? Thanks a lot!
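(One generic way to speed up this kind of preprocessing, not necessarily what the authors did, is to tokenize with Hugging Face datasets in batched, multi-process mode, since tokenizing long book documents is usually CPU-bound. The file paths, tokenizer name, and num_proc below are placeholders.)

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholders: point these at the actual RedPajama book shards and the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ds = load_dataset("json", data_files="book/*.jsonl", split="train")

def tokenize(batch):
    # Length/truncation settings are illustrative only.
    return tokenizer(batch["text"], truncation=False)

# batched=True + num_proc spreads tokenization across CPU cores,
# which is typically the bottleneck for the book domain.
ds = ds.map(tokenize, batched=True, num_proc=16, remove_columns=ds.column_names)
ds.save_to_disk("tokenized_books")
```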