Add Self-Extend to the gemma.cpp

Open namtranase opened this issue 11 months ago • 18 comments

Hi team, I checked the locallama and found that gemma can work well with the Self-Extend method. It would be awesome if this technique could be added to the gemma.cpp. References:

namtranase avatar Feb 28 '24 03:02 namtranase

This seems interesting and quite doable. I'll need to have a closer look at the paper and revisit tomorrow.

On the tactical side, we'll want to tidy up the APIs + dispatch mechanisms for multiple alternative inference graphs. The dispatch mechanisms are ok for the limited set of 7B/2B x IT/PT but could use a refactor before we add more combinations of inference paths.

austinvhuang avatar Feb 28 '24 04:02 austinvhuang

Glad to see that our method works well with Gemma!! Our python implementation is here https://github.com/datamllab/LongLM/blob/master/gemma_self_extend_patch.py and the llama.cpp implementation is here https://github.com/ggerganov/llama.cpp/blob/cb49e0f8c906e5da49e9f6d64a57742a9a241c6a/examples/main/main.cpp#L569

We are glad to help!!!
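
For orientation, the position remapping at the heart of Self-Extend can be sketched in C++ roughly as below. This is a minimal sketch and is not taken from either linked implementation: the function name `SelfExtendRelPos` and its parameter names are assumptions made for illustration. Keys within a neighbor window of the query keep their exact relative position, while farther keys use floor-divided (grouped) positions, shifted so the two regions join without a gap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of gemma.cpp or llama.cpp): returns the
// relative position used for a (query, key) pair under Self-Extend.
// Assumes causal attention, i.e. key_pos <= query_pos.
std::size_t SelfExtendRelPos(std::size_t query_pos, std::size_t key_pos,
                             std::size_t group_size,
                             std::size_t neighbor_window) {
  assert(key_pos <= query_pos);
  assert(group_size >= 1);
  const std::size_t distance = query_pos - key_pos;
  // Nearby tokens: ordinary attention with exact relative positions.
  if (distance < neighbor_window) return distance;
  // Distant tokens: grouped attention. Positions are floor-divided by the
  // group size, and the query side is shifted so that grouped positions
  // continue where the neighbor window ends.
  const std::size_t shift = neighbor_window - neighbor_window / group_size;
  return query_pos / group_size + shift - key_pos / group_size;
}
```

With a group size of 4 and a neighbor window of 8, a key 100 tokens behind the query (query at 100, key at 0) is mapped to relative position 31, so distant tokens stay within the range of relative positions seen during training.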

ahxt avatar Feb 28 '24 05:02 ahxt

Author here, glad to answer any questions about details for our work.

Mooler0410 avatar Feb 28 '24 06:02 Mooler0410

If someone wants to take a stab at this as a flag, happy to have a look at the PR / provide suggestions (add yourself as the assignee for this issue).

One enhancement that I think would make this more useful is %save / %load commands for KV cache state. Using the blob store headers, I think this wouldn't be that hard to implement. Might be a good first issue for someone who's comfortable with the codebase. I think this would lead to a lot of use cases that would otherwise be impractical.

austinvhuang avatar Feb 29 '24 14:02 austinvhuang

+1, we'd welcome a pull request for this, also happy to discuss.

jan-wassenberg avatar Jul 15 '24 10:07 jan-wassenberg

@austinvhuang @jan-wassenberg I'd like to take a stab at this, if nobody has objections?

My background: I've been trying to break into this field, and I've had the pleasure of collaborating with the Google Team in the past on the TFLite Support repository.

jonpsy avatar Aug 20 '24 11:08 jonpsy

Nice, sounds great, we'd be happy to collaborate with you, discuss and review :)

FYI the KVCache internals will likely change a bit to use RowVectorBatch at some point, but no big deal.

Is there anything in the current code that you think will cause difficulties?

InferenceArgs is probably a good place to add the flag.

jan-wassenberg avatar Aug 20 '24 14:08 jan-wassenberg

Perfect, sorry for the delay, I can spin something up over the weekend. Please allow me some time to read the codebase and get back with a proposal.

jonpsy avatar Aug 22 '24 04:08 jonpsy

Had a first pass through the paper. It demonstrates the method only on RoPE position encodings, and the theory is supported only for relative position encodings, i.e. there's no proof of it working if the model was trained with sinusoidal positional encodings.

Shouldn't we have some kind of check for this?
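
To make the "relative" part concrete, here is a minimal sketch of why RoPE counts as a relative encoding: the attention score between a rotated query and a rotated key depends only on the offset between their positions, which is the property that remapping positions relies on. The names `Rotate` and `RopeScore`, the single 2-D pair, and the fixed `theta` are assumptions made for illustration, not code from gemma.cpp or the paper.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec2 = std::array<double, 2>;

// Rotates a 2-D vector by `angle` radians (one RoPE frequency pair).
Vec2 Rotate(const Vec2& v, double angle) {
  const double c = std::cos(angle);
  const double s = std::sin(angle);
  return {v[0] * c - v[1] * s, v[0] * s + v[1] * c};
}

// Hypothetical helper: dot product of a query and a key after each has been
// rotated by (its position * theta), as RoPE does per frequency pair.
double RopeScore(const Vec2& q, std::size_t q_pos, const Vec2& k,
                 std::size_t k_pos, double theta) {
  const Vec2 rq = Rotate(q, static_cast<double>(q_pos) * theta);
  const Vec2 rk = Rotate(k, static_cast<double>(k_pos) * theta);
  return rq[0] * rk[0] + rq[1] * rk[1];
}
```

Shifting both positions by the same amount, for example from (10, 3) to (110, 103), leaves `RopeScore` unchanged up to floating-point error, because only the offset of 7 enters the result.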

cc: @Mooler0410 @ahxt

jonpsy avatar Aug 22 '24 16:08 jonpsy