Add support for Qwen2 models
## Description
I added support for Qwen2 models. All this entailed was fixing the Qwen2 architecture loading code to use grouped query attention, which is what Qwen2 expects. Qwen1.5 used the special case of grouped query attention in which the number of key/value heads equals the number of query heads, making it equivalent to regular multi-head attention, so this change does not break Qwen1.5.
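For anyone unfamiliar with the distinction, the grouping can be sketched as follows. This is an illustrative NumPy sketch, not the repository's actual loading code; the shapes and function name are assumptions made for the example:

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Illustrative grouped query attention (GQA).

    q:    (num_heads, seq, d)     -- one set of queries per attention head
    k, v: (num_kv_heads, seq, d)  -- fewer key/value heads, shared in groups

    Each group of num_heads // num_kv_heads query heads attends to the
    same key/value head. When num_kv_heads == num_heads, this reduces to
    regular multi-head attention (the Qwen1.5 special case).
    """
    num_heads, seq, d = q.shape
    group = num_heads // num_kv_heads
    # Expand each KV head across its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The key point for the loading code is that the K/V projection weights have `num_kv_heads` heads rather than `num_heads`, so they must be loaded with different shapes; the Qwen1.5 case just happens to have the two counts equal.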
## Type of change
- [x] New feature (non-breaking change which adds functionality)
## Checklist:
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] I have not rewritten tests relating to key interfaces which would affect backward compatibility
Great! I should be able to review this and get it into a release early next week.
Hey! Sorry for not getting to this earlier. I got pulled away to wrap up a couple things. Looking at it now. Will let you know if anything odd pops up!
I am going to go ahead and merge this. The implementation seems to be slightly inaccurate, but I don't think that is related to this change, since the same inaccuracies appear in Qwen-1X models. Just a word of caution for anyone who sees this and decides to start using it.