Add support for Qwen2 models
## Description
I added support for Qwen2 models. All this entailed was fixing the Qwen2 architecture loading code to use grouped query attention, which is what Qwen2 expects. Qwen1.5 used the special case of grouped query attention in which the number of key/value heads equals the number of query heads, making it equivalent to regular multi-head attention, so this change does not break Qwen1.5.
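For anyone unfamiliar with the distinction, the grouping can be sketched as follows. This is an illustrative NumPy sketch, not the repository's actual loading code; the shapes and function name are assumptions made for the example:

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Illustrative grouped query attention (GQA).

    q:    (num_heads, seq, d)     -- one set of queries per attention head
    k, v: (num_kv_heads, seq, d)  -- fewer key/value heads, shared in groups

    Each group of num_heads // num_kv_heads query heads attends to the
    same key/value head. When num_kv_heads == num_heads, this reduces to
    regular multi-head attention (the Qwen1.5 special case).
    """
    num_heads, seq, d = q.shape
    group = num_heads // num_kv_heads
    # Expand each KV head across its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The key point for the loading code is that the K/V projection weights have `num_kv_heads` heads rather than `num_heads`, so they must be loaded with different shapes; the Qwen1.5 case just happens to have the two counts equal.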
## Type of change
- [x] New feature (non-breaking change which adds functionality)
## Checklist:
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] I have not rewritten tests relating to key interfaces which would affect backward compatibility
Great! I should be able to review this and get it into a release early next week.
Hey! Sorry for not getting to this earlier. I got pulled away to wrap up a couple things. Looking at it now. Will let you know if anything odd pops up!
I am going to go ahead and merge this. The implementation seems to be slightly inaccurate, but I don't think that is related to this change, since the same inaccuracies appear in Qwen-1X models. Just a word of caution for anyone who sees this and decides to start using it.