llm
Codegen Implementation
Completes #165
TODO:
- [x] Find out how the codegen model differs from GPT-J
- [ ] Implement difference
- [ ] Convert codegen HF model to GGML
- [ ] Refactor code to share with GPT-J
How different is this from the original GPT-J implementation? Could the codegen model be implemented by calling into GPT-J with a parameter that selects a slightly different computation graph?
I'd really like to avoid any unnecessary duplication if possible, especially as we evolve the models.
That's more or less what I'm thinking of doing, but right now I'm still trying to implement codegen's treatment of the QKV vectors.
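For anyone following along, here is a rough sketch of the QKV difference as I understand it. It assumes the layout used by the Hugging Face CodeGen implementation: a single fused projection of width `3 * n_embd`, stored as `mp_num` blocks, each ordered query, value, key (GPT-J keeps separate Q, K and V projections instead). The function name `split_fused_qkv` and its signature are hypothetical and not part of this crate; it works on plain slices rather than GGML tensors.

```rust
/// Split one token's fused QKV projection output into (q, k, v).
///
/// Assumption: `fused` holds `mp_num` contiguous blocks, and each block is
/// laid out as [query | value | key] with `n_embd / mp_num` values per part.
fn split_fused_qkv(
    fused: &[f32],
    n_embd: usize,
    mp_num: usize,
) -> (Vec<f32>, Vec<f32>, Vec<f32>) {
    assert_eq!(fused.len(), 3 * n_embd, "fused vector must be 3 * n_embd long");
    assert_eq!(n_embd % mp_num, 0, "n_embd must be divisible by mp_num");

    // Width of each of the q, v and k parts inside a single block.
    let local = n_embd / mp_num;

    let mut q = Vec::with_capacity(n_embd);
    let mut k = Vec::with_capacity(n_embd);
    let mut v = Vec::with_capacity(n_embd);

    // Walk the blocks in order and append each part to its own vector,
    // so the per-block pieces end up concatenated along the embedding axis.
    for block in fused.chunks_exact(3 * local) {
        q.extend_from_slice(&block[..local]);
        v.extend_from_slice(&block[local..2 * local]);
        k.extend_from_slice(&block[2 * local..]);
    }

    (q, k, v)
}
```

If that assumption about the layout holds, the rest of the attention graph could stay the same as GPT-J's, with only this split step selected by a parameter.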
No longer useful