Non flash MPT.

Open Narsil opened this issue 1 year ago • 0 comments

What does this PR do?

This adds a non-flash version of MPT. A flash version is harder because it requires a CUDA flash attention kernel that accepts an attention bias.
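For readers unfamiliar with why the bias matters, here is a minimal, dependency-free sketch of the non-flash path: plain causal attention with an additive bias applied to the scores before the softmax. It assumes the bias is ALiBi-style (a per-head linear distance penalty), which is what MPT models use. The function names `alibi_slopes` and `attention_with_bias` are hypothetical and are not taken from this PR. A real implementation would use batched tensor operations rather than Python lists.

```python
import math


def alibi_slopes(n_heads):
    """Per-head ALiBi slopes (illustrative; assumes n_heads is a power of two)."""
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_bias(q, k, v, slope):
    """Single-head causal attention with an additive ALiBi-style bias.

    q, k, v are lists of seq_len vectors. The bias -slope * (i - j) is
    added to each score before the softmax, which is the step a fused
    flash attention kernel would have to support natively.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        scores = []
        for j in range(i + 1):  # causal mask: attend only to positions <= i
            dot = sum(a * b for a, b in zip(q[i], k[j]))
            scores.append(dot * scale - slope * (i - j))
        probs = softmax(scores)
        out.append(
            [sum(p * v[j][d] for j, p in enumerate(probs)) for d in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    q = k = v = [[1.0, 0.0], [0.0, 1.0]]
    print(attention_with_bias(q, k, v, slopes[0]))
```

Because the bias is just an extra term added to the score matrix, the unfused version needs no custom kernel. The cost is that the full score matrix is materialized, which is what flash attention avoids.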

Fixes https://github.com/huggingface/text-generation-inference/issues/361
Fixes https://github.com/huggingface/text-generation-inference/issues/491
Fixes https://github.com/huggingface/text-generation-inference/issues/290

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Did you read the contributor guideline, Pull Request section?
[ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Jun 30 '23 09:06 Narsil
