
added tie_weights support to mlp speculator

Open JRosenkranz opened this issue 7 months ago • 0 comments

What does this PR do?

MLPSpeculator currently does not support tie_weights. Many recently trained speculators that use the MLPSpeculator architecture rely on this option, which makes the speculator much smaller.

  • added tie_weights configuration to the speculator
  • added scale_input configuration to the speculator
  • fixed loading when the speculator's weights are not stored as safetensors
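
To give a rough sense of why tie_weights shrinks the speculator, here is a minimal parameter-count sketch. It is not the code in this PR. It assumes each prediction head has an embedding, a bias-free projection, a LayerNorm, and a bias-free output head, and that tying shares the embedding, LayerNorm, and output head across all heads and shares the projection across every head after the first. The function name and its arguments (emb_dim, inner_dim, vocab_size, n_predict) are hypothetical and chosen for illustration only.

```python
def speculator_param_count(emb_dim, inner_dim, vocab_size, n_predict, tie_weights):
    """Rough parameter count for an MLP-style speculator (illustrative only).

    Assumed layout per prediction head (an assumption, not taken from the PR):
      - token embedding: vocab_size x inner_dim
      - projection:      emb_dim x inner_dim for the first head,
                         inner_dim x inner_dim for later heads (no bias)
      - LayerNorm:       weight + bias, 2 * inner_dim
      - output head:     inner_dim x vocab_size (no bias)
    """
    emb = vocab_size * inner_dim
    head = inner_dim * vocab_size
    ln = 2 * inner_dim
    first_proj = emb_dim * inner_dim
    later_proj = inner_dim * inner_dim

    if tie_weights:
        # One shared embedding, LayerNorm, and output head for all heads.
        # The first projection has a different input size, so only the
        # later projections can share a single matrix.
        shared_later_proj = later_proj if n_predict > 1 else 0
        return emb + head + ln + first_proj + shared_later_proj

    # Untied: every head owns its own copy of each layer.
    return n_predict * (emb + head + ln) + first_proj + (n_predict - 1) * later_proj


if __name__ == "__main__":
    # Hypothetical sizes, not the dimensions of any released speculator.
    kwargs = dict(emb_dim=2560, inner_dim=2560, vocab_size=49152, n_predict=5)
    untied = speculator_param_count(tie_weights=False, **kwargs)
    tied = speculator_param_count(tie_weights=True, **kwargs)
    print(f"untied: {untied:,} parameters")
    print(f"tied:   {tied:,} parameters")
```

Under these assumptions the embedding and output head dominate the count, so sharing them across heads removes most of the parameters as the number of prediction heads grows.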

To reproduce:

text-generation-launcher --model-id ibm-granite/granite-3b-code-instruct-accelerator

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Narsil

JRosenkranz · Jul 10 '24 19:07