Daniël de Kok
Plan:

- [ ] Release sticker 0.5 with all improvements so far.
- [ ] Migrate low-hanging fruit using `compat.v1`.
- [ ] Rewrite functionality that is not available in...
If several models in the pipeline use the same word embeddings, reuse them between the models.
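
One way to picture this is a small cache keyed by the embeddings path, so that every model requesting the same file receives the same in-memory object. The sketch below is only an illustration under stated assumptions: `EmbeddingCache`, `fake_load`, and the file name `embeddings.bin` are hypothetical names, not part of any existing API, and the real pipeline may share embeddings differently.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory representation: word -> vector.
Embeddings = Dict[str, List[float]]


class EmbeddingCache:
    """Hypothetical cache that loads each embeddings file once and shares it."""

    def __init__(self, load_fn: Callable[[str], Embeddings]) -> None:
        self._load_fn = load_fn
        self._cache: Dict[str, Embeddings] = {}

    def get(self, path: str) -> Embeddings:
        # Load on first request only; later requests reuse the same object.
        if path not in self._cache:
            self._cache[path] = self._load_fn(path)
        return self._cache[path]


# Stand-in loader (an assumption for this sketch) that records each call.
load_calls: List[str] = []


def fake_load(path: str) -> Embeddings:
    load_calls.append(path)
    return {"hello": [0.1, 0.2], "world": [0.3, 0.4]}


cache = EmbeddingCache(fake_load)

# Two models in the same pipeline ask for the same embeddings.
tagger_embeddings = cache.get("embeddings.bin")
parser_embeddings = cache.get("embeddings.bin")
```

Both variables refer to the same object, and the loader runs once, so memory use does not grow with the number of models that share the embeddings.
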
## Description

Relax the upper bound a little.

### Types of change

Maintenance

## Checklist

- [x] I confirm that I have the right to submit this contribution under the...
# What does this PR do? The `GPTWeightLoader` was structured like this in pseudocode: ``` if marlin: Set up tensors in a way that GPTQ-Marlin expects else: Set up tensors...
# What does this PR do? **CI test run**, not for review yet. Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the...
# What does this PR do?

Some FP8 checkpoints use a scalar weight scale. This change adds support for that.

## Before submitting

- [ ] This PR fixes a...
# What does this PR do?

This replaces the custom layers in both models.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you...
# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
We saw that support for JIT compilation will be added in #507. We were wondering what the plans are for ahead-of-time compilation. We are happily using flashinfer in [Text Generation...