Denis Mazur
Great, thanks! I'll open a PR as soon as I write the test then.
Hey! I've noticed this [PR](https://github.com/huggingface/transformers/pull/17901), which seems to generalize what we are doing with gpt-j-8bit. What should I do with this issue?
The function is not working correctly. You forgot to encode the URL before feeding it to HMAC. I suggest you change the `_, _ = mac.Write([]byte(strings.Join(vkParams, "&")))` line to `_,...
As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, [we keep every expert in a separate file](https://github.com/dvmazur/mixtral-offloading/blob/ce545188b804238f0b23a59fc45e6a8f8b390c40/src/build_model.py#L148). This should lead to...
> > As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, [we keep every expert in a separate file](https://github.com/dvmazur/mixtral-offloading/blob/ce545188b804238f0b23a59fc45e6a8f8b390c40/src/build_model.py#L148). This should...
Hey! Just tried running the notebook (in the `offload_per_layer = 5` setting) and everything works for me. Have you tinkered with the original notebook in any way? If not, try...
> hqq_aten package not installed. HQQBackend.ATEN backend will not work unless you install the hqq_aten lib in hqq/kernels. Hqq_aten is not required as we have custom triton kernels for GEMV.
Thanks for your help!
The ForceReachability option did not actually solve our problem, as running only one bootstrap doesn't return the public IP.
@justheuristic solemnly swears to
- show a proof that forwarding kwargs works in a **basic test that is easy to follow**
- show an example of how new clients can...