fxmarty
Thank you @medphisiker for the details.
The gain would be marginal as
> Inference through CPUExecutionProvider yields garbage

This is due to a bug in FusedConv in ONNX Runtime, tracked in https://github.com/microsoft/onnxruntime/issues/14500.

> Memory usage for a single-batch inference with CUDAExecutionProvider is huge...
`torch.jit.trace` is pretty much unusable with deep loops: https://github.com/pytorch/pytorch/issues/93943. I'll just go on with `torch.jit.script`.
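For context, a minimal sketch of the difference (the `Loop` module below is a hypothetical example, not code from this thread): tracing records the ops executed for one example input, so a data-dependent loop gets unrolled with its trip count frozen, while scripting compiles the control flow itself.

```python
import torch

class Loop(torch.nn.Module):
    def forward(self, x: torch.Tensor, n: int) -> torch.Tensor:
        # Data-dependent loop: torch.jit.trace would unroll this for the
        # example input and bake the iteration count into the graph;
        # torch.jit.script keeps the loop symbolic.
        for _ in range(n):
            x = x * 2
        return x

# torch.jit.trace(Loop(), (torch.ones(2), 3)) would freeze n=3.
scripted = torch.jit.script(Loop())
print(scripted(torch.ones(2), 5))  # loop count stays dynamic
```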
Hi, we'll be waiting for transformers to add it. Feel free to ping me again @xenova
Thank you @bil-ash, adding it to my todos!
We'll need to patch Mistral: https://github.com/huggingface/transformers/pull/31696
Failing tests are unrelated
@contrebande-labs nothing I believe!
Hi @kanger45 @MaiZhiHao @zeke-john, https://github.com/huggingface/optimum/pull/1779 is merged, which exports Musicgen in several parts so that audio samples can be generated conditioned on a text prompt (reference: https://huggingface.co/docs/transformers/model_doc/musicgen#text-conditional-generation). This uses the decoder KV cache...
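For reference, text-conditional generation with Musicgen in transformers itself looks roughly like this (following the linked docs; `facebook/musicgen-small` is just one example checkpoint):

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Load the processor (tokenizes the text prompt) and the full Musicgen model.
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(
    text=["80s pop track with bassy drums and synth"],
    padding=True,
    return_tensors="pt",
)

# Autoregressive decoding over audio codes; this is where the
# decoder KV cache comes into play.
audio_values = model.generate(**inputs, max_new_tokens=256)
```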