Jakub Piotr Cłapa comments

Results: 77 comments by Jakub Piotr Cłapa

possible quantization (e.g. ctranslate2, llama.cpp, bitsandbytes, gptq, etc.?)

I am pretty sure you can view a table like this if you open the model file in [Netron](https://netron.app/). We don’t have any diagrams right now; the only documentation of...

ROPE is necessary in VQ Stoks, but position is not provided

Hey, that's a good catch, sorry about that. This seems to be a regression related to the optimization work we did recently. Since `vq_stoks` is not normally used during inference...

ROPE is necessary in VQ Stoks, but position is not provided

Both models use the semantic token ids but we also load the frozen token embeddings from the vq_stoks model and use linear projection layers to project them into the model...

New Feature Request: Enable Streaming

Hey, that's a great idea and I experimented with this a bit. If you want lower latency right now you can split out the first sentence, generate it and before...

Integration into mobile apps

The inference for the currently released models is quite well optimized in PyTorch. If this is too slow then there are also smaller models available (`tiny` and `base`) which are...

Problems with SoundStorm

I got similar results with my own SoundStorm implementation. It did figure out the silence (the pauses) from the semantic tokens and sometimes it would predict single sounds correctly (like...

Problems with SoundStorm

I noticed that the model can generate audio that sounds better than the ground truth (both at 2 quantizers) with the accuracy of the free-running (without teacher forcing) generation...