Jakub Piotr Cłapa
This is actually quite easy to add – one needs to run a voice sample through the speechbrain model (example code is in `pipeline.py`) and copy the resulting weights to...
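For reference, the extraction step could look roughly like this — a minimal sketch assuming the speechbrain ECAPA speaker encoder (the model id, file name, and resampling details are my assumptions here, not the exact code from `pipeline.py`):

```
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load a speechbrain speaker-embedding model (ECAPA-TDNN); the model id
# is an assumption based on common speechbrain usage.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb"
)

# Load the reference voice sample and resample to the 16 kHz the
# encoder expects.
wav, sr = torchaudio.load("voice_sample.wav")
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

# encode_batch returns the speaker embedding; this is the tensor whose
# weights would be copied over as described above.
spk_emb = classifier.encode_batch(wav).squeeze()
```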
I love that idea. I think there are some existing open source tools to convert ebooks between formats. Maybe they could be used to extract the text for running the...
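As one possible sketch of that route, Calibre's `ebook-convert` CLI handles most ebook formats and can emit plain text (the file names below are just placeholders):

```
import subprocess

# Calibre's ebook-convert turns an epub (or mobi, azw3, ...) into plain
# text we could feed to the pipeline; file names are placeholders.
subprocess.run(["ebook-convert", "book.epub", "book.txt"], check=True)

with open("book.txt") as f:
    text = f.read()
```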
That's a good idea, I have a benchmark script we could adapt for that.
Hey, you can pass the file name as a string, like this:

```
pipe = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")
```

If you want to avoid downloading anything automatically you'll need to download and...
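For a fuller picture, a minimal end-to-end call might look like this (the import path follows the repo layout; the output file name is just an example):

```
from whisperspeech.pipeline import Pipeline

# Reference the S2A model by file name; it is fetched automatically on
# first use unless it is already cached locally.
pipe = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")
pipe.generate_to_file("output.wav", "This is a test.")
```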
Hey, `small` is a bit of a misnomer. We inherited this naming from Whisper but these are the biggest models we trained. You can try `tiny` or `base` which are...
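If the naming follows the same pattern, picking a smaller model is just a matter of swapping the size tag in the ref — the exact file names below are my guesses and may need adjusting:

```
# Swap the size tag in the model ref to trade quality for speed and
# memory; file names follow the pattern above and may need adjusting.
pipe_tiny = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")
pipe_base = Pipeline(s2a_ref="s2a-q4-base-en+pl.model")
```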
For the GPU RAM usage I think we may be very suboptimal – right now we always load the FP32 weights and then convert to FP16. From my quick tests...
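To illustrate the problem, here is a minimal sketch (with a stand-in module, not the real network) of why the current flow peaks at roughly the FP32 footprint before settling at FP16:

```
import torch
import torch.nn as nn

# Stand-in model, not the real network: the current flow materializes
# the FP32 weights first and only then casts down, so peak memory is
# roughly double the final FP16 footprint.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
model = model.half().cuda()

# One way to avoid the FP32 peak would be to cast tensors while reading
# the checkpoint, before they ever hit the GPU (the path is hypothetical):
state = torch.load("model.pt", map_location="cpu")
state = {k: v.half() if v.is_floating_point() else v for k, v in state.items()}
```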
> I would also add that I noticed in fetch_models.py you mention "whisperx," which is based on ctranslate2 so...perhaps that's an inroad if you already plan on using other quantized...
We are using fp16 and torch.compile, no weight quantization at all at the moment. The conversion test should be pretty simple. I think the way to go would be to...
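In code, the current setup is essentially this (the module below is a placeholder; the point is the dtype + compile combination, not the architecture):

```
import torch
import torch.nn as nn

# Placeholder standing in for the real network.
model = nn.Linear(512, 512).half().cuda()

# Current inference setup: FP16 weights plus torch.compile, and no
# weight quantization.
model = torch.compile(model)

x = torch.randn(1, 512, device="cuda", dtype=torch.float16)
with torch.inference_mode():
    y = model(x)
```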
Sorry, there are a couple of steps so it may not be clear: the models are uploaded in FP32 to Huggingface, we download them in FP32, and convert to FP16 after loading...
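Spelled out as code, the steps are roughly this (the repo and file names are illustrative, following the naming above):

```
import torch
from huggingface_hub import hf_hub_download

# 1. The checkpoint stored on Huggingface holds FP32 weights.
# 2. We download and load it as-is, still in FP32.
path = hf_hub_download("collabora/whisperspeech", "s2a-q4-tiny-en+pl.model")
spec = torch.load(path, map_location="cpu")

# 3. Only after loading is the model converted to FP16.
```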
The fp16 conversion is done in `self.switch_dtypes(dtype)`. For 8bit I’d start by replacing the Linear layers in the source code on a temporary branch and see how well it works. If...
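A rough sketch of that experiment, assuming bitsandbytes for the 8-bit layers (the traversal below is generic; the real model may need special-casing for some layers):

```
import torch.nn as nn
import bitsandbytes as bnb

def replace_linear_8bit(module: nn.Module):
    # Recursively swap every nn.Linear for a bitsandbytes 8-bit linear;
    # bias handling and which layers to skip may need tweaks for the
    # real model.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new = bnb.nn.Linear8bitLt(
                child.in_features, child.out_features,
                bias=child.bias is not None, has_fp16_weights=False,
            )
            setattr(module, name, new)
        else:
            replace_linear_8bit(child)
```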