Jakub Piotr Cłapa
This is actually quite easy to add – one needs to run a voice sample through the speechbrain model (example code is in `pipeline.py`) and copy the resulting weights to...
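For reference, the extraction step could look roughly like this — a minimal sketch assuming the speechbrain ECAPA speaker encoder (the model id, file name, and resampling details are my assumptions here, not the exact code from `pipeline.py`):

```
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load a speechbrain speaker-embedding model (ECAPA-TDNN); the model id
# is an assumption based on common speechbrain usage.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb"
)

# Load the reference voice sample and resample to the 16 kHz the
# encoder expects.
wav, sr = torchaudio.load("voice_sample.wav")
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

# encode_batch returns the speaker embedding; this is the tensor whose
# weights would be copied over as described above.
spk_emb = classifier.encode_batch(wav).squeeze()
```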
I love that idea. I think there are some existing open source tools to convert ebooks between formats. Maybe they could be used to extract the text for running the...
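As one possible sketch of that route, Calibre's `ebook-convert` CLI handles most ebook formats and can emit plain text (the file names below are just placeholders):

```
import subprocess

# Calibre's ebook-convert turns an epub (or mobi, azw3, ...) into plain
# text we could feed to the pipeline; file names are placeholders.
subprocess.run(["ebook-convert", "book.epub", "book.txt"], check=True)

with open("book.txt") as f:
    text = f.read()
```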
That's a good idea, I have a benchmark script we could adapt for that.
Hey, you can pass the file name as a string, like this:

```
pipe = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")
```

If you want to avoid downloading anything automatically you'll need to download and...
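For a fuller picture, a minimal end-to-end call might look like this (the import path follows the repo layout; the output file name is just an example):

```
from whisperspeech.pipeline import Pipeline

# Reference the S2A model by file name; it is fetched automatically on
# first use unless it is already cached locally.
pipe = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")
pipe.generate_to_file("output.wav", "This is a test.")
```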
Hey, `small` is a bit of a misnomer. We inherited this naming from Whisper but these are the biggest models we trained. You can try `tiny` or `base` which are...
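If the naming follows the same pattern, picking a smaller model is just a matter of swapping the size tag in the ref — the exact file names below are my guesses and may need adjusting:

```
# Swap the size tag in the model ref to trade quality for speed and
# memory; file names follow the pattern above and may need adjusting.
pipe_tiny = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")
pipe_base = Pipeline(s2a_ref="s2a-q4-base-en+pl.model")
```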
For the GPU RAM usage I think we may be very suboptimal – right now we always load the FP32 weights and then convert to FP16. From my quick tests...
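To illustrate the problem, here is a minimal sketch (with a stand-in module, not the real network) of why the current flow peaks at roughly the FP32 footprint before settling at FP16:

```
import torch
import torch.nn as nn

# Stand-in model, not the real network: the current flow materializes
# the FP32 weights first and only then casts down, so peak memory is
# roughly double the final FP16 footprint.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
model = model.half().cuda()

# One way to avoid the FP32 peak would be to cast tensors while reading
# the checkpoint, before they ever hit the GPU (the path is hypothetical):
state = torch.load("model.pt", map_location="cpu")
state = {k: v.half() if v.is_floating_point() else v for k, v in state.items()}
```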
> I would also add that I noticed in fetch_models.py you mention "whisperx," which is based on ctranslate2 so...perhaps that's an inroad if you already plan on using other quantized...
We are using fp16 and torch.compile, no weight quantization at all at the moment. The conversion test should be pretty simple. I think the way to go would be to...
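In code, the current setup is essentially this (the module below is a placeholder; the point is the dtype + compile combination, not the architecture):

```
import torch
import torch.nn as nn

# Placeholder standing in for the real network.
model = nn.Linear(512, 512).half().cuda()

# Current inference setup: FP16 weights plus torch.compile, and no
# weight quantization.
model = torch.compile(model)

x = torch.randn(1, 512, device="cuda", dtype=torch.float16)
with torch.inference_mode():
    y = model(x)
```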
Sorry, there are a couple of steps so it may not be clear: the models are uploaded in FP32 to Huggingface, we download them in FP32, and convert to FP16 after loading...
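Spelled out as code, the steps are roughly this (the repo and file names are illustrative, following the naming above):

```
import torch
from huggingface_hub import hf_hub_download

# 1. The checkpoint stored on Huggingface holds FP32 weights.
# 2. We download and load it as-is, still in FP32.
path = hf_hub_download("collabora/whisperspeech", "s2a-q4-tiny-en+pl.model")
spec = torch.load(path, map_location="cpu")

# 3. Only after loading is the model converted to FP16.
```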
The fp16 conversion is done in `self.switch_dtypes(dtype)`. For 8bit I’d start by replacing the Linear layers in the source code on a temporary branch and see how well it works. If...
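A rough sketch of that experiment, assuming bitsandbytes for the 8-bit layers (the traversal below is generic; the real model may need special-casing for some layers):

```
import torch.nn as nn
import bitsandbytes as bnb

def replace_linear_8bit(module: nn.Module):
    # Recursively swap every nn.Linear for a bitsandbytes 8-bit linear;
    # bias handling and which layers to skip may need tweaks for the
    # real model.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new = bnb.nn.Linear8bitLt(
                child.in_features, child.out_features,
                bias=child.bias is not None, has_fp16_weights=False,
            )
            setattr(module, name, new)
        else:
            replace_linear_8bit(child)
```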