Dillon Roach

17 comments by Dillon Roach

I'm going to take some time and give this and #45 a whirl - just wanted to put my name in the hat so I don't duplicate effort from somebody...

I'm going to take some time and give this and #322 a whirl - just wanted to put my name in the hat so I don't duplicate effort from somebody...

@marcelotrevisani - for the case where the toml version pins don't match the previously built dicts, what's your preferred way to resolve the conflict? Take the previous/new info as truth and overwrite? Overwrite with...
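One possible resolution strategy for the pin mismatch above can be sketched as follows: let the toml pins win, but collect the conflicts so the caller can warn about them. All names here are hypothetical illustrations, not grayskull's actual API.

```python
def merge_pins(previous: dict, from_toml: dict) -> tuple[dict, list[str]]:
    """Merge version pins, letting the toml-derived pins overwrite
    previously built ones, while recording every conflicting package."""
    merged = dict(previous)
    conflicts = []
    for pkg, pin in from_toml.items():
        if pkg in previous and previous[pkg] != pin:
            conflicts.append(f"{pkg}: {previous[pkg]} -> {pin}")
        merged[pkg] = pin
    return merged, conflicts

merged, conflicts = merge_pins(
    {"numpy": ">=1.21", "requests": ">=2.0"},
    {"numpy": ">=1.24", "rich": ">=13"},
)
```

The inverse policy (previous pins win) is the same loop with the overwrite skipped when the key already exists; either way, surfacing `conflicts` keeps the decision visible to the user.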

For what it's worth, https://github.com/qwopqwop200/GPTQ-for-LLaMa and https://github.com/PanQiWei/AutoGPTQ seem to be the most common mentions from folks posting quantized models on huggingface lately - the latter more just for general use....

At the same time, I'd be happy to get this added to conda-forge so it's available there. One thing that could help for both - if you could tag a...

@kanttouchthis you're asking the TTS to do a lot of extra work it doesn't need to do on every call by going through tts.tts_to_file(). Here's a short-hand reference implementation of...

@kanttouchthis yep, most of the big speed difference is from deepspeed; the other, smaller chunk is likely the re-computation of the latents and embeddings when doing the 'clone' each time,...

I'll just keep my comments at the higher LLM-interaction level, as I have less opinion about how ragna should do it specifically, but with that said:
- flexibility should be...

WIP: https://github.com/Quansight/ragna/pull/432 Currently suggests adding chat.generate(), which calls Assistant.generate(), as equivalent to answer() but without returning a Message and with no sources/logging. This way an Assistant.answer() might call a preprocess routine,...
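The split proposed above can be sketched in a few lines: a bare generate() that just returns model output, and an answer() that wraps it with pre/post-processing and source handling. This is a hypothetical illustration of the shape, not ragna's actual classes or signatures.

```python
class Assistant:
    def generate(self, prompt: str) -> str:
        # Bare LLM call: no Message object, no sources, no logging.
        # A real assistant would call its model here.
        return f"echo: {prompt}"

    def answer(self, prompt: str, sources: list[str]) -> dict:
        # answer() = preprocess -> generate -> postprocess.
        preprocessed = f"{prompt} [with {len(sources)} sources]"
        raw = self.generate(preprocessed)
        # Postprocess into a Message-like dict with sources attached.
        return {"content": raw, "sources": sources}

a = Assistant()
```

Callers that only need raw text (e.g. internal query rewriting) call generate() directly; chat flows keep calling answer() and get the Message/source bookkeeping for free.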

Comments make sense to me. Re: streaming - I'm not against the idea, but what's the use-case for streaming specifically on the generate() part of the API? Most use-cases will be...
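For reference, a streaming variant of the generate() discussed above is cheap to layer on if it's ever wanted: expose a generator that yields chunks, and keep the non-streaming call as a thin wrapper that joins the stream. Names are illustrative, not ragna's actual API.

```python
from typing import Iterator

def generate_stream(prompt: str) -> Iterator[str]:
    # Yield output incrementally; a real implementation would yield
    # tokens/chunks as the model produces them.
    for token in prompt.split():
        yield token + " "

def generate(prompt: str) -> str:
    # Non-streaming convenience wrapper over the same stream.
    return "".join(generate_stream(prompt)).strip()
```

With this shape, the streaming question becomes a pure consumer choice rather than a second code path inside the assistant.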