Murdad Esmaeeli
@nullscc, what precision are you using (FP64, FP32, FP16, BFLOAT16)? Typically you would need about 4X the parameter count in bytes of memory for 32-bit and 2X the parameter count for 16-bit...
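The rule of thumb above comes from the storage size of each number format: FP64 uses 8 bytes per parameter, FP32 uses 4, and FP16/BFLOAT16 use 2. Here is a minimal sketch of that arithmetic. The names `BYTES_PER_PARAM` and `weight_memory_gb` are hypothetical, and the estimate covers the weights alone. Real usage is higher because of activations, KV cache, and (when training) gradients and optimizer state.

```python
# Bytes needed to store one parameter in each floating-point format.
BYTES_PER_PARAM = {"fp64": 8, "fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Rough size of the model weights only, in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: ignores activations, KV cache, and optimizer state,
    so actual GPU memory use will be larger.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 7_000_000_000  # a hypothetical 7B-parameter model
    for dtype in ("fp32", "fp16", "bf16"):
        print(dtype, weight_memory_gb(n, dtype), "GB")
```

For a hypothetical 7B-parameter model, this gives 28 GB of weights in FP32 and 14 GB in FP16 or BFLOAT16.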
@nullscc, I've had that happen before. Could you try using a slightly bigger GPU? It should not hang if utilization is 33GB/40GB.
@joshuasundance-swca, @weiji14, if I'm understanding this correctly, the code below wouldn't be recommended due to dependency headaches? If that's the case, what solution would there be to see the...
@9throok, any update on the issue that you mentioned?
Hi @Flossertoday, any update on the problem that you mentioned?
@PeterTF656, this shouldn't happen based on my reading of the code. Could you check whether it is still happening?
@AyushExel, were you able to find an answer to your question?
Hi @cirezd, you might be right that there is no specific reason for this hard-coding. @mkozakov, @lfayoux, can you confirm? If there is no specific...
Could you paste the completion object, `_conversation`, and one iteration of what `chunk` is? The problem seems to be a data type issue: `'async for' requires an object with __aiter__ method, ...`
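Python raises this `TypeError` when `async for` is given an ordinary (synchronous) iterator, such as a generator, instead of an async iterable that implements `__aiter__`/`__anext__`. Here is a minimal sketch of the mismatch and both valid ways to consume a stream. It assumes `chunk` comes from either a sync or an async stream. `sync_chunks`, `async_chunks`, `collect_async`, and `collect_sync` are hypothetical names and are not part of any library.

```python
import asyncio
from typing import AsyncIterator, Iterable, Iterator, Tuple


def sync_chunks() -> Iterator[str]:
    """Hypothetical stand-in for a synchronous streaming response."""
    yield "Hello"
    yield ", "
    yield "world"


async def async_chunks() -> AsyncIterator[str]:
    """Hypothetical stand-in for an asynchronous streaming response."""
    for piece in ("Hello", ", ", "world"):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield piece


async def collect_async(stream) -> str:
    # `async for` only accepts objects implementing __aiter__/__anext__.
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


def collect_sync(stream: Iterable[str]) -> str:
    # A synchronous iterator must be consumed with a plain `for`.
    return "".join(chunk for chunk in stream)


def demo() -> Tuple[str, str, str]:
    ok_async = asyncio.run(collect_async(async_chunks()))
    ok_sync = collect_sync(sync_chunks())
    try:
        # Mismatch: passing a sync generator to `async for` raises TypeError.
        asyncio.run(collect_async(sync_chunks()))
        error = ""
    except TypeError as exc:
        error = str(exc)
    return ok_async, ok_sync, error


if __name__ == "__main__":
    print(demo())
```

If the printed `chunk` turns out to be a sync generator, the simplest fix is to iterate it with a plain `for`, or to create the stream with an async-capable client so that `async for` is valid.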
Hi @van51, are you still facing the issue?