Lucien Thomas
The current implementation of the quantized_phi3 model does not clear its KV cache between distinct prompts. This leads to errors when attempting to generate text sequentially with the same model...
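A minimal sketch of the failure mode described above, using a toy model rather than the real quantized_phi3 code. All names here (`ToyKVCache`, `ToyModel`, `clear_kv_cache`, `generate`, `reset_cache`) are hypothetical and only illustrate why stale cache entries from one prompt can break the next one:

```python
class ToyKVCache:
    """Hypothetical stand-in for a key/value cache: one entry per token position."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def clear(self):
        self.keys.clear()
        self.values.clear()

    def __len__(self):
        return len(self.keys)


class ToyModel:
    """Toy model that only tracks cache bookkeeping, not real attention."""

    def __init__(self):
        self.cache = ToyKVCache()

    def forward(self, tokens, index_pos):
        # The caller claims these tokens start at position `index_pos`,
        # so the cache must hold exactly `index_pos` earlier positions.
        if index_pos != len(self.cache):
            raise ValueError(
                f"cache holds {len(self.cache)} positions, expected {index_pos}"
            )
        for token in tokens:
            self.cache.append(token, token)
        return len(self.cache)

    def clear_kv_cache(self):
        self.cache.clear()


def generate(model, prompt_tokens, reset_cache):
    """Process a fresh prompt, optionally clearing leftover state first."""
    if reset_cache:
        model.clear_kv_cache()
    return model.forward(prompt_tokens, index_pos=0)
```

In this sketch, a second call to `generate` on the same `ToyModel` with `reset_cache=False` raises a `ValueError`, because the cache still holds the first prompt's positions; with `reset_cache=True` it succeeds.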
Would there be any interest in adding this model? https://huggingface.co/ds4sd/SmolDocling-256M-preview I toyed around with an implementation last night but most of my experience has been with text models and am...
## Describe the bug

Multiple minutes even with tiny models, e.g. a 256M-parameter vision model (SmolVLM). It's not just the time spent loading the model into RAM, because if I...