Michael Feil
Thanks. Perhaps `hasattr(fm, "auto_model")` would be helpful. Be prepared that there is little to no optimization possible for a generic CustomModel.
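A minimal sketch of that capability check, assuming a wrapper object `fm` that may expose the underlying HF model as `auto_model` (the wrapper class here is hypothetical; only the attribute name comes from the comment above):

```python
# Hypothetical wrapper standing in for an engine/model object.
class CustomModelWrapper:
    def __init__(self):
        # Placeholder for an underlying transformers AutoModel instance.
        self.auto_model = object()

fm = CustomModelWrapper()

# Generic capability check: if the wrapper exposes the raw HF model,
# use it directly; otherwise treat the wrapper itself as the model.
# Keep in mind that for a generic CustomModel, little to no
# model-specific optimization is possible.
model = fm.auto_model if hasattr(fm, "auto_model") else fm
```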
Can you use the trt-onnx docker images? ModernBERT requires flash-attention-2 (flash-attn), which requires a different build environment.
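A hedged sketch of what that build environment typically involves (the base image tag is illustrative, not a tested recommendation; flash-attn compiles CUDA kernels from source, so it needs a CUDA toolchain and a matching preinstalled PyTorch):

```shell
# Illustrative only: pick a CUDA devel base image that matches your
# PyTorch build; the exact tag below is an assumption.
docker run --gpus all -it nvcr.io/nvidia/pytorch:24.01-py3 bash

# Inside the container: flash-attn builds CUDA extensions from source,
# so --no-build-isolation lets the build see the preinstalled torch.
pip install flash-attn --no-build-isolation
```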
@ewianda No, it will use flash-attn
There is currently no support for qwen2-vl, but we would welcome support. Generally, gme ships a lot of custom code - I have a preference to e.g. run this model first,...
https://huggingface.co/jinaai/jina-reranker-v1-tiny-en/discussions/9 Please make the jina team aware of this! @wirthual already has a PR ready, which currently does not work and needs a common resolution. The model needs to be named...
The name of the model also needs to resolve in the config.json - this might need a try or two. Once you have a fork working, use `--revision` for infinity.
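For example, pointing infinity at a specific fork revision might look like this (the model id and revision are placeholders; only the `--revision` flag is taken from the comment above):

```shell
# Placeholder model id and revision: substitute your working fork
# and the branch or commit you want infinity to pull.
infinity_emb v2 \
  --model-id your-user/jina-reranker-v1-tiny-en-fork \
  --revision main
```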
Correct, that is currently not possible, but it is easy to implement. You are welcome to contribute this. Task:
- Add a similar integration as in: https://github.com/michaelfeil/infinity/blob/65afe2b3d68fda10429bf7f215fe645be20788e4/libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py#L87C9-L90C14
- Add a test (verifying...
@ManuelFay The reason I opened this issue is that the API deviates from the huggingface interface, but for no good reason. To integrate colpali, I had to refactor...
Hey @ManuelFay Thanks for your input - looking forward to the PR with huggingface. FYI, this is now working for deployment behind a RestAPI.

```bash
port=7997
model1=michaelfeil/colqwen2-v0.1
volume=$PWD/data
docker run...
```
BGE-large uses BERT (infinity DOES overwrite the modeling code with a flash-attention replacement). MixedBread-large uses DeBERTa (infinity does not overwrite the modeling code / flash-attention replacement). DeBERTa-v2 uses significantly more...