Michael Feil

Results 163 comments of Michael Feil

Thanks. Perhaps `hasattr(fm, "auto_model")` would be helpful. Be prepared that there is little / no optimization possible for a generic CustomModel.
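To illustrate the idea, here is a minimal sketch (not infinity's actual code path; the model id is only an example) of probing the first module of a SentenceTransformer for `auto_model`:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # example model id
fm = model[0]  # first module, usually sentence_transformers.models.Transformer

if hasattr(fm, "auto_model"):
    # A plain HuggingFace transformer underneath: optimizations such as
    # flash-attention swaps could be applied to fm.auto_model.
    print(type(fm.auto_model).__name__)
else:
    # Generic CustomModel: little / no optimization possible, run it as-is.
    print("CustomModel detected, using the generic unoptimized path")
```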

Can you use the trt-onnx docker images? ModernBERT requires flash-attention-2 (`flash-attn`), which requires a different build environment.

@ewianda No, it will use flash-attn

There is currently no support for qwen2-vl, but support for it would be welcome. Generally, gme ships a lot of custom code - I would prefer to e.g. run this model first,...

https://huggingface.co/jinaai/jina-reranker-v1-tiny-en/discussions/9 Please make the jina team aware of this! @wirthual already has a PR ready, which currently does not work and needs a common resolution. The model needs to be named...

The name of the model also needs to resolve in the config.json - it might need a try or two. Once you have a fork working, use `--revision` for infinity.
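For illustration, the CLI invocation could look roughly like this (the fork name and revision are placeholders; check the infinity docs for the exact flags of your version):

```bash
# Hypothetical fork name and revision - substitute your own working fork/commit.
infinity_emb v2 --model-id my-org/jina-reranker-v1-tiny-en-fork --revision main
```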

Correct, that is currently not possible, but easy to implement. You are welcome to contribute this. Task:
- Add a similar integration as in: https://github.com/michaelfeil/infinity/blob/65afe2b3d68fda10429bf7f215fe645be20788e4/libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py#L87C9-L90C14
- Add a test (verifying...

@ManuelFay The reason why I opened this issue is that the API deviates from the huggingface interface, but for no good reason. To integrate colpali, I had to refactor...

Hey @ManuelFay Thanks for your input - looking forward to the PR with huggingface. FYI, this is now working for deployment behind a REST API.
```bash
port=7997
model1=michaelfeil/colqwen2-v0.1
volume=$PWD/data
docker run...
```
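The command above is truncated; the full invocation presumably follows the usual infinity docker pattern, roughly like this sketch (verify the image tag and flags against the current README for your version):

```bash
# Sketch of a typical infinity deployment; adjust image tag / flags as needed.
port=7997
model1=michaelfeil/colqwen2-v0.1
volume=$PWD/data
docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelfeil/infinity:latest \
  v2 --model-id $model1 --port $port
```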

BGE large uses BERT (infinity DOES overwrite the modeling code / flash-attention replacement). MixedBread-large uses DeBERTa (infinity does not overwrite the modeling code / flash-attention replacement). DeBERTa-V2 uses significantly more...