Michael Feil
Thanks. Perhaps `hasattr(fm, "auto_model")` would be helpful. Be prepared that there is little to no optimization possible for a generic CustomModel.
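A minimal sketch of that capability check, assuming a wrapper object `fm` that may expose the underlying HF model as `auto_model` (the wrapper class here is hypothetical; only the attribute name comes from the comment above):

```python
# Hypothetical wrapper standing in for an engine/model object.
class CustomModelWrapper:
    def __init__(self):
        # Placeholder for an underlying transformers AutoModel instance.
        self.auto_model = object()

fm = CustomModelWrapper()

# Generic capability check: if the wrapper exposes the raw HF model,
# use it directly; otherwise treat the wrapper itself as the model.
# Keep in mind that for a generic CustomModel, little to no
# model-specific optimization is possible.
model = fm.auto_model if hasattr(fm, "auto_model") else fm
```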
Can you use the trt-onnx docker images? ModernBERT requires flash-attention-2 (flash-attn), which requires a different build environment.
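A hedged sketch of what that build environment typically involves (the base image tag is illustrative, not a tested recommendation; flash-attn compiles CUDA kernels from source, so it needs a CUDA toolchain and a matching preinstalled PyTorch):

```shell
# Illustrative only: pick a CUDA devel base image that matches your
# PyTorch build; the exact tag below is an assumption.
docker run --gpus all -it nvcr.io/nvidia/pytorch:24.01-py3 bash

# Inside the container: flash-attn builds CUDA extensions from source,
# so --no-build-isolation lets the build see the preinstalled torch.
pip install flash-attn --no-build-isolation
```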
@ewianda No, it will use flash-attn
There is currently no support for qwen2-vl, but we would welcome support. Generally, gme ships a lot of custom code - I have a preference to e.g. run this model first,...
https://huggingface.co/jinaai/jina-reranker-v1-tiny-en/discussions/9 Please make the jina team aware of this! @wirthual already has a PR ready, which currently does not work and needs a common resolution. The model needs to be named...
The name of the model also needs to resolve in the config.json - this might need a try or two. Once you have a fork working, use `--revision` for infinity.
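For example, pointing infinity at a specific fork revision might look like this (the model id and revision are placeholders; only the `--revision` flag is taken from the comment above):

```shell
# Placeholder model id and revision: substitute your working fork
# and the branch or commit you want infinity to pull.
infinity_emb v2 \
  --model-id your-user/jina-reranker-v1-tiny-en-fork \
  --revision main
```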
Correct, that is currently not possible, but it is easy to implement. You are welcome to contribute this. Task:
- Add a similar integration as in: https://github.com/michaelfeil/infinity/blob/65afe2b3d68fda10429bf7f215fe645be20788e4/libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py#L87C9-L90C14
- Add a test (verifying...
@ManuelFay The reason I opened this issue is that the API deviates from the huggingface interface, but for no good reason. To integrate colpali, I had to refactor...
Hey @ManuelFay Thanks for your input - looking forward to the PR with huggingface. FYI, this is now working for deployment behind a RestAPI.

```bash
port=7997
model1=michaelfeil/colqwen2-v0.1
volume=$PWD/data
docker run...
```
BGE-large uses BERT (infinity DOES overwrite the modeling code with a flash-attention replacement). MixedBread-large uses DeBERTa (infinity does not overwrite the modeling code / flash-attention replacement). DeBERTa-v2 uses significantly more...