Michael Feil
These two projects:
- https://github.com/michaelfeil/infinity (disclaimer: I am maintaining it)
- Huggingface/text-embeddings-inference (an alternative with no torch dependency, API only)
@stephen-youn Did you manage to solve this? Got a similar issue.
@pommedeterresautee FYI, unit tests seem to pass. What do you think about this PR?
@pommedeterresautee friendly reminder!
Yeah, the batching happens with multiple async requests at once. This is also used when the batch size is larger than what can fit at once. If there is no...
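For illustration, a minimal sketch of this kind of dynamic batching, where concurrent async callers are drained into one model call (all names here, e.g. `embed_batch` and `BATCH_SIZE`, are hypothetical and not infinity's actual API):

```python
import asyncio

BATCH_SIZE = 32
queue: asyncio.Queue = asyncio.Queue()

def embed_batch(sentences: list[str]) -> list[list[float]]:
    # Placeholder for the actual model forward pass.
    return [[0.0] for _ in sentences]

async def batch_worker() -> None:
    while True:
        items = [await queue.get()]  # block until at least one request arrives
        # Drain whatever else is already queued, up to BATCH_SIZE.
        while not queue.empty() and len(items) < BATCH_SIZE:
            items.append(queue.get_nowait())
        sentences, futures = zip(*items)
        for fut, emb in zip(futures, embed_batch(list(sentences))):
            fut.set_result(emb)

async def embed(sentence: str) -> list[float]:
    # Each caller enqueues its sentence and awaits the batched result.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((sentence, fut))
    return await fut

async def main() -> None:
    asyncio.get_running_loop().create_task(batch_worker())
    results = await asyncio.gather(*(embed(f"sentence {i}") for i in range(100)))
    print(len(results))

asyncio.run(main())
```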
On my system the code above still fails with v0.1.1. Have you tried the above code? @NirantK For models, I use "sentence-transformers/all-MiniLM-L6-v2" on both sides.
@NirantK sentence-transformers=2.22 fastembed=0.1.1

```python
sentence = ["This is a test sentence."]
```

```
Arrays are not almost equal to 1 decimals
Mismatched elements: 2 / 384 (0.521%)
Max absolute difference: 0.81547204
Max...
```
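The repro above is truncated, so here is a hedged reconstruction of what such a comparison might look like, assuming fastembed 0.1.x's `FlagEmbedding` interface (the exact original script is not shown):

```python
import numpy as np
from fastembed.embedding import FlagEmbedding
from sentence_transformers import SentenceTransformer

sentence = ["This is a test sentence."]
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Embed the same sentence with both libraries.
st_emb = SentenceTransformer(model_name).encode(sentence)
fe_emb = np.array(list(FlagEmbedding(model_name=model_name).embed(sentence)))

# Raises with a mismatch report like the one quoted above if the outputs diverge.
np.testing.assert_array_almost_equal(st_emb, fe_emb, decimal=1)
```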
FYI for "BAAI/bge-base-en" i get a cosine_sim of `~0.999`. For "sentence-transformers/all-MiniLM-L6-v2" its around `0.223`
@casper-hansen I saw that the outputs of the model really differ in embedding space.
- Do I need to quantize all layers? I saw that all layers are replaced with...
Thanks for the hint, I have not tried out `modules_to_not_convert` - are you referring to this example? https://github.com/casper-hansen/AutoAWQ/blob/29ee66d9e77f3e443d48a17b4838d00a76bc6f5e/examples/mixtral_quant.py#L6 I am trying to directly use cosine similarity between query and paragraph as...
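For anyone following along, a sketch of how `modules_to_not_convert` appears in the linked mixtral_quant.py example; the model path and the exact module names to skip are assumptions here, not verified settings:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed, per the example
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    # Leave these modules in full precision instead of quantizing them.
    "modules_to_not_convert": ["gate"],
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
```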