infinity issues

AWQ-Bert / 4-bit Bert

2

Hoping to add an implementation of 4-bit Bert, potentially in https://github.com/casper-hansen/AutoAWQ/pull/328. Contributions welcome.

michaelfeil

enhancement

help wanted

[Docs] Add quantization / dtype doc

Adding doc for quantization / dtype

michaelfeil

Create llama-index `InfinityEmbeddings` as langchain

14

Hi! Kudos for this project, Michael! It is amazing. We're migrating from a single repo with a RAG and a T40, to one repo with a RAG with just CPU...

semoal

Content-Encoding: gzip

7

I wonder if it would make sense to support compressed requests, esp. for /rerank, where the query and document list could be many 1k or 2k chunks of text? The...

andrew-at-rise
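
To make the request above concrete, here is a minimal sketch of what a gzip-compressed request body could look like on both sides of the connection. The payload fields (`query`, `documents`) and the helper names are assumptions chosen for illustration; the issue does not confirm how, or whether, Infinity handles `Content-Encoding: gzip`.

```python
import gzip
import json


def compress_payload(payload):
    """Client side: serialize to JSON, gzip it, and return (body, headers)."""
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
    return body, headers


def decompress_payload(body, headers):
    """Server side (hypothetical): undo gzip only when the header announces it."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


# Hypothetical rerank-style payload with many long, similar text chunks.
payload = {
    "query": "what is infinity?",
    "documents": ["some long chunk of text " * 100] * 20,
}
body, headers = compress_payload(payload)
restored = decompress_payload(body, headers)
```

How much this saves depends on how repetitive the text is; the server also pays a decompression cost per request.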

Refactor batching to os.fork / multiprocessing

This is a draft PR, unlikely to get merged. The performance overhead of inter-process communication is too high.

michaelfeil

any tutorial on how to use Infinity?

11

Love the concept behind Infinity! I wonder if you have a video tutorial or PDF about how to use Infinity? That would be great!

abcnow

documentation

infinity_emb failed at startup using `torch.compile` when installed via pip

8

commit hash: 296472eefaa93c361f086ea26bd7cd7e3c6e9a3e I tried it on my Linux machine (Ubuntu 22.04 with CUDA 12.3), and it failed. ``` % infinity_emb --device cuda --engine torch 2024-03-03 11:05:28.807 |...

beebopkim

Return actual token count on forward pass

1

Return the actual number of tokens used after truncation.

michaelfeil

good first issue

Adding max token budget per batch

Currently allowing up to batch_size=64 as default. This can potentially lead to high memory usage, e.g. for jina-8k bert -> 64x8192. It would be better to adjust dynamically and set...

michaelfeil
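
One way to picture this proposal is to cap each batch by its padded token count instead of by item count alone. The sketch below assumes token lengths are known before batching; the function name, the `max_tokens` parameter, and its default are hypothetical and do not describe how Infinity batches requests.

```python
def batch_by_token_budget(token_lengths, max_tokens=16384, max_batch_size=64):
    """Group item indices so each padded batch stays within a token budget.

    Padded cost of a batch = (number of items) * (longest item in the batch).
    An item that alone exceeds the budget still gets its own batch.
    """
    # Sort by length so items of similar length share a batch (less padding).
    order = sorted(range(len(token_lengths)), key=lambda i: token_lengths[i])
    batches, current, longest = [], [], 0
    for i in order:
        new_longest = max(longest, token_lengths[i])
        over_budget = (len(current) + 1) * new_longest > max_tokens
        if current and (over_budget or len(current) >= max_batch_size):
            # Close the current batch and start a new one with this item.
            batches.append(current)
            current, new_longest = [], token_lengths[i]
        current.append(i)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


# Four 8192-token inputs mixed with one hundred 128-token inputs.
lengths = [8192] * 4 + [128] * 100
batches = batch_by_token_budget(lengths)
```

With these inputs the short items still fill batches up to the item limit, while the 8192-token items are grouped two at a time instead of 64 at a time.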

Idea: add a parameter to configure number of decimals in JSON output

3

Please consider adding a parameter to set the number of decimals in the JSON output. This would be beneficial to reduce network bandwidth requirements and the time for parsing the...

lasttero

enhancement
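
As an illustration of the idea, the sketch below rounds each embedding value before serializing it. The `decimals` parameter and the `serialize_embedding` helper are hypothetical names used here only to show the effect on payload size; they are not an existing Infinity option.

```python
import json


def serialize_embedding(embedding, decimals=None):
    """Serialize an embedding to compact JSON, optionally rounding each value."""
    if decimals is not None:
        embedding = [round(x, decimals) for x in embedding]
    return json.dumps({"embedding": embedding}, separators=(",", ":"))


vector = [0.123456789012, -0.987654321098, 0.5]
full = serialize_embedding(vector)
short = serialize_embedding(vector, decimals=4)
print(len(full), len(short))
```

Rounding trades precision for size, so the right number of decimals depends on how sensitive downstream similarity scores are to it.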
