infinity
Scaling improvement for CPU-bound embedding tasks
Hi, in my setup I am embedding images in bulk (1,000 images per request) with 1 T4 GPU and 40 CPUs on Modal.

With the normal embedding call, embedding 1,000 images takes 55 s:

```python
await engine_array.image_embed(model=model, images=images)
```
With my modified approach it only takes 23 s. I split the request into one batch per CPU core and call the replica's `encode_pre` / `encode_core` / `encode_post` stages directly from a thread pool:

```python
from math import ceil
from concurrent.futures import ThreadPoolExecutor

CPU = 40  # number of CPU cores available on the container

embedder = engine_array[model]._model_replicas[0]

def do_embedding(images: list[PilImageFile]) -> list[list[float]]:
    pre_encoded = embedder.encode_pre(images)
    core_encoded = embedder.encode_core(pre_encoded)
    return embedder.encode_post(core_encoded)

# split the images into one batch per CPU core
batch_size = ceil(len(images) / CPU)
batches = [images[i : i + batch_size] for i in range(0, len(images), batch_size)]
with ThreadPoolExecutor(max_workers=len(batches)) as executor:
    batched_embeddings = executor.map(do_embedding, batches)
return [embedding for batch_results in batched_embeddings for embedding in batch_results]
```
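As a sanity check, the chunking and fan-out logic above can be exercised in isolation with a dummy encoder in place of the model replica. This is only a sketch: `fake_embed` and `embed_batched` are stand-in names for illustration, not part of infinity's API.

```python
from math import ceil
from concurrent.futures import ThreadPoolExecutor

CPU = 40  # assumed core count, matching the setup above

def fake_embed(batch: list[int]) -> list[list[float]]:
    # stand-in for the encode_pre / encode_core / encode_post pipeline
    return [[float(x)] for x in batch]

def embed_batched(items: list[int]) -> list[list[float]]:
    # same chunking as in the snippet above: one batch per core
    batch_size = ceil(len(items) / CPU)
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=len(batches)) as executor:
        results = executor.map(fake_embed, batches)
    # executor.map preserves input order, so flattening restores the original order
    return [emb for batch_results in results for emb in batch_results]

print(embed_batched(list(range(1000)))[:3])  # → [[0.0], [1.0], [2.0]]
```

Because `executor.map` yields results in submission order, the flattened output lines up with the input images even though batches finish out of order.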
@michaelfeil any thoughts?