infinity
Scaling improvement for CPU-bound embedding tasks
Hi, in my setup I am embedding images in bulk (1,000 images per request) with 1 T4 GPU and 40 CPUs on Modal.

With the normal embedding call, embedding 1,000 images takes 55 s:

```python
await engine_array.image_embed(model=model, images=images)
```
With my modified approach it only takes 23 s. I split the request into one batch per CPU core and call the replica's `encode_pre` / `encode_core` / `encode_post` stages directly from a thread pool:

```python
from math import ceil
from concurrent.futures import ThreadPoolExecutor

CPU = 40  # number of CPU cores available on the container

embedder = engine_array[model]._model_replicas[0]

def do_embedding(images: list[PilImageFile]) -> list[list[float]]:
    pre_encoded = embedder.encode_pre(images)
    core_encoded = embedder.encode_core(pre_encoded)
    return embedder.encode_post(core_encoded)

# split the images into one batch per CPU core
batch_size = ceil(len(images) / CPU)
batches = [images[i : i + batch_size] for i in range(0, len(images), batch_size)]
with ThreadPoolExecutor(max_workers=len(batches)) as executor:
    batched_embeddings = executor.map(do_embedding, batches)
return [embedding for batch_results in batched_embeddings for embedding in batch_results]
```
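As a sanity check, the chunking and fan-out logic above can be exercised in isolation with a dummy encoder in place of the model replica. This is only a sketch: `fake_embed` and `embed_batched` are stand-in names for illustration, not part of infinity's API.

```python
from math import ceil
from concurrent.futures import ThreadPoolExecutor

CPU = 40  # assumed core count, matching the setup above

def fake_embed(batch: list[int]) -> list[list[float]]:
    # stand-in for the encode_pre / encode_core / encode_post pipeline
    return [[float(x)] for x in batch]

def embed_batched(items: list[int]) -> list[list[float]]:
    # same chunking as in the snippet above: one batch per core
    batch_size = ceil(len(items) / CPU)
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=len(batches)) as executor:
        results = executor.map(fake_embed, batches)
    # executor.map preserves input order, so flattening restores the original order
    return [emb for batch_results in results for emb in batch_results]

print(embed_batched(list(range(1000)))[:3])  # → [[0.0], [1.0], [2.0]]
```

Because `executor.map` yields results in submission order, the flattened output lines up with the input images even though batches finish out of order.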
@michaelfeil any thoughts?