[API Proposal]: Streaming methods for IEmbeddingGenerator
Background and motivation
The IEmbeddingGenerator interface doesn't support streaming. Streaming makes sense mostly with batching (for remote/cloud implementations) or with local embedding models, which run much slower on the CPU, for example.
API Proposal
using System.Linq;                      // Enumerable.Chunk
using System.Runtime.CompilerServices;  // [EnumeratorCancellation]
using System.Threading;
using Microsoft.Extensions.AI;

namespace System.Collections.Generic;

public class LocalEmbeddingGenerator : IEmbeddingGenerator<string, Embedding<float>>
{
    public async IAsyncEnumerable<Embedding<float>> GenerateStreamingAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Embed the input in fixed-size batches so results can be yielded
        // as soon as each batch completes.
        const int chunkSize = 128;
        foreach (var chunk in values.Chunk(chunkSize))
        {
            cancellationToken.ThrowIfCancellationRequested();
            GeneratedEmbeddings<Embedding<float>> embeddings =
                await GenerateAsync(chunk, options, cancellationToken).ConfigureAwait(false);

            foreach (var embedding in embeddings)
            {
                cancellationToken.ThrowIfCancellationRequested();
                yield return embedding;
            }
        }
    }
    ...
}
API Usage
IEmbeddingGenerator<string, Embedding<float>> localEmbeddings = new LocalEmbeddingGenerator();
await foreach (var embedding in localEmbeddings.GenerateStreamingAsync(largeEnumerable, null, cts.Token))
{
    ...
}
Alternative Designs
Such a method is likely to look much the same across implementations, so an extension method might suffice.
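A minimal sketch of what that extension method could look like, assuming the Microsoft.Extensions.AI abstractions. The class name EmbeddingGeneratorStreamingExtensions and the DefaultChunkSize constant are hypothetical, chosen only for illustration; the method is not part of the library today. It reuses the chunk-then-yield loop from the proposal above but works against any IEmbeddingGenerator<TInput, TEmbedding>.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using Microsoft.Extensions.AI;

// Hypothetical helper class; not part of Microsoft.Extensions.AI.
public static class EmbeddingGeneratorStreamingExtensions
{
    // Assumed batch size, mirroring the value used in the proposal.
    private const int DefaultChunkSize = 128;

    public static async IAsyncEnumerable<TEmbedding> GenerateStreamingAsync<TInput, TEmbedding>(
        this IEmbeddingGenerator<TInput, TEmbedding> generator,
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
        where TEmbedding : Embedding
    {
        // Split the input lazily into batches; each batch is one GenerateAsync call.
        foreach (TInput[] chunk in values.Chunk(DefaultChunkSize))
        {
            cancellationToken.ThrowIfCancellationRequested();

            GeneratedEmbeddings<TEmbedding> embeddings = await generator
                .GenerateAsync(chunk, options, cancellationToken)
                .ConfigureAwait(false);

            // Hand each embedding to the caller before the next batch starts.
            foreach (TEmbedding embedding in embeddings)
            {
                yield return embedding;
            }
        }
    }
}
```

With this in scope, the call shown under API Usage would bind to the extension without changes. The trade-off is that an extension cannot be overridden, so a generator with a more efficient native streaming path would have no way to substitute it.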
Risks
It makes implementations slightly more complex, and existing implementations might simply call the GenerateAsync method without leveraging chunking or a similar approach.