azure-functions-openai-extension
WebJobs.Extensions.OpenAI: OpenAI returned an error of type 'invalid_request_error': Too many inputs. The max number of inputs is 16.
I'm attempting to ingest a fairly large TXT file (1 MB) of my own data.
I receive the following exception:
WebJobs.Extensions.OpenAI: OpenAI returned an error of type 'invalid_request_error': Too many inputs. The max number of inputs is 16.
Not sure what I can do about this? Is this something where I need to chunk ahead of time? If so how?
Which binding are you attempting to use? Can you share a code snippet?
It should just be the exact EmailDemo code.
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using System.IO;
using System.Threading.Tasks;
using WebJobs.Extensions.OpenAI;
using WebJobs.Extensions.OpenAI.Search;

namespace AiWebJobInProc;

public static class EmailPromptDemo
{
    public record EmbeddingsRequest(string FilePath);
    public record SemanticSearchRequest(string Prompt);

    // REVIEW: There are several assumptions about how the Embeddings binding and the SemanticSearch
    // bindings work together. We should consider creating a higher level of abstraction for this.
    [FunctionName("IngestEmail")]
    public static async Task<IActionResult> IngestEmail(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post")] EmbeddingsRequest req,
        [Embeddings("{FilePath}", InputType.FilePath)] EmbeddingsContext embeddings,
        [SemanticSearch("KustoConnectionString", "Documents")] IAsyncCollector<SearchableDocument> output)
    {
        string title = Path.GetFileNameWithoutExtension(req.FilePath);
        await output.AddAsync(new SearchableDocument(title, embeddings));
        return new OkObjectResult(new { status = "success", title, chunks = embeddings.Count });
    }

    [FunctionName("PromptEmail")]
    public static IActionResult PromptEmail(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post")] SemanticSearchRequest unused,
        [SemanticSearch("KustoConnectionString", "Documents", Query = "{Prompt}")] SemanticSearchContext result)
    {
        return new ContentResult { Content = result.Response, ContentType = "text/plain" };
    }
}
Thanks. A quick online search suggests that this is an error returned by OpenAI when hitting the embeddings endpoint. The default chunk size we use in the embeddings binding is 8K. Since you're providing a 1 MB text file, dividing it up into 8K chunks can result in too many chunks (more than 16).
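To make the arithmetic concrete, here's a small sketch of the chunk-count math (in Python just for illustration). It assumes the binding splits the file into fixed-size chunks and that the 16-input limit applies to a single embeddings request; the exact batching behavior of the extension isn't confirmed here.

```python
def chunk_count(file_size_bytes: int, max_chunk_length: int) -> int:
    """Number of chunks a file of the given size would produce (ceiling division)."""
    return -(-file_size_bytes // max_chunk_length)

# A 1 MB file with the default 8K chunk size:
print(chunk_count(1_000_000, 8 * 1024))   # 123 chunks, far over the 16-input limit

# With a 16K chunk size:
print(chunk_count(1_000_000, 16 * 1024))  # 62 chunks

# With a 64K chunk size:
print(chunk_count(1_000_000, 64 * 1024))  # 16 chunks
```

By this arithmetic, if the whole file really does go out as one request, a 1 MB file would need chunks of roughly 64K or larger to come in at 16 chunks or fewer.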
Can you try using a larger chunk size to see if that helps? You can configure it with the MaxChunkLength property on the Embeddings binding attribute. For example:
[Embeddings("{FilePath}", InputType.FilePath, MaxChunkLength = 16 * 1024 /* 16K */)] EmbeddingsContext embeddings
Will give this a go, thanks!