
WebJobs.Extensions.OpenAI: OpenAI returned an error of type 'invalid_request_error': Too many inputs. The max number of inputs is 16.

Open aherrick opened this issue 2 years ago • 4 comments

I'm attempting to ingest a fairly large TXT file (1 MB) of my own data.

I receive the following exception:

WebJobs.Extensions.OpenAI: OpenAI returned an error of type 'invalid_request_error': Too many inputs. The max number of inputs is 16.

I'm not sure what I can do about this. Is this something where I need to chunk the file ahead of time? If so, how?

aherrick avatar Nov 23 '23 01:11 aherrick

Which binding are you attempting to use? Can you share a code snippet?

cgillum avatar Nov 23 '23 03:11 cgillum

It should just be the exact EmailDemo code.

using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using System.IO;
using System.Threading.Tasks;
using WebJobs.Extensions.OpenAI;
using WebJobs.Extensions.OpenAI.Search;

namespace AiWebJobInProc;

public static class EmailPromptDemo
{
    public record EmbeddingsRequest(string FilePath);
    public record SemanticSearchRequest(string Prompt);

    // REVIEW: There are several assumptions about how the Embeddings binding and the SemanticSearch binding
    //         work together. We should consider creating a higher level of abstraction for this.
    [FunctionName("IngestEmail")]
    public static async Task<IActionResult> IngestEmail(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post")] EmbeddingsRequest req,
        [Embeddings("{FilePath}", InputType.FilePath)] EmbeddingsContext embeddings,
        [SemanticSearch("KustoConnectionString", "Documents")] IAsyncCollector<SearchableDocument> output)
    {
        string title = Path.GetFileNameWithoutExtension(req.FilePath);
        await output.AddAsync(new SearchableDocument(title, embeddings));
        return new OkObjectResult(new { status = "success", title, chunks = embeddings.Count });
    }

    [FunctionName("PromptEmail")]
    public static IActionResult PromptEmail(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post")] SemanticSearchRequest unused,
        [SemanticSearch("KustoConnectionString", "Documents", Query = "{Prompt}")] SemanticSearchContext result)
    {
        return new ContentResult { Content = result.Response, ContentType = "text/plain" };
    }
}

aherrick avatar Nov 23 '23 14:11 aherrick

Thanks. A quick online search suggests that this is an error returned by OpenAI when hitting the embeddings endpoint. The default chunk size we use in the embeddings binding is 8 KB. Since you're providing a 1 MB text file, dividing it up into 8 KB chunks can result in too many chunks (more than 16).
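For intuition, here is the back-of-the-envelope arithmetic (illustrative only, not taken from the extension's source):

```python
import math

def chunk_count(file_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks when a file is split into fixed-size pieces."""
    return math.ceil(file_bytes / chunk_bytes)

one_mb = 1024 * 1024
print(chunk_count(one_mb, 8 * 1024))   # 8 KB chunks  -> 128 chunks
print(chunk_count(one_mb, 64 * 1024))  # 64 KB chunks -> 16 chunks
```

By this arithmetic, a 1 MB file at the default 8 KB chunk size produces around 128 chunks, well over a 16-inputs-per-request limit; chunks of roughly 64 KB or larger would bring it down to 16, assuming all chunks are sent in a single request.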

Can you try a larger chunk size to see if that helps? You can configure it via the MaxChunkLength property on the binding attribute. For example:

[Embeddings("{FilePath}", InputType.FilePath, MaxChunkLength = 16 * 1024 /* 16K */)] EmbeddingsContext embeddings
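If a larger chunk size alone isn't enough, the general workaround is to pre-chunk the file yourself and send the chunks in batches of at most 16 inputs per embeddings request. The sketch below shows the idea in Python; the helper names are hypothetical and not part of the WebJobs OpenAI extension:

```python
# Hypothetical sketch: pre-chunk a large text and group the chunks so that
# no single embeddings request exceeds a 16-input limit.

def split_text(text: str, max_chars: int = 8 * 1024) -> list[str]:
    """Naive fixed-size character chunking; real chunkers typically split
    on token or sentence boundaries instead."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def batch(items: list[str], size: int = 16) -> list[list[str]]:
    """Group chunks into batches of at most `size` inputs per request."""
    return [items[i:i + size] for i in range(0, len(items), size)]

chunks = split_text("x" * (1024 * 1024))  # simulate a 1 MB text file
batches = batch(chunks)
print(len(chunks), len(batches))  # 128 chunks -> 8 batches
```

Each batch could then be submitted as a separate embeddings call (e.g. from plain SDK code rather than the binding), and the resulting vectors combined before writing to the search index.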

cgillum avatar Nov 23 '23 16:11 cgillum

will give this a go thanks!

aherrick avatar Nov 26 '23 15:11 aherrick