Vertex (override/custom model provider); inspect request?
Hi,
This library looks really useful - thank you. I have a few questions:
- The HTTP client docs mention that it supports pooling. Are individual responses processed only once all responses have arrived (which is how Laravel's Pool works by default - see the Http::pool sketch below), or is there an option to process each response as it arrives (which is what I need)? Could you provide a usage example, including how responses are handled?
- I use Cloudflare AI Gateway for fallback support, which involves wrapping multiple provider requests in a single request to Cloudflare (see the gateway sketch below). Is request middleware the place to do this - i.e. take a single request to Gemini, duplicate it, and wrap it in the required Cloudflare AI Gateway format? Or would it be better to write a custom provider, if that's possible?
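For reference, this is roughly what I mean by Laravel's default behaviour (standard Http::pool usage; the URLs are just placeholders):

```php
use Illuminate\Http\Client\Pool;
use Illuminate\Support\Facades\Http;

// All requests are registered in the closure and sent concurrently,
// but pool() only returns once every response has arrived.
$responses = Http::pool(fn (Pool $pool) => [
    $pool->get('https://api.example.com/first'),
    $pool->get('https://api.example.com/second'),
]);

// Responses can only be processed here, after all of them are back -
// there is no hook to handle each one as soon as it arrives.
foreach ($responses as $response) {
    // $response->ok(), $response->json(), ...
}
```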
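And this is roughly the wrapped request I send to the Cloudflare AI Gateway Universal Endpoint (the IDs, provider slugs, models and payloads here are illustrative only - the exact format is in Cloudflare's docs):

```php
use Illuminate\Support\Facades\Http;

// Placeholder identifiers.
$accountId = 'your-account-id';
$gatewayId = 'your-gateway-id';

// The Universal Endpoint takes an array of provider requests and tries
// them in order, falling back to the next entry when one fails.
$response = Http::post("https://gateway.ai.cloudflare.com/v1/{$accountId}/{$gatewayId}", [
    [
        'provider' => 'google-ai-studio',
        'endpoint' => 'v1beta/models/gemini-1.5-flash:generateContent',
        'headers'  => [
            'x-goog-api-key' => env('GEMINI_API_KEY'),
            'Content-Type'   => 'application/json',
        ],
        'query'    => [
            'contents' => [['role' => 'user', 'parts' => [['text' => 'Hello']]]],
        ],
    ],
    [
        'provider' => 'openai',
        'endpoint' => 'chat/completions',
        'headers'  => [
            'Authorization' => 'Bearer ' . env('OPENAI_API_KEY'),
            'Content-Type'  => 'application/json',
        ],
        'query'    => [
            'model'    => 'gpt-4o-mini',
            'messages' => [['role' => 'user', 'content' => 'Hello']],
        ],
    ],
]);
```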
Thanks again.
Async is not yet supported; the code is a prototype / work in progress, so please do not use it. I will be working on it after the 1.0 release.
HTTP middleware is most likely not a good place to implement request distribution across multiple providers.
One reason is that provider APIs differ - more or less, depending on the provider.
Even if a given LLM API vendor declares compatibility with OpenAI, it is not always 100% the same across all aspects - e.g. how authentication works, how usage is reported, whether/how reasoning content is returned, the details of structured outputs via JSON / JSON Schema declarations, how tools are declared and supported, etc.
And even if compatibility is 100% today, it may change tomorrow.
That is why Polyglot, Instructor's LLM API access layer (currently focused on inference and embedding generation), translates a generic InferenceRequest into a provider-specific HttpRequest.
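Conceptually it looks something like the sketch below - the class names are hypothetical and only illustrate the idea of per-provider translation; this is not Polyglot's actual API:

```php
// Hypothetical sketch of a per-provider translation layer.

final class InferenceRequest {
    public function __construct(
        public string $model,
        public array $messages,  // [['role' => 'user', 'content' => '...'], ...]
        public string $apiKey,
    ) {}
}

final class HttpRequest {
    public function __construct(
        public string $method,
        public string $url,
        public array $headers,
        public array $body,
    ) {}
}

interface ProviderAdapter {
    public function toHttpRequest(InferenceRequest $request): HttpRequest;
}

// Each provider needs its own adapter, because auth, endpoints, payload
// shapes and even role names differ - also between "OpenAI-compatible" APIs.
final class GeminiAdapter implements ProviderAdapter {
    public function toHttpRequest(InferenceRequest $request): HttpRequest {
        return new HttpRequest(
            method: 'POST',
            url: "https://generativelanguage.googleapis.com/v1beta/models/{$request->model}:generateContent",
            headers: ['x-goog-api-key' => $request->apiKey, 'Content-Type' => 'application/json'],
            body: [
                'contents' => array_map(
                    fn (array $m) => [
                        'role'  => $m['role'] === 'assistant' ? 'model' : $m['role'],
                        'parts' => [['text' => $m['content']]],
                    ],
                    $request->messages,
                ),
            ],
        );
    }
}
```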
I think I understand what you are aiming for, and I have the same need, so expect parallel/async requests and splitting a request across multiple providers to be added to Instructor (and Polyglot) in the future.