gateway
Audio transcription endpoint does not respect Gateway's override_params
What Happened?
With portkey.chat.completions.create, we can provide override_params to the gateway to specify which model to use. The model parameter in portkey.chat.completions.create itself can be omitted since we specify it in the gateway.
However, with portkey.audio.transcriptions.create, there are two issues.
First, it forces us to specify the model. Otherwise it will cause an error:
portkey.audio.transcriptions.create({
  model: '', // <--- Model parameter must be specified. If omitted, it will cause an error
  file: fs.createReadStream(fileName),
  response_format: 'text',
}, requestOptionsForTranscription);
Second, even more problematic, portkey.audio.transcriptions.create does not respect the override_params of the gateway config object.
In the snippet I have included below, we would expect it to use distil-whisper-large-v3-en as specified in the config object. However, this value is ignored and instead portkey uses the non-existent should-use-model-name-from-gateway value.
What Should Have Happened?
No response
Relevant Code Snippet
const portkeyGatewayConfigForTranscription = {
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "strategy": {
        "mode": "fallback"
      },
      "targets": [
        {
          "provider": "groq",
          "api_key": "gsk_",
          "override_params": {
            "model": "distil-whisper-large-v3-en"
          }
        },
      ]
    },
    {
      "strategy": {
        "mode": "fallback"
      },
      "targets": [
        {
          "provider": "groq",
          "api_key": "gsk_",
          "override_params": {
            "model": "distil-whisper-large-v3-en"
          }
        },
      ]
    }
  ]
}
const requestOptionsForTranscription = {
  metadata: {
  },
  config: JSON.stringify(portkeyGatewayConfigForTranscription),
}
let transcript = await portkey.audio.transcriptions.create({
  model: 'should-use-model-name-from-gateway',
  file: fs.createReadStream(fileName),
  response_format: 'text',
}, requestOptionsForTranscription);
This is expected, @AgileEduLabs: override_params is for JSON parameters.
It is unfeasible for a gateway to load a multipart/form-data payload into memory and transform the parameters.
Hi @narengogi , it seems that data is encoded to multipart/form-data on the gateway itself like so:
src/providers/openai/api.ts
if (
  fn === 'createTranscription' ||
  fn === 'createTranslation' ||
  fn === 'uploadFile'
)
  headersObj['Content-Type'] = 'multipart/form-data';
If I understood this correctly, then override_params should be applied first, before data is encoded to multipart/form-data and sent to the providers. In theory it should be a simple fix.
Regardless of the complexity of implementing this though, I think it is essential to get this to work.
If override_params cannot modify the model name on the gateway, then the gateway features are essentially useless for this endpoint. You won't be able to do any sort of meaningful fallback or load balancing, because each provider has its own model names. Something like this will not be possible:
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "provider1",
      "api_key": "gsk_",
      "override_params": {
        "model": "some-model-name"
      }
    },
    {
      "provider": "provider2",
      "api_key": "gsk_",
      "override_params": {
        "model": "same-model-but-named-differently-by-provider" // <----
      }
    },
  ]
},
So I think it is crucial that the code is fixed to allow override_params to work. Kindly help reopen this issue.
Hi @narengogi and @VisargD , as mentioned above, if override_params cannot modify the model name on the gateway, then the gateway features are essentially useless. Hope you guys can re-open the issue and fix the gateway.
@AgileEduLabs the gateway will not load a multipart form data file into memory and then transform the payload. Imagine a simple audio file of 20MB that a user wants to send to /audio/transcriptions. Now imagine 50 users doing this in parallel: that is about 1GB in memory consumption. The gateway is written to run on the edge; the binary is ~500kb and the memory footprint at 5000RPS is ~100MB. If this functionality is absolutely vital to you, I suggest you make the changes in a fork of the gateway, but we will not be doing this in the main repository.
Hi @narengogi I did an extensive deep dive into the code base. I previously misunderstood how and where data was encoded as multipart/form-data, but now it is clear to me.
First, the audio file as well as the model name and other parameters are encoded on the client side as multipart/form-data and sent to the gateway.
On the gateway, the data received is handled by createTranscriptionHandler (index.ts line 160):
app.post(
  '/v1/audio/transcriptions',
  requestValidator,
  createTranscriptionHandler
);
In createTranscriptionHandler, the multipart/form-data is actually loaded into memory and parsed into a JavaScript FormData object (createTranscriptionHandler.ts line 19):
let request = await c.req.raw.formData();
This FormData object is then passed to tryTargetsRecursively and tryPost.
Before the data is sent to the provider, the gateway re-encodes the JavaScript FormData object into multipart/form-data. For example, src/providers/openai/api.ts line 17:
if (
  fn === 'createTranscription' ||
  fn === 'createTranslation' ||
  fn === 'uploadFile'
)
  headersObj['Content-Type'] = 'multipart/form-data';
The source of the problem lies in transformToProviderRequest.ts on line 235. If body is FormData, then override parameters are not applied and instead the function returns early with the requestBody unmodified:
if (requestBody instanceof FormData || requestBody instanceof ArrayBuffer)
  return requestBody;
At first glance this makes sense because you cannot apply a simple spread operator to FormData nor ArrayBuffer, so we just ignore the override_params and return early.
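To illustrate why a spread does not help here: FormData keeps its entries internally rather than as plain string-keyed properties, so spreading it into an object literal drops them and only the overriding key survives. A minimal sketch, assuming a runtime with a global FormData (for example Node 18+ or a browser); the variable names are made up for illustration and are not taken from the gateway code:

```typescript
// Build a small form body similar to what the transcription endpoint receives.
const form = new FormData();
form.append('model', 'should-use-model-name-from-gateway');
form.append('response_format', 'text');

// Spreading copies only own enumerable properties. FormData entries are not
// stored as string-keyed properties, so the fields appended above are lost.
const merged: Record<string, unknown> = {
  ...form,
  model: 'distil-whisper-large-v3-en',
};

const mergedKeys = Object.keys(merged);
console.log(mergedKeys);
console.log('response_format' in merged);
```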
However, the more elegant solution is to transform the JavaScript FormData object based on the override_params. Since your existing code has already parsed the request into a FormData object, the computational cost of overriding the keys is negligible. Even if you have a large 20GB audio file, the file content itself is not touched: FormData keeps the same Blob / File reference, so the audio is never copied. The code just walks the key/value metadata of the object and replaces the relevant entries, which is very fast:
for (const [k, v] of Object.entries(providerOptions.overrideParams)) {
  requestBody.delete(k); // If the key already exists, delete it first so we always overwrite.
  // Value can be string, Blob, File, or an array of those.
  const values = Array.isArray(v) ? v : [v];
  for (const val of values) {
    requestBody.append(k, val);
  }
}
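For anyone who wants to try the idea outside the gateway, the loop above can be wrapped in a small standalone function. This is only a sketch under assumptions: applyOverrideParams, OverrideValue, and OverrideParams are hypothetical names that do not exist in the gateway, the "audio" is a placeholder Blob, and a runtime with global FormData and Blob (for example Node 18+) is assumed.

```typescript
type OverrideValue = string | Blob;
type OverrideParams = Record<string, OverrideValue | OverrideValue[]>;

// Hypothetical helper: overwrite fields of a FormData body in place.
// Only the keys named in `overrides` are deleted and re-appended;
// every other entry (including the file) is left as it is.
function applyOverrideParams(body: FormData, overrides: OverrideParams): FormData {
  for (const [key, value] of Object.entries(overrides)) {
    body.delete(key); // drop any existing value so the override always wins
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      body.append(key, v);
    }
  }
  return body;
}

// Simulate the body the client sends to /v1/audio/transcriptions.
const body = new FormData();
body.append('model', 'should-use-model-name-from-gateway');
body.append('response_format', 'text');
body.append('file', new Blob(['fake audio bytes']), 'audio.wav');

applyOverrideParams(body, { model: 'distil-whisper-large-v3-en' });

console.log(body.get('model'));
console.log(body.getAll('model').length);
```

Deleting before appending matters because FormData allows repeated keys: calling append alone would leave both the client's model and the override in the body.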
@VisargD @narengogi Please accept my pull request https://github.com/Portkey-AI/gateway/pull/1242. I believe this is still in keeping with the spirit of the lightweight nature of the gateway.
@AgileEduLabs I've seen your PR, I'll make the necessary changes for supporting override params correctly