Google Vertex fine-tuned models are addressed by "endpoints" instead of "models"
Inference with fine-tuned models through Google Vertex does not follow the same URL pattern as inference with Gemini models.
The path for a fine-tuned model is
projects/<PROJECT_ID>/locations/<LOCATION>/endpoints/<ENDPOINT_ID>
instead of
projects/<PROJECT_ID>/locations/<LOCATION>/models/<MODEL_ID>
There is currently no TensorZero support for this.
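To make the difference concrete, here is a minimal Python sketch of how the two URL shapes could be built. The function name `vertex_generate_content_url` and its `fine_tuned` flag are hypothetical and only for illustration; they are not part of TensorZero or of any Google SDK. The URL layout is taken from the curl commands in this issue.

```python
def vertex_generate_content_url(project_id: str, location: str,
                                resource_id: str, fine_tuned: bool) -> str:
    """Build a generateContent URL (hypothetical helper, not a TensorZero API).

    Fine-tuned models are addressed under "endpoints"; the other path
    described in this issue uses "models".
    """
    collection = "endpoints" if fine_tuned else "models"
    host = f"https://{location}-aiplatform.googleapis.com"
    return (
        f"{host}/v1/projects/{project_id}/locations/{location}"
        f"/{collection}/{resource_id}:generateContent"
    )


# Example with a placeholder project ID and the endpoint ID from this issue.
url = vertex_generate_content_url(
    "my-project", "us-central1", "4816051145170485248", fine_tuned=True
)
print(url)
```

Supporting fine-tuned models would presumably only require choosing the `endpoints` segment and passing the endpoint ID rather than the model ID, assuming the request body stays the same.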
Replication:
request.json
{
  "contents": [
    {
      "role": "USER",
      "parts": {
        "text": "Why is sky blue?"
      }
    }
  ],
  "generation_config": {
    "temperature": 1.0,
    "topP": 1.0,
    "topK": 40,
    "maxOutputTokens": 100
  }
}
Sent to the tuned_model_endpoint_name 4816051145170485248, this request returns a non-trivial response:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/<project_id>/locations/us-central1/endpoints/4816051145170485248:generateContent"
Fine-tuning also returns a tuned_model_name, 3148371281787748352@1, but the same request against the models path returns nothing:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/<project_id>/locations/us-central1/models/3148371281787748352@1:generateContent"
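For anyone who wants to reproduce the working call without curl, the following is a rough Python equivalent using only the standard library. It is a sketch under assumptions: `build_endpoint_request` is a hypothetical helper name, the project ID is a placeholder, and the access token is expected to come from `gcloud auth print-access-token` as in the curl commands. The function only constructs the request and does not send it.

```python
import json
import urllib.request


def build_endpoint_request(project_id: str, location: str, endpoint_id: str,
                           access_token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST against a tuned-model endpoint.

    Hypothetical helper for illustration; mirrors the working curl command.
    """
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/endpoints/{endpoint_id}:generateContent"
    )
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Same payload as request.json above.
request_body = {
    "contents": [{"role": "USER", "parts": {"text": "Why is sky blue?"}}],
    "generation_config": {
        "temperature": 1.0,
        "topP": 1.0,
        "topK": 40,
        "maxOutputTokens": 100,
    },
}

# "my-project" and "DUMMY_TOKEN" are placeholders, not real values.
req = build_endpoint_request(
    "my-project", "us-central1", "4816051145170485248", "DUMMY_TOKEN", request_body
)
```

With a real token and project ID, passing `req` to `urllib.request.urlopen` would send the request.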