Google Vertex fine-tuned models are addressed by "endpoints" instead of "models"

Open issue, opened by anndvision

Inference with fine-tuned models on Google Vertex AI does not follow the same URL pattern as inference with Gemini models.

The path is

projects/<PROJECT_ID>/locations/<LOCATION>/endpoints/<ENDPOINT_ID>

instead of

projects/<PROJECT_ID>/locations/<LOCATION>/models/<MODEL_ID>
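As a rough illustration of the difference, the function below builds a generateContent URL for either path shape. It is a minimal sketch that assumes the regional host `<LOCATION>-aiplatform.googleapis.com` and the `v1` API, as in the curl commands in this issue; the function name `vertex_generate_url` and the project `my-project` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a Vertex AI generateContent URL.
# Assumes the regional host and v1 API used in the curl examples of this issue.
# Usage: vertex_generate_url <kind> <project_id> <location> <id>
#   kind = "endpoints" -> <id> is the tuned model's endpoint ID
#   kind = "models"    -> <id> is a model ID (optionally with @version)
vertex_generate_url() {
    kind=$1 project=$2 location=$3 id=$4
    # Only the two path shapes discussed in this issue are accepted.
    case "$kind" in
        endpoints|models) ;;
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
    printf 'https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/%s/%s:generateContent\n' \
        "$location" "$project" "$location" "$kind" "$id"
}
```

For example, `vertex_generate_url endpoints my-project us-central1 4816051145170485248` yields the endpoints URL used in the working curl request; only the path segment after `locations/<LOCATION>/` differs between the two shapes.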

There is currently no TensorZero support for this.

To reproduce:

request.json

{
    "contents": [
        {
            "role": "USER",
            "parts": {
                "text" : "Why is sky blue?"
            }
        }
    ],
    "generation_config": {
        "temperature":1.0,
        "topP": 1.0,
        "topK": 40,
        "maxOutputTokens": 100
    }
}

This request returns a non-trivial response when sent to the endpoint given by tuned_model_endpoint_name (4816051145170485248):

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/<project_id>/locations/us-central1/endpoints/4816051145170485248:generateContent"
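The issue shows only the bare endpoint ID. If the tuning output instead reports the endpoint as a full resource name, assumed here to have the form `projects/<project>/locations/<location>/endpoints/<endpoint_id>`, the working URL above can be derived from it directly. This is a sketch under that assumption; `endpoint_resource_to_url` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: derive a generateContent URL from an endpoint resource name.
# Assumed input form: projects/<project>/locations/<location>/endpoints/<endpoint_id>
endpoint_resource_to_url() {
    resource=$1
    # Reject anything that does not look like an endpoint resource name.
    case "$resource" in
        projects/*/locations/*/endpoints/*) ;;
        *) echo "not an endpoint resource name: $resource" >&2; return 1 ;;
    esac
    # The regional host needs the location: drop everything up to
    # "/locations/", then everything from the next "/" onward.
    location=${resource#*/locations/}
    location=${location%%/*}
    printf 'https://%s-aiplatform.googleapis.com/v1/%s:generateContent\n' "$location" "$resource"
}
```

Deriving the host from the resource name keeps the region in the hostname and the region in the path consistent, which the hand-written curl URL has to do manually.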

Fine-tuning also returns a tuned_model_name, 3148371281787748352@1, but the following request against the models path returns nothing:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/<project_id>/locations/us-central1/models/3148371281787748352@1:generateContent"
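The tuned_model_name `3148371281787748352@1` appears to combine a model ID with a version after the `@`, following Vertex AI Model Registry's `model@version` convention (treat this reading as an assumption). The sketch below splits such a name into its parts; `split_model_name` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: split "<model_id>@<version>" into its two parts.
# Assumes "@" separates the model ID from the version (Model Registry convention).
# Prints "<model_id> <version>", or just "<model_id>" if there is no "@".
split_model_name() {
    name=$1
    case "$name" in
        *@*) printf '%s %s\n' "${name%@*}" "${name##*@}" ;;
        *)   printf '%s\n' "$name" ;;
    esac
}
```

For the name in this issue it prints the model ID `3148371281787748352` followed by the version `1`, which makes clear that this ID is unrelated to the endpoint ID `4816051145170485248` that actually serves requests.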

May 23 '25 20:05 anndvision