
Improve key naming clarity and consistency across REST requests

Open ossa-ma opened this issue 10 months ago • 6 comments

Description of the feature request:

It is not immediately obvious that camelCase and Python-style snake_case are interchangeable for the input data keys to Gemini. It would be helpful to note this somewhere, as there are several examples where both styles are mixed in a confusing fashion, often in the same input JSON:

  • When specifying the system instructions to the model in the /cachedContents endpoint, it uses camel case: 'systemInstruction': https://github.com/google-gemini/generative-ai-python/blob/8849d4f46010ce4ae68243c4f8a44a138b56598f/samples/rest/count_tokens.sh#L196 However, in the same example, it uses 'system_instructions' as input JSON to the /generateContent endpoint: https://github.com/google-gemini/generative-ai-python/blob/8849d4f46010ce4ae68243c4f8a44a138b56598f/samples/rest/count_tokens.sh#L218 It does seem like both can be used, though.

  • The 'generationConfig' dictionary, in the input json to the /generateContent endpoint, contains several keys in the camel case notation and several in the underscore pythonic notation which is confusing and messy:

An example: ...

"generationConfig": {
    "stopSequences": [
        "Title"
    ],
    "temperature": 1.0,
    "maxOutputTokens": 800,
    "topP": 0.8,
    "topK": 10,
    "response_mime_type": "application/json",
    "response_schema": {
        "type": "ARRAY",
        "items": {
            "type": "OBJECT",
            "properties": {
                "recipe_name": {"type": "STRING"}
            }
        }
    }
}

...

References: https://github.com/google-gemini/generative-ai-python/blob/8849d4f46010ce4ae68243c4f8a44a138b56598f/samples/rest/configure_model_parameters.sh#L20 https://github.com/google-gemini/generative-ai-python/blob/8849d4f46010ce4ae68243c4f8a44a138b56598f/samples/rest/controlled_generation.sh#L13

It also doesn't help that the general version of the API reference denotes everything in camel case, so a developer who refers only to that reference would be confused by the snake_case keys in the samples: https://ai.google.dev/api/generate-content#generationconfig
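For comparison, here is a sketch of the same generationConfig written with camelCase keys only, following the naming in the API reference linked above. responseMimeType and responseSchema are the camelCase spellings of the two snake_case keys. The function name camel_case_config is hypothetical and only for illustration. recipe_name is left alone on the assumption that it is a user-defined schema property rather than an API field.

```shell
#!/usr/bin/env bash
# Sketch: print a generateContent payload fragment that uses camelCase keys only.
# camel_case_config is a hypothetical helper name; the values are copied from the example above.
camel_case_config() {
  # The quoted 'EOF' delimiter keeps the JSON literal (no shell expansion).
  cat << 'EOF'
{
  "generationConfig": {
    "stopSequences": ["Title"],
    "temperature": 1.0,
    "maxOutputTokens": 800,
    "topP": 0.8,
    "topK": 10,
    "responseMimeType": "application/json",
    "responseSchema": {
      "type": "ARRAY",
      "items": {
        "type": "OBJECT",
        "properties": {
          "recipe_name": {"type": "STRING"}
        }
      }
    }
  }
}
EOF
}
```

The output could be redirected to a file, for example camel_case_config > payload.json, and merged with a "contents" entry before being sent with curl -d "@payload.json".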

What problem are you trying to solve with this feature?

Improved clarity and consistency in the Gemini API documentation, which reduces the potential for errors imo.

Any other information you'd like to share?

No response

ossa-ma avatar Feb 18 '25 01:02 ossa-ma

@Giom-V - you were looking at rest samples, maybe look into enforcing this at the same time?

MarkDaoust avatar Feb 18 '25 17:02 MarkDaoust

I think it would also be useful to add a script example for generating content with pdfs as inline data (not file data).

I have included an example below:

echo "[START text_gen_multimodal_two_pdf_inline]"
# Use a temporary file to hold the base64 encoded pdf data

# MEDIA_DIR and B64FLAGS are assumed to be set earlier in the script, as in the other REST samples
PDF_PATH_1="${MEDIA_DIR}/test_1.pdf"
PDF_PATH_2="${MEDIA_DIR}/test_2.pdf"

TEMP_1_B64=$(mktemp)
TEMP_2_B64=$(mktemp)
# Use a temporary file to hold the JSON payload
TEMP_JSON=$(mktemp)
# A later trap on EXIT replaces an earlier one, so register a single trap for all temp files
trap 'rm -f "$TEMP_1_B64" "$TEMP_2_B64" "$TEMP_JSON"' EXIT

base64 $B64FLAGS "$PDF_PATH_1" > "$TEMP_1_B64"
base64 $B64FLAGS "$PDF_PATH_2" > "$TEMP_2_B64"

cat > "$TEMP_JSON" << EOF
{
  "contents": [{
    "role": "user",
    "parts":[
      {"text": "Extract the pet names, type and ages from these documents."},
      {
        "inline_data": {
          "mime_type":"application/pdf",
          "data": "$(cat "$TEMP_1_B64")"
        }
      },
      {
        "inline_data": {
          "mime_type":"application/pdf",
          "data": "$(cat "$TEMP_2_B64")"
        }
      }
    ]
  }],
  "system_instruction": {
    "parts": [
      {"text": "Extract the pet names and ages from these documents and return them in the following JSON format:\n\nPet = {\"name\": str, \"type\": str, \"age\": int}\nReturn: list[Pet]"
      }
    ]
  },
  "generation_config": {
    "temperature": 0.2,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 1000,
    "response_mime_type": "application/json"
  }
}
EOF

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d "@$TEMP_JSON" 2> /dev/null
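
Each PDF needs its own entry in "parts", because a JSON object that repeats the "inline_data" key cannot reliably carry both files. A minimal sketch of a helper that emits one part per PDF is shown below. inline_pdf_part is a hypothetical name, and reading from stdin with newlines stripped is an assumption made here to avoid platform-specific base64 flags.

```shell
#!/usr/bin/env bash
# Sketch: emit one "parts" entry per PDF so that no JSON object repeats a key.
# inline_pdf_part is a hypothetical helper name, not part of the samples.
inline_pdf_part() {
  local pdf_path=$1
  local data
  # Reading from stdin and stripping newlines avoids platform-specific base64 flags.
  data=$(base64 < "$pdf_path" | tr -d '\n')
  printf '{"inline_data": {"mime_type": "application/pdf", "data": "%s"}}' "$data"
}
```

Inside the heredoc, the parts list could then be written as {"text": "..."}, $(inline_pdf_part "$PDF_PATH_1"), $(inline_pdf_part "$PDF_PATH_2"). For large PDFs the temporary-file approach above may still be preferable.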

ossa-ma avatar Feb 18 '25 20:02 ossa-ma

Hey, I would love to work on this issue.

demoncoder-crypto avatar Mar 10 '25 03:03 demoncoder-crypto

@demoncoder-crypto Feel free to send a PR, I'll review it.

Giom-V avatar Mar 10 '25 15:03 Giom-V

To resolve this issue, I recommend:

1) Updating all shell scripts in the samples/rest directory to use consistent naming.
2) Documenting the flexibility by adding a note to the REST API documentation that explains that both naming conventions are supported, and including this information in the README.md or in comments at the top of the REST example files.
3) Developing a style guide that sets out internal guidelines for contributors.
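
For the first point, a quick way to find the remaining snake_case keys might be a grep over the sample scripts. This is only a minimal sketch: find_snake_keys is a hypothetical helper, and the regex looks at any quoted lowercase name that contains an underscore and is followed by a colon, so it will also flag user-defined names such as recipe_name that should be left alone.

```shell
#!/usr/bin/env bash
# Sketch: list JSON keys written in snake_case so they can be reviewed by hand.
# find_snake_keys is a hypothetical helper, not part of the repository.
find_snake_keys() {
  # -n prints line numbers, -o prints only the matching key, -E enables extended regex.
  # Matches a quoted lowercase identifier with at least one underscore, followed by a colon.
  grep -noE '"[a-z][a-z0-9]*(_[a-z0-9]+)+"[[:space:]]*:' "$@"
}
```

Running find_snake_keys samples/rest/*.sh would print each match prefixed with its file name and line number. grep exits with status 1 when nothing matches, which could serve as a simple pass condition in a check script.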

demoncoder-crypto avatar Mar 11 '25 01:03 demoncoder-crypto

@demoncoder-crypto I've left comments on the pull request regarding some changes that you missed.

ossa-ma avatar Mar 11 '25 01:03 ossa-ma