spring-ai

fix: Add instructions support for OpenAI TTS models

Open JGoP-L opened this issue 1 week ago • 0 comments

Description

Add an optional instructions parameter to the OpenAI TTS options and request so callers can give style and tone guidance for speech synthesis. This makes it possible to control prosody, emotion, and delivery style on models that support it (currently only gpt-4o-mini-tts).

Fixes #4388

Changes

Core Implementation

  • Add instructions field to OpenAiAudioSpeechOptions with getter/setter and builder support
  • Add instructions parameter to OpenAiAudioApi.SpeechRequest record
  • Pass instructions to the API only when the selected model is in the supported-model whitelist
  • Add INSTRUCTIONS_SUPPORTED_MODELS static set (currently contains gpt-4o-mini-tts)
  • Log a warning for unsupported models instead of throwing an exception
  • Update equals(), hashCode(), toString(), and copy() methods to include instructions
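
The conditional passing described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class InstructionsGate and the method resolveInstructions are hypothetical names, and java.util.logging stands in for whatever logger the real implementation uses, so that the sketch stays self-contained. Only the INSTRUCTIONS_SUPPORTED_MODELS name and its single entry come from the description.

```java
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper illustrating the whitelist check; not the actual Spring AI code.
final class InstructionsGate {

    private static final Logger logger = Logger.getLogger(InstructionsGate.class.getName());

    // Mirrors the INSTRUCTIONS_SUPPORTED_MODELS set named in the change list.
    static final Set<String> INSTRUCTIONS_SUPPORTED_MODELS = Set.of("gpt-4o-mini-tts");

    /**
     * Returns the instructions to place in the request, or null when they
     * should be left out (none were given, or the model is not whitelisted).
     */
    static String resolveInstructions(String model, String instructions) {
        if (instructions == null) {
            return null; // nothing to pass
        }
        // Set.of(...) rejects null lookups, so guard the model first.
        if (model != null && INSTRUCTIONS_SUPPORTED_MODELS.contains(model)) {
            return instructions;
        }
        // Unsupported model: warn and drop the value rather than throw.
        logger.warning("Model '" + model + "' does not support instructions; ignoring them.");
        return null;
    }
}
```

Under these assumptions, resolveInstructions("gpt-4o-mini-tts", "Friendly; warm tone") returns the instructions unchanged, while the same call with "tts-1" logs a warning and returns null, so the field is omitted from the request.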

Testing

  • Add OpenAiAudioSpeechModelInstructionsTests to verify conditional instruction passing
  • Add OpenAiSpeechRequestInstructionsSerializationTests to verify JSON serialization

Documentation

  • Update openai-speech.adoc with:
    • Configuration property for spring.ai.openai.audio.speech.options.instructions
    • Runtime options example showing instructions usage
    • New section “Using Instructions for Style Control” with implementation details
    • Important note about model compatibility
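
For reference, the new configuration property might be set like this in application.properties. This is a sketch: the instructions key is the one named above, the model key is the existing speech option, and the values are illustrative.

```properties
# Model that supports instructions (currently the only one whitelisted)
spring.ai.openai.audio.speech.options.model=gpt-4o-mini-tts
# Style/tone guidance; ignored with a warning on unsupported models
spring.ai.openai.audio.speech.options.instructions=Friendly; warm tone; natural pauses
```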

Backward Compatibility

Fully backward compatible:

  • instructions parameter is optional (nullable)
  • All existing constructors and methods remain unchanged
  • For unsupported models, instructions are left out of the request and a warning is logged
  • No breaking changes to API or configuration

Usage Example

OpenAiAudioSpeechOptions options = OpenAiAudioSpeechOptions.builder()
    .model("gpt-4o-mini-tts")
    .voice(OpenAiAudioApi.SpeechRequest.Voice.VERSE)
    .instructions("Friendly; warm tone; natural pauses")
    .build();

// "model" is assumed to be an already-configured OpenAiAudioSpeechModel instance.
TextToSpeechPrompt prompt = new TextToSpeechPrompt("Welcome!", options);
TextToSpeechResponse response = model.call(prompt);

Checklist

  • [x] Add a Signed-off-by line to each commit (git commit -s)
  • [x] Rebase changes on the latest main branch and squash commits
  • [x] Add/Update unit tests
  • [x] Run the build and ensure all tests pass (mvn test)
  • [x] Update documentation if necessary
  • [x] Ensure backward compatibility
  • [x] Follow existing code style and conventions

Test Results

JGoP-L • Nov 14 '25 02:11