app.py Gemini/Twilio w/ robust error handling, faster image encoding, and UI status updates

Open ahundt opened this issue 2 days ago • 3 comments

Note: while this code worked with gradio_webrtc==0.0.28 (modulo the previously discussed bugs https://github.com/googleapis/python-genai/issues/380 and https://github.com/aiortc/aiortc/issues/1258 ), it currently crashes with fastrtc==0.0.6 when run locally on an M3 Mac with this version info:

[project]
name = "gemini-audio-video-chat"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastrtc[vad, tts]==0.0.6",
    "google-genai==0.3.0",
    "twilio",
    "opencv-python",
    "dotenv",
]

The output doesn't show any major errors:

athundt@Andrews2024MBP|~/source/gemini-audio-video-chat on ui_improvements!?
± uv run app.py
/Users/athundt/source/gemini-audio-video-chat/.venv/lib/python3.13/site-packages/google_crc32c/__init__.py:29: RuntimeWarning: As the c extension couldn't be imported, `google-crc32c` is using a pure python implementation that is significantly slower. If possible, please configure a c build environment and compile the extension
  warnings.warn(_SLOW_CRC32C_WARNING, RuntimeWarning)
2025-02-26 16:53:48,133 - INFO - Attempting to get Twilio credentials (attempt 1)...
2025-02-26 16:53:48,190 - INFO - -- BEGIN Twilio API Request --
2025-02-26 16:53:48,190 - INFO - POST Request: https://api.twilio.com/2010-04-01/Accounts/ACfa954a3e72949b7b8c02f42beb438966/Tokens.json
2025-02-26 16:53:48,190 - INFO - Headers:
2025-02-26 16:53:48,190 - INFO - Content-Type : application/x-www-form-urlencoded
2025-02-26 16:53:48,190 - INFO - Accept : application/json
2025-02-26 16:53:48,190 - INFO - User-Agent : twilio-python/9.4.6 (Darwin x86_64) Python/3.13.2
2025-02-26 16:53:48,190 - INFO - X-Twilio-Client : python-9.4.6
2025-02-26 16:53:48,190 - INFO - Accept-Charset : utf-8
2025-02-26 16:53:48,190 - INFO - -- END Twilio API Request --
2025-02-26 16:53:48,499 - INFO - Response Status Code: 201
2025-02-26 16:53:48,499 - INFO - Response Headers: {'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '1192', 'Connection': 'keep-alive', 'Date': 'Wed, 26 Feb 2025 21:53:48 GMT', 'Twilio-Concurrent-Requests': '1', 'Twilio-Request-Id': 'RQ16281ff7d4de919554b87046cba1e036', 'Twilio-Request-Duration': '0.049', 'X-Home-Region': 'us1', 'X-API-Domain': 'api.twilio.com', 'Strict-Transport-Security': 'max-age=31536000', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1fecb697c6f121d7ce54a35628ac154e.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P2', 'X-Amz-Cf-Id': '1dr27ZIYkNBQo-G61YyOD_cwC3txTht7xO5zdrFQMw5zbBtR-eGJFA==', 'X-Powered-By': 'AT-5000', 'X-Shenanigans': 'none', 'Vary': 'Origin'}
2025-02-26 16:53:48,499 - INFO - Twilio credentials response: {'iceServers': [{'url': 'stun:global.stun.twilio.com:3478', 'urls': 'stun:global.stun.twilio.com:3478'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:3478?transport=udp', 'urls': 'turn:global.turn.twilio.com:3478?transport=udp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:3478?transport=tcp', 'urls': 'turn:global.turn.twilio.com:3478?transport=tcp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:443?transport=tcp', 'urls': 'turn:global.turn.twilio.com:443?transport=tcp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}], 'iceTransportPolicy': 'relay'}
2025-02-26 16:53:48,499 - INFO - Twilio TURN server available.
2025-02-26 16:53:48,566 - INFO - -- BEGIN Twilio API Request --
2025-02-26 16:53:48,566 - INFO - POST Request: https://api.twilio.com/2010-04-01/Accounts/ACfa954a3e72949b7b8c02f42beb438966/Tokens.json
2025-02-26 16:53:48,566 - INFO - Headers:
2025-02-26 16:53:48,566 - INFO - Content-Type : application/x-www-form-urlencoded
2025-02-26 16:53:48,566 - INFO - Accept : application/json
2025-02-26 16:53:48,566 - INFO - User-Agent : twilio-python/9.4.6 (Darwin x86_64) Python/3.13.2
2025-02-26 16:53:48,566 - INFO - X-Twilio-Client : python-9.4.6
2025-02-26 16:53:48,566 - INFO - Accept-Charset : utf-8
2025-02-26 16:53:48,566 - INFO - -- END Twilio API Request --
2025-02-26 16:53:48,689 - INFO - Response Status Code: 201
2025-02-26 16:53:48,689 - INFO - Response Headers: {'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '1192', 'Connection': 'keep-alive', 'Date': 'Wed, 26 Feb 2025 21:53:48 GMT', 'Twilio-Concurrent-Requests': '1', 'Twilio-Request-Id': 'RQ07d1dc4e5762a2408a3cbdd683b7513b', 'Twilio-Request-Duration': '0.058', 'X-Home-Region': 'us1', 'X-API-Domain': 'api.twilio.com', 'Strict-Transport-Security': 'max-age=31536000', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 7c52bc60e0da5f557ed6047264a41c18.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P2', 'X-Amz-Cf-Id': 'DzRq4ZHXRB2auwP9sAUGH160f8FYLqBIaRBDiLyxi0k8AmWMLymIRQ==', 'X-Powered-By': 'AT-5000', 'X-Shenanigans': 'none', 'Vary': 'Origin'}
* Running on local URL:  http://127.0.0.1:7860
2025-02-26 16:53:48,831 - INFO - HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-02-26 16:53:48,845 - INFO - HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
2025-02-26 16:53:48,855 - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"

However, the current app.py also fails similarly on fastrtc==0.0.6 when run locally, as did the suggested Hugging Face Spaces version b88286b.

Continuing from this discussion: https://huggingface.co/spaces/freddyaboulton/gemini-audio-video-chat/discussions/1

See also the bugs previously discussed: https://github.com/googleapis/python-genai/issues/380 and https://github.com/aiortc/aiortc/issues/1258

This commit improves the Gemini and Twilio integration with a focus on better error handling, UI feedback, connection stability, and faster image encoding.

  • Faster, Robust Image Encoding: Enhanced encode_image with comprehensive input validation (NaN/Inf, shape), normalization, and faster JPEG encoding with error handling using OpenCV.
  • Synchronous Twilio Check (Pre-UI): Implemented a synchronous Twilio TURN server availability check that runs before Gradio initialization to avoid race conditions. It includes retry logic with exponential backoff, so the status is accurate before the UI loads.
  • UI Status Updates:
    • Added immediate Twilio status update on UI load.
    • Gemini connection status is displayed and updated to inform users.
  • Robust Gemini Connection: Improved Gemini connection logic with more comprehensive error handling and UI feedback on connection failures.
  • Improved Shutdown: The GeminiHandler.shutdown method is more robust to ensure proper cleanup and prevent lingering issues.
  • API key validation: Added API key validation to improve the user experience.
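
For readers who want a concrete picture of the "retry logic with exponential backoff" mentioned in the Twilio bullet, here is a minimal sketch. It is not the code from this commit: the helper name `retry_with_backoff`, its parameters, and the default delays are assumptions made for illustration, and `fetch` stands in for whatever call actually requests the TURN credentials. Only the log message wording is borrowed from the output shown earlier.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 3,       # hypothetical default, not taken from app.py
    base_delay: float = 1.0,     # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() synchronously until it succeeds or attempts run out.

    After a failed attempt n (1-based) the helper waits
    base_delay * 2 ** (n - 1) seconds, so the delay doubles each time.
    Returns the result of fetch(), or None if every attempt raised.
    """
    for attempt in range(1, max_attempts + 1):
        logger.info("Attempting to get Twilio credentials (attempt %d)...", attempt)
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: any failure counts as "unavailable"
            logger.warning("Attempt %d failed: %s", attempt, exc)
            if attempt < max_attempts:
                # Only sleep when another attempt will follow.
                sleep(base_delay * 2 ** (attempt - 1))
    return None
```

Because the helper blocks until it has an answer, calling it before building the Gradio UI means the status label can be set from a known result (credentials or `None`) rather than from a check that may still be in flight. The `sleep` parameter is injectable only so the delays can be observed without actually waiting.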

ahundt, Feb 26 '25 21:02