
Implement "OpenAI Image Generation" service as an integration in `openai`

Open • oxcabe opened this issue 8 months ago • 6 comments

Service Name

gpt-image-1

Service Website

https://platform.openai.com/docs/api-reference/images

Service Description

The OpenAI API lets you generate and edit images from text prompts, using the GPT Image or DALL·E models.

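The Image API provides three endpoints, each with distinct capabilities:

  • Generations: generate images from scratch based on a text prompt.
  • Edits: modify existing images using a new prompt, either partially or entirely.
  • Variations: generate variations of an existing image (available with DALL·E 2 only).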

You can also customize the output by specifying the quality, size, format, compression, and whether you would like a transparent background.
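Here is a minimal sketch of a create-image request body that makes these customization options concrete. The parameter names (`size`, `quality`, `output_format`, `output_compression`, `background`) and their allowed values are assumptions based on the API reference at the time of writing. `build_generation_body` is a hypothetical helper and is not part of pipecat or the `openai` package.

```python
"""Hypothetical request-body builder for the gpt-image-1 create-image endpoint.

Allowed values are assumptions taken from the API reference at the time of
writing; check the docs before relying on them.
"""
from typing import Any, Optional

# Assumed option values for gpt-image-1.
ALLOWED = {
    "size": {"1024x1024", "1536x1024", "1024x1536", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "output_format": {"png", "jpeg", "webp"},
    "background": {"transparent", "opaque", "auto"},
}


def build_generation_body(
    prompt: str,
    *,
    size: str = "auto",
    quality: str = "auto",
    output_format: str = "png",
    background: str = "auto",
    output_compression: Optional[int] = None,
) -> dict[str, Any]:
    """Return a JSON-serialisable body for the image generation endpoint."""
    options = {
        "size": size,
        "quality": quality,
        "output_format": output_format,
        "background": background,
    }
    for name, value in options.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"unsupported {name}: {value!r}")
    # A transparent background needs a format with an alpha channel.
    if background == "transparent" and output_format == "jpeg":
        raise ValueError("transparent background requires png or webp")
    body: dict[str, Any] = {"model": "gpt-image-1", "prompt": prompt, **options}
    if output_compression is not None:
        # Compression (0-100) only applies to the lossy formats.
        if output_format == "png":
            raise ValueError("output_compression applies to jpeg/webp only")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        body["output_compression"] = output_compression
    return body
```

For example, `build_generation_body("a red fox", background="transparent", output_format="webp", output_compression=80)` produces a body that can be passed to `json.dumps`. The validation runs locally, so bad combinations fail before any request is sent.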

References:

  • https://openai.com/index/image-generation-api/
  • https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1

API Information

API Docs: https://platform.openai.com/docs/api-reference/images

Auth Method: HTTP Bearer Authentication (https://platform.openai.com/docs/api-reference/authentication)

Key endpoints:

  1. Create image: https://platform.openai.com/docs/api-reference/images/create
  2. Create image edit: https://platform.openai.com/docs/api-reference/images/createEdit
  3. Create image variation: https://platform.openai.com/docs/api-reference/images/createVariation
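Here is a minimal, standard-library-only sketch of how these endpoints combine with HTTP Bearer authentication. The base URL `https://api.openai.com/v1` and the endpoint paths are assumptions taken from the API reference. `build_generation_request` is a hypothetical helper. The sketch only constructs the request and does not send it.

```python
"""Sketch: map the Image API endpoints and attach HTTP Bearer auth.

Base URL and paths are assumptions based on the API reference.
The request is constructed here but never sent.
"""
import json
import urllib.request

BASE_URL = "https://api.openai.com/v1"
ENDPOINTS = {
    "generate": "/images/generations",  # JSON body
    "edit": "/images/edits",            # multipart/form-data (image upload)
    "variation": "/images/variations",  # multipart/form-data (image upload)
}


def build_generation_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request to the create-image endpoint with Bearer auth."""
    return urllib.request.Request(
        url=BASE_URL + ENDPOINTS["generate"],
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. In practice, the official `openai` Python package wraps these endpoints as `client.images.generate`, `client.images.edit`, and `client.images.create_variation`, so a service could use that client instead of raw HTTP.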

Would you be willing to help implement this service?

  • [x] Yes, I'd like to contribute
  • [ ] No, I'm just suggesting

oxcabe • Apr 24 '25 21:04

@oxcabe this would be a great addition! If you get time to work on it, we'd love to include it.

markbackman • Apr 25 '25 01:04

Sure! My plan is to send an implementation PR before Tue 29. Otherwise, I'd be okay with someone else taking this on instead.

oxcabe • Apr 25 '25 11:04

Just started working on this, @markbackman.

I noticed there's already an image service in the `openai` package, at `src/pipecat/services/openai/image.py`, which implements access to the dall-e-3 API. However, I couldn't find anything about this service in the docs.

Given the current state of things, what I intend to do is:

  1. Implement gpt-image-1 on top of what's in there already. Refactoring, if required, will always be non-breaking.
  2. Document the entire service in the same way it's been done for fal and Google Imagen.
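Building on the existing dall-e-3 service means handling both response shapes. As far as I can tell from the API reference, dall-e-3 returns an image URL by default, while gpt-image-1 always returns base64 data in `b64_json`. Here is a sketch of a helper that accepts both shapes. `extract_image` is hypothetical and is not an existing pipecat function.

```python
"""Hypothetical helper: turn an Image API response item into bytes or a URL.

Assumption: dall-e-3 returns `url` by default, while gpt-image-1 always
returns base64-encoded data in `b64_json`.
"""
import base64
from typing import Optional, Tuple


def extract_image(item: dict) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (decoded_bytes, None) for base64 items or (None, url) for URL items."""
    if item.get("b64_json"):
        # gpt-image-1 style: the image bytes are inline, so just decode them.
        return base64.b64decode(item["b64_json"]), None
    if item.get("url"):
        # dall-e-3 style: the caller still has to download the image,
        # e.g. with an HTTP client session.
        return None, item["url"]
    raise ValueError("response item has neither 'b64_json' nor 'url'")
```

Keeping the branching in one place lets the rest of the service stay model-agnostic, which fits the goal of non-breaking refactoring.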

AFAIK there are no integration tests for services, or am I wrong? Should I include any tests in the upcoming PR?

oxcabe • Apr 26 '25 21:04

You're right, we were missing docs. I just fixed that: https://github.com/pipecat-ai/docs/pull/213.

That plan sounds good!

For now, don't worry about integration tests. We need to write examples of how to do this for services first.

Thanks again for adding this!

markbackman • Apr 28 '25 13:04

Quick update: my plans are delayed because we're having a nationwide power outage here in Spain 😅

I'm currently getting my OpenAI organisation account verified so I can access gpt-image-1. I'm also making changes to improve maintainability. They're still non-breaking, but I'd like to deprecate the current service arguments in favour of the approach the rest of the image services use.
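Here is a minimal sketch of the kind of non-breaking deprecation described above, assuming the target style is a grouped params object. `ExampleImageService`, `ImageGenParams`, and the legacy `image_size` keyword are illustrative names, not pipecat's actual API. The old keyword keeps working but emits a `DeprecationWarning`.

```python
"""Hypothetical non-breaking deprecation: legacy keyword -> params object.

All class and argument names here are illustrative, not pipecat's API.
"""
import warnings
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ImageGenParams:
    """Grouped generation options, in the style of a params object."""
    size: str = "1024x1024"
    quality: str = "auto"


class ExampleImageService:
    def __init__(
        self,
        *,
        model: str = "gpt-image-1",
        params: Optional[ImageGenParams] = None,
        image_size: Optional[str] = None,  # legacy argument, still accepted
    ) -> None:
        self.model = model
        self.params = params or ImageGenParams()
        if image_size is not None:
            warnings.warn(
                "'image_size' is deprecated; use params=ImageGenParams(size=...)",
                DeprecationWarning,
                stacklevel=2,
            )
            # Keep old callers working without mutating a caller-owned params object.
            self.params = replace(self.params, size=image_size)
```

Existing code such as `ExampleImageService(image_size="1536x1024")` keeps working, while new code passes `params=ImageGenParams(size="1536x1024")`. The legacy keyword can then be removed in a later major release.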

oxcabe • Apr 29 '25 13:04

Hope you get power back ASAP 🙏

markbackman • Apr 30 '25 03:04