GenAIExamples
[Bug] AvatarChatbot can NOT work in K8s environment because of functional gap in wav2lip service
Priority
P3-Medium
OS type
Ubuntu
Hardware type
Xeon-GNR
Installation method
- [x] Pull docker images from hub.docker.com
- [ ] Build docker images from source
- [ ] Other
Deploy method
- [ ] Docker
- [x] Docker Compose
- [ ] Kubernetes Helm Charts
- [ ] Kubernetes GMC
- [x] Other
Running nodes
Single Node
What's the version?
git commit 45d5da2
Description
The current AvatarChatbot example requires users to map a local disk directory for the wav2lip service to save its result, i.e., see the `volumes` section in the following docker compose file definition:
wav2lip-service:
  image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
  container_name: wav2lip-service
  ports:
    - "7860:7860"
  ipc: host
  volumes:
    - ${PWD}:/outputs
  ... ...
The user then needs to read files from the local directory `${PWD}` to get the results. This kind of requirement prevents us from deploying AvatarChatbot in a Kubernetes (K8s) environment, where the wav2lip service could be scheduled onto any node in the K8s cluster, a node the end user will never have access to.
The wav2lip service needs to return the generated file content, instead of just returning the filename.
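For illustration, a rough sketch of that contrast; the endpoint paths, handler names, and file locations below are placeholders, not the actual wav2lip API:

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse

app = FastAPI()

# Current behavior (simplified): only a node-local path is returned, so a
# remote caller has no way to retrieve the video itself.
@app.post("/v1/wav2lip/path")
def animate_path_only() -> dict:
    outfile = "/outputs/result.mp4"  # placeholder path written by the wav2lip pipeline
    return {"wav2lip_result": outfile}

# Requested behavior (sketch): return the file content itself, so the result
# is usable no matter which node the pod was scheduled on.
@app.post("/v1/wav2lip/content")
def animate_with_content() -> FileResponse:
    outfile = "/outputs/result.mp4"
    return FileResponse(outfile, media_type="video/mp4")
```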
Reproduce steps
n/a
Raw log
Attachments
No response
@ctao456 Can you work on this?
Reminder: @ctao456, please help fix this issue.
@lianhao
No fix so far. This is listed as a known issue. Will add it to the release notes.
Hi @lianhao, thanks for pointing this out; I understand the issue. @xiguiw, thanks for the reminder. However, the OPEA animation microservice has the following defined input and output datatypes:
@register_microservice(
    name="opea_service@animation",
    service_type=ServiceType.ANIMATION,
    endpoint="/v1/animation",
    host="0.0.0.0",
    port=9066,
    input_datatype=Base64ByteStrDoc,
    output_datatype=VideoPath,
)
So there are dependencies we need to consider when changing the output datatype, especially those associated with https://github.com/opea-project/GenAIExamples/tree/main/AvatarChatbot
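To illustrate the dependency, a hypothetical caller of the current `VideoPath`-returning animation service might look like the sketch below; the request/response field names are assumed placeholders, not the verified OPEA schema:

```python
import requests

# Hypothetical consumer of the animation service's current VideoPath output.
# Field names such as "video_path" are assumed here for illustration.
resp = requests.post("http://animation:9066/v1/animation", json={"byte_str": "..."})
video_path = resp.json()["video_path"]  # e.g. "/outputs/result.mp4"

# This read only works if the caller shares a filesystem/volume with the
# wav2lip container -- true under docker compose on a single node, not
# across pods scheduled onto different K8s nodes.
with open(video_path, "rb") as f:
    video_bytes = f.read()
```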
Also, in the actual wav2lip API server `animate` implementation, a `cv2.VideoWriter` object is used to write and release the silent .avi video, which is then fused with the audio by the ffmpeg software codec to generate the resulting .mp4 video: https://github.com/opea-project/GenAIComps/blob/3e559df866972631e41a495b0bea068217224f86/comps/third_parties/wav2lip/src/wav2lip_server.py#L152C1-L163C33
out.release()
ffmpeg.output(
    ffmpeg.input(args.audio),
    ffmpeg.input("temp/result.avi"),
    args.outfile,
    strict="-2",
    crf=23,
    vcodec="libx264",
    preset="medium",
    acodec="aac",
).run(overwrite_output=True)
args.audio = "None"  # IMPORTANT: Reset audio to None for the next audio request
return {"wav2lip_result": args.outfile}
It's not possible to simply change/add an output datatype carrying the actual video content.
Please refer to the above and let us know if you have a better workaround for this application to be deployed on K8s. Thanks!
Hi @lianhao
Chun updated the AvatarChatbot today:
GenAIExamples PR #1776
GenAIComps PR #1540
Could you please check in the K8s env?
Hi @lianhao could we close this?
Possible ways to resolve the usability issue of the wav2lip microservice in a K8s cluster or in a production env:
- Return the generated wav2lip file content instead of the file name, just like what `speecht5` or `tts` does.
- Instead of saving the `wav2lip_result` locally, save it to a 3rd-party object storage and return the saved file URL to the caller application. This is not recommended, because it is bad for performance and wastes network bandwidth by duplicating the file transfer.

We can't assume all the microservices are running on the same node in the K8s world (unlike with Docker Compose). The presumption that any microservice in an e2e example can read/write arbitrary resources through means other than API calls is not applicable in a production environment.
Hi @lianhao, thanks for your information. Added this bug as a known issue in the v1.3 release notes.
Hi @lianhao.
We investigated that we could encode the generated mp4 video content as a base64 str, but that will increase the file size by 33%. And some videos we generate are around 1 minute long (1024 tokens) and can grow to 20-30 MB. Is that acceptable to be returned by the wav2lip microservice?
Also, no example in OPEA currently returns a base64 str for a video. The `speecht5` and `tts` you mentioned return a base64 str for an audio piece. We will have to add a `VideoDoc` class in https://github.com/opea-project/GenAIComps/blob/main/comps/cores/proto/docarray.py that has a base64 str field.
If the above looks good to you and is necessary for K8s deployment, we will make the corresponding changes.
P.S. The image2video microservice in OPEA GenAIComps returns VideoPath as the output as well - https://github.com/opea-project/GenAIComps/blob/41adb29877bab4681bfd5f591ff60e0823e9c0ef/comps/image2video/src/integrations/native.py#L68
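For reference, a minimal sketch of what such a `VideoDoc` could look like; this class does not exist yet in `comps/cores/proto/docarray.py`, and the field names here are illustrative:

```python
import base64

from docarray import BaseDoc


# Hypothetical doc type (not yet in comps/cores/proto/docarray.py): carries
# the generated video as a base64 string instead of a node-local file path.
class VideoDoc(BaseDoc):
    video_b64: str = ""
    mime_type: str = "video/mp4"


def to_video_doc(outfile: str) -> VideoDoc:
    # Read the .mp4 produced by wav2lip/ffmpeg and base64-encode it
    # (roughly +33% size overhead compared to the raw bytes).
    with open(outfile, "rb") as f:
        return VideoDoc(video_b64=base64.b64encode(f.read()).decode("utf-8"))
```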
In this case, I'd like to propose storing the generated video in a third-party object storage, such as MinIO or some alternative. In this way, we can support both scenarios: the current local-storage scenario and a remote object-storage scenario.
The same requirement should also apply to the image2video microservice.
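For illustration, a minimal sketch of that object-storage variant, assuming a MinIO deployment reachable at `minio:9000` and the `minio` Python client; the bucket name, credentials, and URL expiry are placeholders:

```python
from datetime import timedelta

from minio import Minio

# Placeholder endpoint and credentials; in a real deployment these would
# come from K8s secrets/config.
client = Minio("minio:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)


def upload_result(outfile: str, bucket: str = "wav2lip-results") -> str:
    # Upload the generated .mp4 to object storage and return a time-limited
    # download URL instead of a node-local path.
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    object_name = outfile.rsplit("/", 1)[-1]
    client.fput_object(bucket, object_name, outfile, content_type="video/mp4")
    return client.presigned_get_object(bucket, object_name,
                                       expires=timedelta(hours=1))
```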
I'm just putting out this proposal from a Kubernetes-environment perspective. Also, I'm wondering what the customers' actual requirements for the wav2lip/image2video microservices are. I wonder whether what customers really want is a streaming-style output (instead of holding back the response until the whole video file is generated, return the partially generated video as a data stream as soon as possible), which could be used in live video streaming. But this may belong to another discussion topic.
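If streaming turns out to be the real requirement, a rough sketch of chunked delivery of the finished file with FastAPI's StreamingResponse is shown below; true progressive streaming while the video is still being generated would additionally need a fragmented/streamable MP4 or a protocol such as HLS, which is out of scope for this sketch:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def iter_file(path: str, chunk_size: int = 1 << 20):
    # Yield the video in 1 MiB chunks instead of buffering it all in memory.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


@app.get("/v1/wav2lip/stream")
def stream_result() -> StreamingResponse:
    outfile = "/outputs/result.mp4"  # placeholder path for the generated video
    return StreamingResponse(iter_file(outfile), media_type="video/mp4")
```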