GenAIExamples
[Bug] AvatarChatbot can NOT work in K8s environment because of functional gap in wav2lip service
Priority
P3-Medium
OS type
Ubuntu
Hardware type
Xeon-GNR
Installation method
- [x] Pull docker images from hub.docker.com
- [ ] Build docker images from source
- [ ] Other
Deploy method
- [ ] Docker
- [x] Docker Compose
- [ ] Kubernetes Helm Charts
- [ ] Kubernetes GMC
- [x] Other
Running nodes
Single Node
What's the version?
git commit 45d5da2
Description
The current AvatarChatbot example requires users to map a local disk directory for the wav2lip service to save its result, i.e., see the `volumes` section in the following docker compose file definition:
wav2lip-service:
  image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
  container_name: wav2lip-service
  ports:
    - "7860:7860"
  ipc: host
  volumes:
    - ${PWD}:/outputs
  ... ...
The user then needs to read files from the local directory `${PWD}` to get the results. This kind of requirement prevents us from deploying AvatarChatbot in a Kubernetes (K8s) environment, where the wav2lip service could be scheduled onto any node in the K8s cluster, a node the end user will never have access to.
The wav2lip service needs to return the generated file content, instead of just returning the filename.
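For illustration, a rough sketch of that contrast; the endpoint paths, handler names, and file locations below are placeholders, not the actual wav2lip API:

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse

app = FastAPI()

# Current behavior (simplified): only a node-local path is returned, so a
# remote caller has no way to retrieve the video itself.
@app.post("/v1/wav2lip/path")
def animate_path_only() -> dict:
    outfile = "/outputs/result.mp4"  # placeholder path written by the wav2lip pipeline
    return {"wav2lip_result": outfile}

# Requested behavior (sketch): return the file content itself, so the result
# is usable no matter which node the pod was scheduled on.
@app.post("/v1/wav2lip/content")
def animate_with_content() -> FileResponse:
    outfile = "/outputs/result.mp4"
    return FileResponse(outfile, media_type="video/mp4")
```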
Reproduce steps
n/a
Raw log
Attachments
No response
@ctao456 Can you work on this?
Reminder: @ctao456, please help fix this issue.
@lianhao
No fix so far. This is listed as a known issue. Will add it to the release notes.
Hi @lianhao, thanks for pointing this out; I understand the issue. @xiguiw, thanks for the reminder. However, the OPEA animation microservice has the following defined input and output datatypes:
@register_microservice(
    name="opea_service@animation",
    service_type=ServiceType.ANIMATION,
    endpoint="/v1/animation",
    host="0.0.0.0",
    port=9066,
    input_datatype=Base64ByteStrDoc,
    output_datatype=VideoPath,
)
So there are dependencies we need to consider when changing the output datatype, especially those associated with https://github.com/opea-project/GenAIExamples/tree/main/AvatarChatbot
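To illustrate the dependency, a hypothetical caller of the current `VideoPath`-returning animation service might look like the sketch below; the request/response field names are assumed placeholders, not the verified OPEA schema:

```python
import requests

# Hypothetical consumer of the animation service's current VideoPath output.
# Field names such as "video_path" are assumed here for illustration.
resp = requests.post("http://animation:9066/v1/animation", json={"byte_str": "..."})
video_path = resp.json()["video_path"]  # e.g. "/outputs/result.mp4"

# This read only works if the caller shares a filesystem/volume with the
# wav2lip container -- true under docker compose on a single node, not
# across pods scheduled onto different K8s nodes.
with open(video_path, "rb") as f:
    video_bytes = f.read()
```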
Also, in the actual wav2lip API server `animate` implementation, a `cv2.VideoWriter` object is used to write and release the silent .avi video, which is then fused with the audio by the ffmpeg software codec to generate the resulting .mp4 video: https://github.com/opea-project/GenAIComps/blob/3e559df866972631e41a495b0bea068217224f86/comps/third_parties/wav2lip/src/wav2lip_server.py#L152C1-L163C33
out.release()
ffmpeg.output(
    ffmpeg.input(args.audio),
    ffmpeg.input("temp/result.avi"),
    args.outfile,
    strict="-2",
    crf=23,
    vcodec="libx264",
    preset="medium",
    acodec="aac",
).run(overwrite_output=True)
args.audio = "None"  # IMPORTANT: Reset audio to None for the next audio request
return {"wav2lip_result": args.outfile}
It's not possible to simply change/add an output datatype carrying the actual video content.
Please refer to the above and let us know if you have a better workaround for this application to be deployed on K8s. Thanks!
Hi @lianhao
Chun updated the AvatarChatbot today:
GenAIExamples PR #1776
GenAIComps PR #1540
Could you please check in the K8s env?
Hi @lianhao could we close this?
Possible ways to resolve the usability issue of the wav2lip microservice in a K8s cluster or in a production env:
- Return the generated wav2lip file content instead of the file name, just like what `speecht5` or `tts` does.
- Instead of saving the `wav2lip_result` locally, save it to a 3rd-party object storage and return the saved file URL to the caller application. This is not recommended, because it is bad for performance and wastes network bandwidth by duplicating the file transfer.

We can't assume all the microservices are running on the same node in the K8s world (unlike with Docker Compose). The presumption that any microservice in an e2e example can read/write arbitrary resources through means other than API calls is not applicable in a production environment.
Hi @lianhao, thanks for your information. Added this bug as a known issue in the v1.3 release notes.
Hi @lianhao.
We investigated that we could encode the generated mp4 video content as a base64 str, but that will increase the file size by 33%. And some videos we generate are around 1 minute long (1024 tokens) and can grow to 20-30 MB. Is that acceptable to be returned by the wav2lip microservice?
Also, no example in OPEA currently returns a base64 str for a video. The `speecht5` and `tts` you mentioned return a base64 str for an audio piece. We will have to add a `VideoDoc` class in https://github.com/opea-project/GenAIComps/blob/main/comps/cores/proto/docarray.py that has a base64 str field.
If the above looks good to you and is necessary for K8s deployment, we will make the corresponding changes.
P.S. The image2video microservice in OPEA GenAIComps returns VideoPath as the output as well - https://github.com/opea-project/GenAIComps/blob/41adb29877bab4681bfd5f591ff60e0823e9c0ef/comps/image2video/src/integrations/native.py#L68
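For reference, a minimal sketch of what such a `VideoDoc` could look like; this class does not exist yet in `comps/cores/proto/docarray.py`, and the field names here are illustrative:

```python
import base64

from docarray import BaseDoc


# Hypothetical doc type (not yet in comps/cores/proto/docarray.py): carries
# the generated video as a base64 string instead of a node-local file path.
class VideoDoc(BaseDoc):
    video_b64: str = ""
    mime_type: str = "video/mp4"


def to_video_doc(outfile: str) -> VideoDoc:
    # Read the .mp4 produced by wav2lip/ffmpeg and base64-encode it
    # (roughly +33% size overhead compared to the raw bytes).
    with open(outfile, "rb") as f:
        return VideoDoc(video_b64=base64.b64encode(f.read()).decode("utf-8"))
```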
In this case, I'd like to propose storing the generated video in a third-party object storage, such as MinIO or some alternative. In this way, we can support both scenarios: the current local-storage scenario and a remote object-storage scenario.
The same requirement should also apply to the image2video microservice.
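For illustration, a minimal sketch of that object-storage variant, assuming a MinIO deployment reachable at `minio:9000` and the `minio` Python client; the bucket name, credentials, and URL expiry are placeholders:

```python
from datetime import timedelta

from minio import Minio

# Placeholder endpoint and credentials; in a real deployment these would
# come from K8s secrets/config.
client = Minio("minio:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)


def upload_result(outfile: str, bucket: str = "wav2lip-results") -> str:
    # Upload the generated .mp4 to object storage and return a time-limited
    # download URL instead of a node-local path.
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    object_name = outfile.rsplit("/", 1)[-1]
    client.fput_object(bucket, object_name, outfile, content_type="video/mp4")
    return client.presigned_get_object(bucket, object_name,
                                       expires=timedelta(hours=1))
```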
I'm just putting out this proposal from a Kubernetes-environment perspective. Also, I'm wondering what the customers' actual requirements for the wav2lip/image2video microservices are. I wonder whether what customers really want is a streaming-style output (instead of holding back the response until the whole video file is generated, return the partially generated video as a data stream as soon as possible), which could be used in live video streaming. But this may belong to another discussion topic.
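If streaming turns out to be the real requirement, a rough sketch of chunked delivery of the finished file with FastAPI's StreamingResponse is shown below; true progressive streaming while the video is still being generated would additionally need a fragmented/streamable MP4 or a protocol such as HLS, which is out of scope for this sketch:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def iter_file(path: str, chunk_size: int = 1 << 20):
    # Yield the video in 1 MiB chunks instead of buffering it all in memory.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


@app.get("/v1/wav2lip/stream")
def stream_result() -> StreamingResponse:
    outfile = "/outputs/result.mp4"  # placeholder path for the generated video
    return StreamingResponse(iter_file(outfile), media_type="video/mp4")
```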