audio2photoreal
ERROR: Boolean value of Tensor with more than one value is ambiguous
This is when running the demo
Hi! Thanks for reporting this issue. Could you please give me more context on it? (E.g. stack trace, inputs, screenshots, etc.)
hi @evonneng sure thing!
➜ audio2photoreal git:(main) ✗ source ./.venv/bin/activate
(.venv) ➜ audio2photoreal git:(main) ✗ python -m demo.demo
running on... cuda:0
adding lip conditioning ./assets/iter-0200000.pt
Loading checkpoints from [checkpoints/diffusion/c1_face/model000155000.pt]...
running on... cuda:0
using keyframes: torch.Size([1, 20, 256])
loading checkpoint from checkpoints/vq/c1_pose/net_iter300000.pth
loading TRANSFORMER checkpoint from checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt
Loading checkpoints from [checkpoints/diffusion/c1_pose/model000340000.pt]...
/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
WARNING:visualize.ca_body.nn.color_cal:Requested color-calibration identity camera not present, defaulting to 400883.
loading... ./checkpoints/ca_body/data/PXB184/body_dec.ckpt
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/queueing.py", line 489, in call_prediction
output = await route_utils.call_process_api(
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/utils.py", line 678, in wrapper
response = f(*args, **kwargs)
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/demo/demo.py", line 216, in audio_to_avatar
face_results, pose_results, audio = generate_results(audio, num_repetitions, top_p)
File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/demo/demo.py", line 176, in generate_results
dual_audio[:, :, 0] = y / max(y)
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
This happens after recording audio in the gradio app and starting the generation, thanks!
Ah I see! I believe the issue is that max is being applied to audio with more than one channel (e.g. if your recording is 2xT for binaural audio), so it cannot reduce to a single scalar. Currently I only support single-channel audio, but I can push a fix later to combine your audio into a single channel and ping this thread after!
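For anyone who wants to try the downmix locally in the meantime, here is a minimal sketch, not the maintainer's fix. It assumes the recording arrives as a NumPy array shaped (T,), (T, 2) or (2, T); the helper names to_mono and peak_normalize are hypothetical. Python's built-in max iterates over the first dimension of a 2-D array and compares whole rows, which is what produces the "ambiguous" error, so the sketch averages the channels first and then normalizes with the array's own reduction.

```python
import numpy as np


def to_mono(y: np.ndarray) -> np.ndarray:
    """Average a multi-channel recording down to a single channel.

    Assumption: the channel axis is the shorter of the two axes,
    i.e. the input is (T, C) or (C, T) with C much smaller than T.
    """
    y = np.asarray(y, dtype=np.float32)
    if y.ndim == 1:
        return y  # already mono
    if y.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {y.shape}")
    channel_axis = int(np.argmin(y.shape))
    return y.mean(axis=channel_axis)


def peak_normalize(y: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale a mono signal so its largest absolute sample is 1."""
    peak = float(np.abs(y).max())
    return y / max(peak, eps)  # eps guards against an all-silent clip


# Example: a fake 100-sample stereo clip at half amplitude.
stereo = np.stack([np.linspace(-1.0, 1.0, 100)] * 2, axis=1) * 0.5
mono = peak_normalize(to_mono(stereo))
```

Note that this normalizes by the peak absolute value rather than the plain maximum used on the failing line, which is a deliberate swap in the sketch and not something the repo is confirmed to do.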
Nice one, I found this if it helps just changing it client-side; https://blog.mozilla.org/webrtc/channelcount-microphone-constraint/
And something I found for a possible solution in python; https://stackoverflow.com/questions/30401042/stereo-to-mono-wav-in-python
Okay, as a quick workaround, I updated line 241;
gr.Audio(sources=["microphone", "upload"] ),
then recorded a mono audio track in Audacity, exported it to mp3 and uploaded it. It now seems to be running.
Okay, I got a generation, but the audio is VERY quiet - unsure what happened here, source seems fine.
I'm just tuning the ffmpeg step to see if I can speed things up here
Adding -hwaccel cuda to the ffmpeg command made this step almost instant
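A rough sketch of where the flag goes, in case it helps others. This is not the repo's actual ffmpeg call: the function name, file names and frame rate below are all hypothetical, and it assumes a plain frames-plus-audio mux. The one thing that matters is ordering, since ffmpeg treats -hwaccel as an input option and it has to appear before the -i it applies to.

```python
def build_ffmpeg_cmd(frames_pattern, audio_path, out_path, fps=30, use_cuda=True):
    """Build an ffmpeg argument list that muxes image frames with an audio track.

    All arguments are illustrative; adapt them to whatever the demo actually runs.
    """
    cmd = ["ffmpeg", "-y"]  # -y: overwrite the output file without asking
    if use_cuda:
        # Input option: must precede the -i of the input it should apply to.
        cmd += ["-hwaccel", "cuda"]
    cmd += ["-framerate", str(fps), "-i", frames_pattern]
    cmd += ["-i", audio_path]
    cmd += ["-shortest", out_path]  # stop at the shorter of video and audio
    return cmd


cmd = build_ffmpeg_cmd("frames/%04d.png", "audio.wav", "out.mp4")
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine whose ffmpeg build has CUDA support.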
Hi @chrisbward did the issue eventually resolve? In my case, I first got the 'Boolean value of Tensor with more than one value is ambiguous' error originally but it resolved after I used a mono audio.
However, a new error ensues:
File "C:\Users\musta\audio2photoreal\model\diffusion.py", line 388, in forward
cond_tokens = torch.where(
RuntimeError: The size of tensor a (11598) must match the size of tensor b (1998) at non-singleton dimension 1
I wonder if anyone has an idea how to fix this. As the message suggests, it has to do with mismatched tensor sizes in the torch.where(..) call in the diffusion.py file.
Thank you all for such active help on these issues!
Hi @MustaphaU, it seems that this results from the auto-generated mask size not matching that of the audio conditioning tensor. I am not too sure why that might be the case (since it would require more downstream information), but could you please try the above fix in the PR to see if it solves it? I wonder if it is because the audio is somehow getting corrupted downstream...
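To make the error message easier to read: torch.where needs its arguments to be broadcastable, meaning each pair of dimensions (aligned from the right) must be equal or have one side equal to 1. Below is a small pure-Python sketch of that rule. The helper name is hypothetical and only the 11598 and 1998 sizes come from the traceback; the other dimensions are made up for illustration.

```python
def first_broadcast_mismatch(shape_a, shape_b):
    """Return the index of the first dimension that cannot be broadcast, or None.

    Shapes are right-aligned and the shorter one is left-padded with 1s,
    which mirrors how broadcasting lines dimensions up.
    """
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for dim, (x, y) in enumerate(zip(a, b)):
        if x != y and x != 1 and y != 1:
            return dim
    return None


# Hypothetical shapes: conditioning tokens vs. the auto-generated mask.
cond_shape = (1, 11598, 512)
mask_shape = (1, 1998, 1)
bad_dim = first_broadcast_mismatch(cond_shape, mask_shape)
print(bad_dim)
```

Printing the real shapes of the mask and the conditioning tensor just before the torch.where call should show which of the two has the unexpected length along the time dimension.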
Closing this for now due to inactivity, but please feel free to reopen if there are more related issues. Thanks!