AniPortrait
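Audio-driven inference produces a video of pure noise
When I run audio-driven inference on the official demo at float16 precision, the generated video data is NaN. After switching the precision to float32, the video data is no longer NaN, but the video is pure noise. The landmark video is generated successfully.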
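Without a detailed log, we cannot determine what is causing this bug. You can try troubleshooting along the following lines:
1. Make sure the dependency environment is configured correctly.
2. Check that the pretrained models were downloaded completely.
3. Check that all of our pretrained weights are loaded correctly: https://huggingface.co/ZJYang/AniPortrait/tree/main
4. Run with our test audio ./configs/inference/audio/lyl.wav to determine whether your own test audio has an abnormal format.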
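For item 4, one quick way to compare a custom audio file against the bundled test clip is to read the WAV header. The following is a minimal sketch using only Python's standard `wave` module. The helper name `describe_wav` is hypothetical and not part of AniPortrait, and this thread does not state which audio formats the project actually accepts, so treat the output as something to compare against `lyl.wav` rather than as a pass/fail check.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    # wave.open raises wave.Error if the file is not a readable PCM WAV.
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Guard against a zero sample rate in a malformed header.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("./configs/inference/audio/lyl.wav")` and then `describe_wav` on your own file lets you compare channel count, sample width, and sample rate side by side. A file that raises `wave.Error` is not plain PCM WAV and may be worth re-encoding before testing again.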
I ran into the same problem: the inference result is noise.
https://github.com/Zejun-Yang/AniPortrait/assets/41713524/3612163f-e866-44de-bab3-650782dd5bb5
The log is as follows:
python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711676121.449497 2965837 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x3000
W0000 00:00:1711676121.571123 2965837 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711676122.267680 2965837 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x3000
pose video has 1794 frames, with 30 fps
/home/wyn/dev/talkingface/diffusionbased/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [2:24:08<00:00, 345.95s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████| 1794/1794 [02:20<00:00, 12.73it/s]
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './configs/inference/pose_videos/solo_pose.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.20.100
Duration: 00:00:59.86, start: 0.000000, bitrate: 1024 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 512x512, 887 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
Output #0, adts, to 'audio_from_video.aac':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 960kB time=00:00:59.81 bitrate= 131.5kbits/s speed= 622x
video:0kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.869552%
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output/20240329/0935--seed_42-512x512/solo_solo_pose_512x512_3_0935_noaudio.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf60.3.100
Duration: 00:00:59.80, start: 0.000000, bitrate: 1032 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, 1029 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
[aac @ 0x55cd36176700] Estimating duration from bitrate, this may be inaccurate
Input #1, aac, from 'audio_from_video.aac':
Duration: 00:01:04.14, bitrate: 122 kb/s
Stream #1:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 122 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #1:0 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'output/20240329/0935--seed_42-512x512/solo_solo_pose_512x512_3_0935.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, q=2-31, 1029 kb/s, 30 fps, 30 tbr, 15360 tbn, 15360 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc58.54.100 aac
frame= 1794 fps=507 q=-1.0 Lsize= 8521kB time=00:00:59.86 bitrate=1166.1kbits/s speed=16.9x
video:7514kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.761882%
[aac @ 0x55cd3619dd80] Qavg: 547.014
https://github.com/Zejun-Yang/AniPortrait/assets/21038147/0c40144b-872c-4543-bab0-d725c477b428
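We reproduced the problem you reported. It is triggered when our pretrained weights are not loaded correctly, so the model parameters do not match our task. Please re-download our pretrained models and make sure they are placed in the correct path; see README.md for details. https://huggingface.co/ZJYang/AniPortrait/tree/main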
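Thank you for your reply. Which part of the model parameters was not loaded correctly in the video you reproduced? I followed the pretrained model directory layout in this repo; my directory is as follows and appears to match your README: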
You can check whether the model file sizes look normal. If `git lfs clone` hits an error, files may be downloaded incompletely.
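The file-size check can be automated. The sketch below walks a model directory and flags weight files that are either Git LFS pointer stubs (small text files that begin with `version https://git-lfs.github.com/spec/v1`) or suspiciously small. The function name `find_suspect_weights`, the suffix list, and the 1 MiB threshold are all assumptions chosen for illustration, not values from the AniPortrait repository.

```python
import os

# First bytes of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_suspect_weights(root, min_bytes=1024 * 1024,
                         suffixes=(".pth", ".pt", ".bin", ".safetensors")):
    """List weight files under root that look like LFS pointers or are tiny.

    The suffixes and min_bytes defaults are illustrative assumptions.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            # Read only as many bytes as the pointer signature needs.
            with open(path, "rb") as fh:
                head = fh.read(len(LFS_POINTER_PREFIX))
            if head == LFS_POINTER_PREFIX:
                suspects.append((path, size, "git-lfs pointer, not the real file"))
            elif size < min_bytes:
                suspects.append((path, size, "smaller than min_bytes"))
    return sorted(suspects)
```

Running `find_suspect_weights("./pretrained_model")` and getting an empty list does not prove the weights are intact, since a truncated download can still exceed the threshold. Comparing sizes against those shown on the Hugging Face file listing remains the more reliable check.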
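The generated video is empty and the video data is NaN. I had this problem before as well; I re-downloaded the weights and models following the README, but the problem still occurs, as shown below.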
https://github.com/Zejun-Yang/AniPortrait/assets/58095417/53d0e0ce-7b7a-4469-a0ef-398c3b19e52f
During inference, the second stage generates the video very quickly. I printed the video data; the log is as follows:
Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
/mnt/lpai-dione/ssai/cvg/team/envs/aniportrait/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711699055.699109 1865454 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c
W0000 00:00:1711699055.719278 1865454 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711699055.761454 1865454 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c
pose video has 233 frames, with 30 fps
/mnt/lpai-dione/ssai/cvg/team/didonglin/lhz/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [04:15<00:00, 10.21s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:05<00:00, 11.80it/s]
(Inference here is very fast, taking only 5 seconds.)
tensor([[[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]]])
ffmpeg version 4.3.6-0+deb11u1 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 10 (Debian 10.2.1-6)
configuration: --prefix=/usr --extra-version=0+deb11u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output/20240329/1557--seed_42-512x512/lyl_lyl_512x512_3_1557_noaudio.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf60.3.100
Duration: 00:00:02.13, start: 0.000000, bitrate: 518 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, 512 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Guessed Channel Layout for Input Stream #1.0 : stereo
Input #1, wav, from './configs/inference/audio/lyl.wav':
Metadata:
encoder : Lavf58.20.100
Duration: 00:00:07.74, bitrate: 1411 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'output/20240329/1557--seed_42-512x512/lyl_lyl_512x512_3_1557.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.45.100
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, q=2-31, 512 kb/s, 30 fps, 30 tbr, 15360 tbn, 15360 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc58.91.100 aac
frame= 64 fps=0.0 q=-1.0 Lsize= 259kB time=00:00:07.75 bitrate= 274.1kbits/s speed=82.5x
video:133kB audio:122kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.786617%
[aac @ 0x55eb5959fa40] Qavg: 291.960
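Printing the whole tensor, as in the log above, shows that NaN values are present but not how many. A small helper can report the count instead. The sketch below is plain Python and works on nested lists, for example the result of calling `.tolist()` on the tensor. The name `nan_stats` is hypothetical, and converting a full video tensor to lists can be slow, so this is meant for a single frame or a small slice.

```python
import math


def nan_stats(values):
    """Count NaN entries in arbitrarily nested lists/tuples of numbers.

    Returns (nan_count, total_count). Hypothetical helper for debugging.
    """
    total = 0
    nans = 0
    # Iterative traversal avoids recursion limits on deeply nested input.
    stack = [values]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)
        else:
            total += 1
            if math.isnan(item):
                nans += 1
    return nans, total
```

If every element is NaN, as the printed tensor suggests, the two returned numbers are equal. In PyTorch itself, `torch.isnan(video).any()` performs the same presence check directly on the tensor without the list conversion.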
See issue #40; the cause may be an abnormal environment configuration.
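When comparing an environment against the project's requirements, it helps to list the installed versions of the relevant packages. The following is a minimal sketch using the standard-library `importlib.metadata`. The package list is an assumption based on the libraries that appear in the logs above (PyTorch, diffusers, transformers, MediaPipe); it is not the project's official requirements list, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package list is an assumption based on the libraries named in the logs above.
for name, version in installed_versions(
    ["torch", "diffusers", "transformers", "mediapipe"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

The printed versions can then be compared by hand against the versions pinned in the repository's requirements file.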