AniPortrait icon indicating copy to clipboard operation
AniPortrait copied to clipboard

Audio driven推理出来的视频为噪声

Open HuaizeLiu opened this issue 10 months ago • 7 comments

Audio driven推理官方的demo,float16精度下,生成视频数据为nan;将精度改为float32后,生成视频数据不为nan,但是视频是噪声。能生成landmark视频。 噪声图片

HuaizeLiu avatar Mar 28 '24 16:03 HuaizeLiu

由于没有具体的log,因此无法判断造成该bug的原因。您可以从以下几个方面尝试排查: 1、确保依赖环境配置正确 2、预训练模型是否完整下载 3、是否正确加载所有我们的预训练权重 https://huggingface.co/ZJYang/AniPortrait/tree/main 4、使用我们的测试音频./configs/inference/audio/lyl.wav,判断您的测试音频格式是否异常

Zejun-Yang avatar Mar 29 '24 02:03 Zejun-Yang

我也遇到了同样的问题,推理结果为噪声。

https://github.com/Zejun-Yang/AniPortrait/assets/41713524/3612163f-e866-44de-bab3-650782dd5bb5

log如下:

python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711676121.449497 2965837 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x3000
W0000 00:00:1711676121.571123 2965837 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711676122.267680 2965837 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x3000
pose video has 1794 frames, with 30 fps
/home/wyn/dev/talkingface/diffusionbased/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
  num_channels_latents = self.denoising_unet.in_channels
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [2:24:08<00:00, 345.95s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████| 1794/1794 [02:20<00:00, 12.73it/s]
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './configs/inference/pose_videos/solo_pose.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:59.86, start: 0.000000, bitrate: 1024 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 512x512, 887 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Output #0, adts, to 'audio_from_video.aac':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.29.100
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=     960kB time=00:00:59.81 bitrate= 131.5kbits/s speed= 622x
video:0kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.869552%
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output/20240329/0935--seed_42-512x512/solo_solo_pose_512x512_3_0935_noaudio.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.3.100
  Duration: 00:00:59.80, start: 0.000000, bitrate: 1032 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, 1029 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
[aac @ 0x55cd36176700] Estimating duration from bitrate, this may be inaccurate
Input #1, aac, from 'audio_from_video.aac':
  Duration: 00:01:04.14, bitrate: 122 kb/s
    Stream #1:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 122 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'output/20240329/0935--seed_42-512x512/solo_solo_pose_512x512_3_0935.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.29.100
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, q=2-31, 1029 kb/s, 30 fps, 30 tbr, 15360 tbn, 15360 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.54.100 aac
frame= 1794 fps=507 q=-1.0 Lsize=    8521kB time=00:00:59.86 bitrate=1166.1kbits/s speed=16.9x
video:7514kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.761882%
[aac @ 0x55cd3619dd80] Qavg: 547.014

fredkingdom avatar Mar 29 '24 04:03 fredkingdom

我也遇到了同样的问题,推理结果为噪声。

solo_solo_pose_512x512_3_0935.mp4 log如下:

python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711676121.449497 2965837 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x3000
W0000 00:00:1711676121.571123 2965837 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711676122.267680 2965837 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x3000
pose video has 1794 frames, with 30 fps
/home/wyn/dev/talkingface/diffusionbased/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
  num_channels_latents = self.denoising_unet.in_channels
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [2:24:08<00:00, 345.95s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████| 1794/1794 [02:20<00:00, 12.73it/s]
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './configs/inference/pose_videos/solo_pose.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:59.86, start: 0.000000, bitrate: 1024 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 512x512, 887 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Output #0, adts, to 'audio_from_video.aac':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.29.100
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=     960kB time=00:00:59.81 bitrate= 131.5kbits/s speed= 622x
video:0kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.869552%
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output/20240329/0935--seed_42-512x512/solo_solo_pose_512x512_3_0935_noaudio.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.3.100
  Duration: 00:00:59.80, start: 0.000000, bitrate: 1032 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, 1029 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : VideoHandler
[aac @ 0x55cd36176700] Estimating duration from bitrate, this may be inaccurate
Input #1, aac, from 'audio_from_video.aac':
  Duration: 00:01:04.14, bitrate: 122 kb/s
    Stream #1:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 122 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'output/20240329/0935--seed_42-512x512/solo_solo_pose_512x512_3_0935.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.29.100
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, q=2-31, 1029 kb/s, 30 fps, 30 tbr, 15360 tbn, 15360 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.54.100 aac
frame= 1794 fps=507 q=-1.0 Lsize=    8521kB time=00:00:59.86 bitrate=1166.1kbits/s speed=16.9x
video:7514kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.761882%
[aac @ 0x55cd3619dd80] Qavg: 547.014

https://github.com/Zejun-Yang/AniPortrait/assets/21038147/0c40144b-872c-4543-bab0-d725c477b428

我们复现了您反馈的问题。 触发条件没有正确加载我们预训练权重,导致模型参数跟我们的任务不匹配。 您可以重新下载我们的预训练模型,并确保其位于正确路径下,详情参考README.md。 https://huggingface.co/ZJYang/AniPortrait/tree/main

Zejun-Yang avatar Mar 29 '24 06:03 Zejun-Yang

solo_solo_pose1_512x512_3_1438_noaudio.mp4 我们复现了您反馈的问题。 触发条件没有正确加载我们预训练权重,导致模型参数跟我们的任务不匹配。 您可以重新下载我们的预训练模型,并确保其位于正确路径下,详情参考README.md。 https://huggingface.co/ZJYang/AniPortrait/tree/main

感谢您的回复,请问复现的这段视频是哪部分模型参数没有加载正确呢?我参考了本repo中的预训练模型目录,我的目录如下,看上去和您给的readme一致: 微信截图_20240329151007

fredkingdom avatar Mar 29 '24 07:03 fredkingdom

solo_solo_pose1_512x512_3_1438_noaudio.mp4 我们复现了您反馈的问题。 触发条件没有正确加载我们预训练权重,导致模型参数跟我们的任务不匹配。 您可以重新下载我们的预训练模型,并确保其位于正确路径下,详情参考README.md。 https://huggingface.co/ZJYang/AniPortrait/tree/main

感谢您的回复,请问复现的这段视频是哪部分模型参数没有加载正确呢?我参考了本repo中的预训练模型目录,我的目录如下,看上去和您给的readme一致: 微信截图_20240329151007

可以检查模型文件大小是否正常,当git lfs clone过程存在异常时,可能出现文件下载不完整的问题。

Zejun-Yang avatar Mar 29 '24 08:03 Zejun-Yang

由于没有具体的log,因此无法判断造成该bug的原因。您可以从以下几个方面尝试排查: 1、确保依赖环境配置正确 2、预训练模型是否完整下载 3、是否正确加载所有我们的预训练权重 https://huggingface.co/ZJYang/AniPortrait/tree/main 4、使用我们的测试音频./configs/inference/audio/lyl.wav,判断您的测试音频格式是否异常

推理出来的视频为空,video数据为nan。之前也有这样的问题,按照readme重新下载了权重和模型,还是出现这个问题。如下所示

https://github.com/Zejun-Yang/AniPortrait/assets/58095417/53d0e0ce-7b7a-4469-a0ef-398c3b19e52f

在推理的过程中,第二阶段推理生成视频很快。我将video数据打印出来,如下是log信息 Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']

  • This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight'] /mnt/lpai-dione/ssai/cvg/team/envs/aniportrait/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() return self.fget.get(instance, owner)() WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1711699055.699109 1865454 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c W0000 00:00:1711699055.719278 1865454 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default. INFO: Created TensorFlow Lite XNNPACK delegate for CPU. I0000 00:00:1711699055.761454 1865454 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c pose video has 233 frames, with 30 fps /mnt/lpai-dione/ssai/cvg/team/didonglin/lhz/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute in_channels directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'. num_channels_latents = self.denoising_unet.in_channels 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [04:15<00:00, 10.21s/it] 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:05<00:00, 11.80it/s](这里推理速度很快,只有5秒) tensor([[[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]]]]) ffmpeg version 4.3.6-0+deb11u1 Copyright (c) 2000-2023 the FFmpeg developers built with gcc 10 (Debian 10.2.1-6) configuration: --prefix=/usr --extra-version=0+deb11u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared libavutil 56. 51.100 / 56. 51.100 libavcodec 58. 91.100 / 58. 91.100 libavformat 58. 45.100 / 58. 45.100 libavdevice 58. 10.100 / 58. 10.100 libavfilter 7. 85.100 / 7. 85.100 libavresample 4. 0. 0 / 4. 0. 0 libswscale 5. 7.100 / 5. 7.100 libswresample 3. 7.100 / 3. 7.100 libpostproc 55. 7.100 / 55. 7.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output/20240329/1557--seed_42-512x512/lyl_lyl_512x512_3_1557_noaudio.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf60.3.100 Duration: 00:00:02.13, start: 0.000000, bitrate: 518 kb/s Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, 512 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default) Metadata: handler_name : VideoHandler Guessed Channel Layout for Input Stream #1.0 : stereo Input #1, wav, from './configs/inference/audio/lyl.wav': Metadata: encoder : Lavf58.20.100 Duration: 00:00:07.74, bitrate: 1411 kb/s Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s Stream mapping: Stream #0:0 -> #0:0 (copy) Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native)) Press [q] to stop, [?] for help Output #0, mp4, to 'output/20240329/1557--seed_42-512x512/lyl_lyl_512x512_3_1557.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.45.100 Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1544x516, q=2-31, 512 kb/s, 30 fps, 30 tbr, 15360 tbn, 15360 tbc (default) Metadata: handler_name : VideoHandler Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s Metadata: encoder : Lavc58.91.100 aac frame= 64 fps=0.0 q=-1.0 Lsize= 259kB time=00:00:07.75 bitrate= 274.1kbits/s speed=82.5x
    video:133kB audio:122kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.786617% [aac @ 0x55eb5959fa40] Qavg: 291.960

HuaizeLiu avatar Mar 29 '24 08:03 HuaizeLiu

#40 参考该issue,可能为环境配置异常。

Zejun-Yang avatar Apr 01 '24 08:04 Zejun-Yang