DeepFilterNet icon indicating copy to clipboard operation
DeepFilterNet copied to clipboard

Run tract model in half float

Open Rikorose opened this issue 3 years ago • 2 comments

running

cargo run -p deep_filter --features=tract,bin,wav-utils,transforms --bin deep-filter -- --half-floats assets/noisy_snr0.wav -o out -vv -D

results in the following stack trace:

   Compiling deep_filter v0.4.1-pre (/home/hendrik/projects/DeepFilterNet/libDF)
    Finished dev [optimized + debuginfo] target(s) in 20.20s
     Running `target/debug/deep-filter --half-floats assets/noisy_snr0.wav -o out -vv -D`
[2022-12-13T10:09:59Z INFO  df::tract] Init encoder with delay: 0
[2022-12-13T10:09:59Z INFO  tract_linalg::x86_64_fma] mmm_f32, sigmoid_f32, tanh_f32: x86_64/fma activated
[2022-12-13T10:09:59Z INFO  tract_linalg::x86_64_fma] mmm_i8_i8 and mmm_i8_i32: x86_64/avx2 activated
[2022-12-13T10:09:59Z INFO  df::tract] Init ERB decoder with delay: 0
[2022-12-13T10:09:59Z INFO  df::tract] Init DF decoder with delay: 0
[2022-12-13T10:09:59Z INFO  df::tract] Running with model type deepfilternet3 lookahead 2
Error: Evaluating #3 "/erb_conv0/1/Conv.pad" Pad

Caused by:
    Tensor datum type error: tensor is F32, accessed as F16

Stack backtrace:
   0: tract_data::tensor::Tensor::check_for_access
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-data-0.18.5/src/tensor.rs:748:13
   1: tract_data::tensor::Tensor::to_scalar
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-data-0.18.5/src/tensor.rs:848:9
   2: tract_core::ops::array::pad::Pad::eval_t
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/ops/array/pad.rs:34:38
      <tract_core::ops::array::pad::Pad as tract_core::ops::EvalOp>::eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/ops/array/pad.rs:109:18
   3: tract_core::plan::eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:491:17
   4: core::ops::function::FnMut::call_mut
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:164:5
      tract_core::plan::SimpleState<F,O,M,P>::exec_plan_with_eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:264:26
   5: tract_core::plan::SimpleState<F,O,M,P>::run_plan_with_eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:199:9
      tract_core::plan::SimpleState<F,O,M,P>::run
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:177:9
   6: df::tract::DfTract::process
             at ./libDF/src/tract.rs:401:17
   7: deep_filter::main
             at ./libDF/src/bin/enhance_wav.rs:139:13
   8: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:122:18
   9: std::rt::lang_start::{{closure}}
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:166:18
  10: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:283:13
      std::panicking::try::do_call
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:48
      std::panicking::try::do_call
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
      std::rt::lang_start_internal
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:20
  11: main
  12: __libc_start_call_main
  13: __libc_start_main_alias_2
  14: _start

Rikorose avatar Dec 13 '22 10:12 Rikorose

Hi @Rikorose ,

I went ahead and tried out your branch out of curiosity on my a55 (on a rockchip sbc which is the only fp16 compatible CPU I have) after seeing the comment in the tract repo, and my guess wasn’t too far off. A large chunk of time (20% according to gprof) was spent in binary16::convert::f32_to_f16_fallback.

f16_to_f32 was lower at 7%, but still a sizeable chunk.

VariantXYZ avatar Dec 15 '22 03:12 VariantXYZ

Some additional information (maybe relevant to tract):

The inner_loop generic in core/src/ops/cnn/conv/depth_wise.rs is doing a back and forth conversion like every single line.

Some other ops may be like this too, but this seems to be the main offender.

Edit: tried adding the half crate and specifying std and use-intrinsics manually but it still seems to be doing the conversions… I must be missing something here to force it to use fp16 on a55.

VariantXYZ avatar Dec 15 '22 19:12 VariantXYZ