Run tract model in half float
running
cargo run -p deep_filter --features=tract,bin,wav-utils,transforms --bin deep-filter -- --half-floats assets/noisy_snr0.wav -o out -vv -D
results in the following stack trace:
Compiling deep_filter v0.4.1-pre (/home/hendrik/projects/DeepFilterNet/libDF)
Finished dev [optimized + debuginfo] target(s) in 20.20s
Running `target/debug/deep-filter --half-floats assets/noisy_snr0.wav -o out -vv -D`
[2022-12-13T10:09:59Z INFO df::tract] Init encoder with delay: 0
[2022-12-13T10:09:59Z INFO tract_linalg::x86_64_fma] mmm_f32, sigmoid_f32, tanh_f32: x86_64/fma activated
[2022-12-13T10:09:59Z INFO tract_linalg::x86_64_fma] mmm_i8_i8 and mmm_i8_i32: x86_64/avx2 activated
[2022-12-13T10:09:59Z INFO df::tract] Init ERB decoder with delay: 0
[2022-12-13T10:09:59Z INFO df::tract] Init DF decoder with delay: 0
[2022-12-13T10:09:59Z INFO df::tract] Running with model type deepfilternet3 lookahead 2
Error: Evaluating #3 "/erb_conv0/1/Conv.pad" Pad
Caused by:
Tensor datum type error: tensor is F32, accessed as F16
Stack backtrace:
0: tract_data::tensor::Tensor::check_for_access
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-data-0.18.5/src/tensor.rs:748:13
1: tract_data::tensor::Tensor::to_scalar
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-data-0.18.5/src/tensor.rs:848:9
2: tract_core::ops::array::pad::Pad::eval_t
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/ops/array/pad.rs:34:38
<tract_core::ops::array::pad::Pad as tract_core::ops::EvalOp>::eval
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/ops/array/pad.rs:109:18
3: tract_core::plan::eval
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:491:17
4: core::ops::function::FnMut::call_mut
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:164:5
tract_core::plan::SimpleState<F,O,M,P>::exec_plan_with_eval
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:264:26
5: tract_core::plan::SimpleState<F,O,M,P>::run_plan_with_eval
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:199:9
tract_core::plan::SimpleState<F,O,M,P>::run
at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:177:9
6: df::tract::DfTract::process
at ./libDF/src/tract.rs:401:17
7: deep_filter::main
at ./libDF/src/bin/enhance_wav.rs:139:13
8: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:122:18
9: std::rt::lang_start::{{closure}}
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:166:18
10: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:283:13
std::panicking::try::do_call
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
std::panicking::try
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
std::panic::catch_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
std::rt::lang_start_internal::{{closure}}
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:48
std::panicking::try::do_call
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
std::panicking::try
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
std::panic::catch_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
std::rt::lang_start_internal
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:20
11: main
12: __libc_start_call_main
13: __libc_start_main_alias_2
14: _start
Hi @Rikorose ,
I went ahead and tried out your branch out of curiosity on my a55 (on a rockchip sbc which is the only fp16 compatible CPU I have) after seeing the comment in the tract repo, and my guess wasn’t too far off. A large chunk of time (20% according to gprof) was spent in binary16::convert::f32_to_f16_fallback.
f16_to_f32 was lower at 7%, but still a sizeable chunk.
Some additional information (maybe relevant to tract):
The inner_loop generic in core/src/ops/cnn/conv/depth_wise.rs is doing a back and forth conversion like every single line.
Some other ops may be like this too, but this seems to be the main offender.
Edit: tried adding the half crate and specifying std and use-intrinsics manually but it still seems to be doing the conversions… I must be missing something here to force it to use fp16 on a55.