Half-mode LSTM MatMuls failing to optimize with "No matrix multiplier for F16xF32 to F16"
Hi again!
Tract's performance has been fantastic so far, and I've had a lot of good luck playing around with F16 on different models on hardware that supports it. However, I ran into an issue when experimenting with an ONNX-converted version of this model.
Model translated!
Error: running pass codegen
Caused by:
    0: codegen node #46 "MatMul;functional_3_lstm_4_lstm_cell_4_StatefulPartitionedCall_MatMul1_Gemm__13.ab.split-over-1.384..512" MatMulUnary
    1: No matrix multiplier for F16xF32 to F16
The model isn't anything particularly secret (I just used tf2onnx to quickly convert and test it): 1.zip
use std::error::Error;
use std::io::Cursor;
use tract_core::prelude::*;
use tract_onnx::prelude::*;
use tract_core::model::translator::Translate;

fn main() -> Result<(), Box<dyn Error>> {
    // Load the ONNX model embedded in the binary at compile time.
    let tract = tract_onnx::onnx();
    let mut c = Cursor::new(include_bytes!("./1.onnx") as &[u8]);
    let mut m = tract.model_for_read(&mut c)?;
    m.analyse(true)?;

    // Convert to a typed model and declutter it.
    let mut m = m.into_typed()?;
    m.declutter()?;

    // Translate the model to half precision (F16).
    m = tract_core::half::HalfTranslator.translate_model(&m)?;
    println!("Model translated!");

    m = m.into_optimized()?; // Will fail here
    println!("Model optimized!");

    let _ = m.into_runnable()?;
    println!("Model runnable!");
    Ok(())
}
Could you give the current main branch a shot? I think this is solved.