Support "Resize" operator
Hi there! Thank you for your really needed and highly appreciated project!
Coming from the computer vision corner of DL, I use the Resize operator a lot. It's used in many image segmentation networks such as the popular U-Net to upsample layers (think "deconvolution").
Tract has no support for this yet so I wanted to humbly ask if it would be possible to implement it in the foreseeable future. Rust is not really my strength (~2 hour experience so far...) but I would be willing to help if I can.
Edit: I've just seen that my networks are using v11 of the operator set. This is the exact call:
onnx::Resize[coordinate_transformation_mode="align_corners", mode="linear", nearest_mode="floor"]
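For readers unfamiliar with these attributes: `align_corners` maps output coordinates so that the first and last samples of input and output coincide. A simplified 1D sketch in plain Python (an illustration of the spec's mapping, not tract's or PyTorch's implementation):

```python
def resize_linear_align_corners(data, out_len):
    """1D linear resize using the align_corners mapping:
    x_orig = x_out * (in_len - 1) / (out_len - 1)."""
    in_len = len(data)
    scale = (in_len - 1) / (out_len - 1) if out_len > 1 else 0.0
    out = []
    for x_out in range(out_len):
        x = x_out * scale
        lo = int(x)                      # floor of the source coordinate
        hi = min(lo + 1, in_len - 1)     # clamp the right neighbour
        frac = x - lo
        out.append(data[lo] * (1 - frac) + data[hi] * frac)
    return out

print(resize_linear_align_corners([0.0, 1.0, 2.0], 5))
# [0.0, 0.5, 1.0, 1.5, 2.0] -- first and last samples line up with the input
```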
I had a look... I have let onnx support lag behind a bit, so I first need to handle the changes to operators in operator set 11 (and 12). I can't give you a timeline, but I want to do this.
Thank you for the quick answer. I will have a look if I can help adding support for op set 11.
Not sure if you're keeping track, but I have added support for opsets 11 and 12 at framework level, and fixed some of the operators that had breaking changes. I also collected some low-hanging fruit in opsets 11 and 12 (like round and modulo).
I suspect there are still a few more lying around, and then I'll give resize a shot. All of this is a low priority item for my company, so it's nights and weekend stuff, but we'll get there.
Yes, I've seen your commit, thank you for the notice. I also tried to understand the Rust implementation details of tract and come up with a naive solution to Resize that would be sufficient for my use case, but I would have to start with something far simpler, I'm afraid.
Anyway, really looking forward to future updates. Thank you very much for every step.
Hey, @sonovice, I had a look at Resize and... wow. It is really an ugly beast. Beyond the complexity of optimizing the algorithms in arbitrary dimensions and dealing correctly with floor and round, there are a few concerns introduced by tract itself.
So let me ask you a few questions to see if I can come up with something that will work relatively efficiently, at least in "most cases".
To be honest, I've always been a bit cautious about adding "specialized" operators, and this one has a distinct "image" flavor. My experience is that specialized operators are usually not "trained", and serve mostly as a pre-processing step for the rest of the neural network that follows and uses their output. So it's mostly a convenience to embed the preprocessing with the neural net in one single big bundle, which may be very desirable for integration.

On the other hand, it pushes me to implement more or less complicated operators. It's usually not possible to just integrate a third-party library, because of the range of parameters allowed by ONNX or TF and the natural generalisation that comes with thinking neural network ("yeah, let's make the spec say arbitrary dimensions, even if this will never be used above rank 2, that's an implementor problem"). So my reflex so far has been to recommend that people just do the pre-processing outside the net with a library they know well, leaving tract to focus on pure neural net stuff instead.
More recently, a bit pushed by work stuff, I have started to consider specialized operators too, primarily for signal and voice. This would require modularizing the operator set, so if I'm going this way, I can also start to consider image stuff.
0/ Anyway, question 0: can you easily extract the resize from your net and feed the network with the resize output? tract lets you override a net's input (and output).
On top of that "generic" concern about specialized operators, I actually gave Resize a closer look this morning, and there is another concern. At this point tract can only optimize a network when all tensor dimensions are predictable without knowing the input values (only their shapes, which have to be constant). I have a bit of a plan to generalize and lower these constraints, but it's going to take a while. See issue #313. I'll ask the following questions because this is my first datapoint in the area: I am trying to assess what assumptions I can make about how people will want to use Resize. tract makes such assumptions for many operators; sometimes I get them wrong and need to relax them for one specific network to become optimisable.
1/ Is the input shape a constant? (meaning: you can "optimize" your network for one given input size, and then rerun it repeatedly with different images)
2/ Are you using the scales or sizes input? (sizes is easier)
3/ Is this parameter a constant?
Depending on the combination of these three, we may or may not be able to predict the resized shape once and for all and have a relatively fast network, or have to wait for #313.
Hey @kali, thanks for your thoughts. Unfortunately, U-Net uses Resize to upscale the output of convolutional filters in the network. So in short:
- No, resizing cannot be done (solely) as part of an image preprocessing step.
- While U-Net in particular could utilize arbitrary input shapes, having a fixed input image size would be fine for me at this point.
- Sorry, it's `scale` in my case...
- ...but at least it's constant (`2`).
Thanks, and this combination sounds OK. I will go for this as an MVP and relax some of the other constraints later on.
Hey, would you mind giving the resize branch a shot? I think your use case should be covered.
I must say the Resize spec and its tests are not completely clear and consistent, so I'm struggling with the other cases. Even the half_pixel linear case is giving me pain. As far as I can tell, the folks from MS ONNX Runtime are struggling with it too. So I may leave it at that and wait a bit for the dust to settle.
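For what it's worth, part of the pain is that half_pixel's coordinate mapping can land outside the source range, unlike align_corners. A minimal sketch of the mapping as stated in the spec (illustrative Python, not tract's code):

```python
def half_pixel_coord(x_out, scale):
    """coordinate_transformation_mode="half_pixel" mapping:
    x_orig = (x_out + 0.5) / scale - 0.5."""
    return (x_out + 0.5) / scale - 0.5

# With scale 2, the first output pixel already maps outside the input
# (to -0.25), so implementations have to clamp before interpolating:
coords = [half_pixel_coord(x, 2.0) for x in range(4)]
print(coords)  # [-0.25, 0.25, 0.75, 1.25]
```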
That said, it may feel different for somebody with an image background, and the tract integration aspect is now more or less done. So if you feel like giving "filling in the blanks" in the current code a shot, to cover more cases, just tell me.
Thank you for the MVP. I have tried the following code:
```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 900, 1211)))?
        .into_optimized()?
        .into_runnable()?;

    let img = image::open("input.png").unwrap().to_rgb();
    let image: Tensor =
        tract_ndarray::Array4::from_shape_fn((1, 3, 900, 1211), |(_, c, y, x)| {
            img[(x as _, y as _)][c] as f32 / 255.0
        })
        .into();

    model.run(tvec!(image))?;
    Ok(())
}
```
Unfortunately, it returns this error:
```
$ cargo run
   Compiling rust v0.1.0 (/home/sonovice/projects/tract-testing)
    Finished dev [unoptimized + debuginfo] target(s) in 3.61s
     Running `target/debug/rust`
Error: TractError(Msg("Only Slice-1 and Slice-10 are supported"), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
```
Seems as if I have missed something. The code for this, including the model (minimal example, 7 MB), can be found here: https://github.com/sonovice/tract-testing
Pushed a fix for the slice, the mistake was on my side. Checked your network, but there is another validity issue: you're supposed to provide either scales or sizes, not both (using `""` as an input name to skip one if needed), but your network provides both. (From the operator doc: Only one of 'scales' and 'sizes' can be specified. If 'size' is needed, the user can use an empty string as the name of 'scales' in this operator's input list.)
Please have a look if you can fix that on your side. If you can't, I'll be more lax on tract side by giving priority to sizes (as it comes in second).
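The constraint from the spec can be restated as a small validation helper (hypothetical Python sketch of the rule, not tract's code):

```python
def check_resize_inputs(scales, sizes):
    """ONNX Resize-11 constraint: exactly one of 'scales'/'sizes' may be
    provided; the unused one is skipped with an empty string input name."""
    if scales is not None and sizes is not None:
        raise ValueError("Only one of 'scales' and 'sizes' can be specified")
    if scales is None and sizes is None:
        raise ValueError("One of 'scales' and 'sizes' is required")

# A network providing both inputs is invalid and raises ValueError;
# providing only sizes (with scales skipped) is fine:
check_resize_inputs(None, [1, 256, 604, 450])
```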
Super fast response, thanks! This is the line from my pytorch model:
```python
nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
```
If I add `size=""`, the code crashes: `ValueError: only one of size or scale_factor should be defined`.
According to the source code, `size` should be `None`. So pytorch does some stuff to the resizing/upsampling layer internally that I have no control over... 😞
All right, let me see if I can do something on my side.
Interestingly, pytorch computes sizes instead, and passes something nonsensical as scales, instead of skipping the input. This is a bug in pytorch, but I have made a workaround.
```
[onnx/src/ops/resize.rs:101] input_scale = Some(
    0xF32,
)
[onnx/src/ops/resize.rs:102] input_sizes = Some(
    4xI64 1, 256, 604, 450,
)
```
Now we have another problem with a probable bug in tract's Pad operator. I'll have a look tomorrow.
So Pad was just not paranoid enough about its input, but I found and fixed a couple of issues in Slice (#338). So now your model runs with this branch :)
(i mean the onnx-resize branch, i did rebase it)
I have fixed the swapped image dimensions in my example, and now the code executes, thank you! To actually check whether the output is correct, I would need to write images from the Tensor output. In Python, that piece of code would look like this:
```python
import torchvision.transforms as transforms

names = ['background', 'upper', 'lower', 'barlines']
classes = model(image)[0]  # Get first entry from batch
for i, name in enumerate(names):
    output = 1.0 * (classes == i)  # Float matrix with all pixels of class 'i' as ones
    output_image = transforms.ToPILImage()(output)  # Convert matrix to a PIL Image
    output_image.save(f'{name}.png')  # Save binary PNG files
```
I am unsure how this would translate into your tract Tensors. How do I transform a Tensor into an ImageBuffer?
EDIT: Sorry if this is drifting off-topic... I would just like to verify the output of tract.
So you can use `tensor.to_array_view::<f32>()?` to get an ndarray view on the tensor, and then put the data back into an `image::RgbImage`, I guess. Never done it, so I can't show you a code sample; I've only worked with categorical outputs so far :)
Thanks, the hint to `tensor.to_array_view` did the trick. For completeness' sake, here's the code:
```rust
let mut outputs = model.run(tvec!(input_tensor)).unwrap();
let output = outputs.pop().unwrap(); // Get first and only output
let output_tensor = output
    .to_array_view::<i64>()
    .unwrap()
    .into_shape((1, 1211, 900)) // Fix tensor shape
    .unwrap()
    .permuted_axes([2, 1, 0]); // CHW -> WHC

let classes = ["background", "upper_stafflines", "lower_stafflines", "barlines"];
for (i, name) in classes.iter().enumerate() {
    let class_tensor = output_tensor
        .mapv(|a| (a == i as i64) as u8 * 255)
        .into_raw_vec();
    let output_image = GrayImage::from_raw(900, 1211, class_tensor).unwrap();
    output_image.save(format!("{}.png", name)).unwrap();
}
```
The output looks exactly as expected. So for my special use case this would be all I need. Thank you very, very much for your effort, @kali!
(I'm not sure whether you want to leave this issue open for future resize updates so I won't close it.)
Ha, great news :) No, I'm going to merge and close, even if Resize support is really at alpha level right now (and very under-optimized).
Can you authorize me to use the model (and maybe the bits of code too) ? They triggered enough tricky issues that I would like to put them under CI. This is only possible if you agree to do it, and if the model is trained with only data under permissive licences.
That's the least I can do. I've updated my repo. Feel free to use it for your purposes.
EDIT: Beware, execution is very slow (> 1min), even on my somewhat powerful CPU.
Yeah, it's slow, and not only because of Resize. You're in a model design space that is not my main focus: real-time embedded voice applications lead to relatively small models (not necessarily in number of nodes, but they're mono-dimensional and use comparatively small weight tensors).
It shows: the audit code is actually broken by your network, as a few convolutions are in excess of 2 billion ops (2 GFlops) and I used a 32-bit counter, so it wraps... If you try `tract model.onnx -O dump --profile --cost` you can see these "negative costs", but also that the convolutions are actually more expensive than the naive Resize.
But there is room for improvement. For instance, the matrix multiplication code (used by the convolution) is inspired by the work on the BLIS library, but I have not (yet) implemented optimisations that specifically target medium to big products...
So it will get better.
My warning was not meant as criticism at all, just a heads-up if you plan on executing it on a (usually) rather low-spec CI VM. Anyway, I'm happy that my model can at least serve as part of a test suite after all the work you've put into it. 👍
No worries, I did not mean to sound defensive on the slowness aspect :) just quickly writing down my findings... for you if you're curious, or at least for my future self.
Hi there! Just checking back: have you copied everything you need from my testing repo before I delete it?
Mmm... thanks for the heads-up. No, I don't think I have integrated it into the test suite yet, and I definitely should. Let me get back to you on this in a few days, ok?
Sure, no pressure. Just started to do my annual GitHub autumn cleaning ;)