
Feature Request: Support custom and non-square input sizes

Open j99ca opened this issue 1 year ago • 3 comments

According to the docs, the input sizes supported by OTX are a fixed list of square sizes. With most convolutional model architectures, it should be possible to use non-square input sizes while keeping pre-trained weights, via a global pooling layer at the head of the model. This is possible with some classification models in TensorHub, and it would be a great feature for OTX classification that would accelerate our adoption of this library at the edge. I have use cases for very tall images from certain sensors where resizing them to any of the fixed square sizes skews the aspect ratio and can destroy the features needed for classification.
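To illustrate why this works, here is a minimal PyTorch sketch (not OTX code; the layer sizes and class count are made up): a convolutional backbone produces feature maps whose spatial size tracks the input, and a global average pooling layer collapses them to a fixed-length vector, so the same classifier head handles square and non-square inputs alike.

```python
import torch
from torch import nn

# Toy conv backbone: spatial output size depends on the input size.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

# Global average pooling makes the head input-size agnostic:
# features are always (N, 32, 1, 1) regardless of H and W.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),  # hypothetical 10-class head
)

model = nn.Sequential(backbone, head)

# A very tall, non-square input and a standard square input both work.
tall = model(torch.randn(1, 3, 512, 128))
square = model(torch.randn(1, 3, 224, 224))
print(tall.shape, square.shape)
```

Both calls produce `(1, 10)` logits, which is why pre-trained conv weights remain usable at a different aspect ratio.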

j99ca avatar Jun 04 '24 14:06 j99ca

@eunwoosh Let's consider non-square input size.

goodsong81 avatar Jun 10 '24 01:06 goodsong81

@goodsong81 @eunwoosh do you folks have a timeline for custom inputs (with or without non-square inputs) in this library? I am trying to schedule some integration into OTX 2.x and the lack of this feature is blocking.

Keep up the good work!

j99ca avatar Jun 14 '24 14:06 j99ca

> @goodsong81 @eunwoosh do you folks have a timeline for custom inputs (with or without non-square inputs) in this library? I am trying to schedule some integration into OTX 2.x and the lack of this feature is blocking.
>
> Keep up the good work!

Not yet confirmed but I suppose it will be enabled in the next quarter (Q3) of this year.

goodsong81 avatar Jun 17 '24 00:06 goodsong81

@goodsong81 I see that this PR got merged: https://github.com/openvinotoolkit/training_extensions/pull/3759

Could that input_size parameter be used instead of fixed values in the model scripts? E.g. in MobileNetV3Base:

class MobileNetV3Base(ModelInterface):
    """Base model of MobileNetV3."""

    def __init__(
        self,
        num_classes: int = 1000,
        width_mult: float = 1.0,
        in_channels: int = 3,
        input_size: tuple[int, int] = (224, 224),
        dropout_cls: nn.Module | None = None,
        pooling_type: str = "avg",
        feature_dim: int = 1280,
        instance_norm_first: bool = False,
        self_challenging_cfg: bool = False,
        **kwargs,
    ):
        ...

as well as the associated export code? E.g. in MobileNetV3ForMulticlassCls:

    @property
    def _exporter(self) -> OTXModelExporter:
        """Creates OTXModelExporter object that can export the model."""
        return OTXNativeModelExporter(
            task_level_export_parameters=self._export_parameters,
            input_size=(1, 3, 224, 224),
            mean=(123.675, 116.28, 103.53),
            std=(58.395, 57.12, 57.375),
            resize_mode="standard",
            pad_value=0,
            swap_rgb=False,
            via_onnx=False,
            onnx_export_configuration=None,
            output_names=["logits", "feature_vector", "saliency_map"] if self.explain_mode else None,
        )

j99ca avatar Jul 30 '24 14:07 j99ca

Hi @j99ca , #3759 is a preparation step for configurable input size. That PR just enables transforms in a recipe to use $(input_size). I'm now implementing configurable input size on top of #3759. Currently there is no input size configuration interface that updates both the model and the dataset, so if you want to do that today, you need to change the model class code, including the init argument and the exporter part, as you said.

eunwoosh avatar Jul 31 '24 01:07 eunwoosh

#3788 is merged. OTX now supports non-square input sizes.

eunwoosh avatar Aug 14 '24 07:08 eunwoosh