[compiler/one-cmds] Introduce one-infer
What
Let's introduce `one-infer` to onecc as an inference wrapper binary.
Why
ONE-vscode is planned to support end-to-end value validation tests. `one-infer` will execute the backend models' inference tool, as `one-profile` does.
There was previous work at #7299
- [x] Create inference wrapper binary (https://github.com/Samsung/ONE/pull/7419)
- [x] Add one-inference man page
- [x] Add test cases
- [x] Add debian package
- [ ] Introduce `tflite-infer`, `onnx-infer` for each runtime
[WIP] Let's define a data spec for each data format (e.g. `npy` and `h5`).
Let's assume there is a `model.tflite` which takes two input tensors (with shapes (1,5,5,3) and (2,2) respectively), and 1000 sample images used as input. After the data-preprocessing stage, the user has 1000 pairs of input data (a small loading sketch for both layouts follows the examples below).
- For npy
$ tree ./input_data
input_data
├── input_data.0.0.npy
├── input_data.0.1.npy
├── input_data.1.0.npy
├── input_data.1.1.npy
├── ...
├── input_data.999.0.npy
└── input_data.999.1.npy
$ tflite-infer model.tflite --input-spec npy --load-input-from input_data ...
running....
- For h5
$ some-cmd-show-h5-hierarchy input_data.h5
# GROUP "/"
#  ㄴGROUP "input"
#    ㄴGROUP "0"
#      ㄴDATASET "0"
#        ㄴDATA ... [shape (1,5,5,3)]
#      ㄴDATASET "1"
#        ㄴDATA ... [shape (2,2)]
#    ㄴGROUP "1"
#      ㄴDATASET "0"
#        ㄴDATA ... [shape (1,5,5,3)]
#      ㄴDATASET "1"
#        ㄴDATA ... [shape (2,2)]
#    ...
#    ㄴGROUP "999"
#      ㄴDATASET "0"
#        ㄴDATA ... [shape (1,5,5,3)]
#      ㄴDATASET "1"
#        ㄴDATA ... [shape (2,2)]
$ tflite-infer model.tflite --input-spec h5 --load-input-from input_data ...
running....
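A minimal loading sketch, assuming numpy/h5py and the two layouts above (`{prefix}.{data_num}.{tensor_idx}.npy` files, and `/input/{data_num}/{tensor_idx}` inside the h5 file); the helper names are illustrative only, not part of any existing tool:

```python
import numpy as np
import h5py
from pathlib import Path

def load_npy_inputs(prefix_dir, data_num, num_tensors):
    """Load <prefix>/<prefix>.<data_num>.<tensor_idx>.npy for one run."""
    base = Path(prefix_dir)
    return [np.load(base / f"{base.name}.{data_num}.{i}.npy") for i in range(num_tensors)]

def load_h5_inputs(h5_path, data_num):
    """Load every tensor under /input/<data_num> of the h5 file, in index order."""
    with h5py.File(h5_path, "r") as f:
        group = f["input"][str(data_num)]
        return [group[idx][:] for idx in sorted(group, key=int)]

# inputs for the first sample of model.tflite (two tensors: (1,5,5,3) and (2,2))
npy_inputs = load_npy_inputs("./input_data", data_num=0, num_tensors=2)
h5_inputs = load_h5_inputs("input_data.h5", data_num=0)
```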
[WIP] How about extracting a data generator?
To keep a unified data spec across the inference drivers, it would be better to introduce a data generator which takes the input tensors' shape and data type, such as float32 and int8.
$ data-generator -h
usage: data-generator [-d INPUT_DETAIL] [-t INPUT_TYPE] [-o OUTPUT]
data generating tool for specific shape and type
-d INPUT_DETAIL, --input-detail INPUT_DETAIL
input tensor details with json format
e.g. {0(input_index): {'shape': [1, 5, 5, 3], 'dtype': 'float32'}, 1: {...}}
-t, --input-type {npy, h5}
file format to save
-o OUTPUT, --output OUTPUT
output file name to save
Each `some-backend-infer` can use `data-generator` for a unified data spec.
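As a rough illustration, a minimal sketch of such a generator, assuming numpy/h5py and the JSON `--input-detail` spec drafted above (the function and file names are illustrative, not the actual tool):

```python
import json
import numpy as np
import h5py

def generate(input_detail_json, input_type, output):
    """Generate one set of random inputs from an {index: {'shape': ..., 'dtype': ...}} spec."""
    # note: valid JSON requires string keys and double quotes
    detail = json.loads(input_detail_json)
    tensors = {idx: np.random.rand(*spec["shape"]).astype(spec["dtype"])
               for idx, spec in detail.items()}
    if input_type == "npy":
        # one file per input tensor: {output}.{input_idx}.npy
        for idx, arr in tensors.items():
            np.save(f"{output}.{idx}.npy", arr)
    elif input_type == "h5":
        # single file with the /input/{data_num}/{tensor_idx} layout (one sample here)
        with h5py.File(f"{output}.h5", "w") as f:
            grp = f.create_group("input").create_group("0")
            for idx, arr in tensors.items():
                grp.create_dataset(str(idx), data=arr)

generate('{"0": {"shape": [1, 5, 5, 3], "dtype": "float32"}, "1": {"shape": [2, 2], "dtype": "float32"}}',
         "npy", "input_data")
```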
Define options of tflite-infer
# This is from my draft
$ jay@YUNJAY tflite-infer -h
usage: tflite-infer [OPTIONS]
Command line tool for inferring tflite model.
Input data for given model can be randomly generated, or given by options.
Both input and output data will be saved as given file name.
If IN/OUT_FILE_AS is empty, MODEL_NAME.output.0.npy will be created for output data.
optional arguments:
-h, --help show this help message and exit
-l LOADABLE, --loadable LOADABLE
tflite model path to infer
--input-spec INPUT_SPEC
option for input tensor data (generate or import)
--dump-input-buffer-as-npy-with-prefix DUMP_INPUT_BUFFER_AS_NPY_WITH_PREFIX
dump input buffer as file name format of {FILE_NAME}.{INPUT_IDX}.npy
--dump-input-buffer-as-h5-with-prefix DUMP_INPUT_BUFFER_AS_H5_WITH_PREFIX
dump input buffer as file name format of {FILE_NAME}.h5
--dump-output-as-npy DUMP_OUTPUT_AS_NPY
dump output as file name format of {FILE_NAME}.output.{OUTPUT_IDX}.npy
--dump-output-as-h5 DUMP_OUTPUT_AS_H5
dump output as file name format of {FILE_NAME}.output.h5
$ jay@YUNJAY tflite-infer -l my_model.tflite --input-spec npy:npy_file_name --dump-input-buffer-as-h5-with-prefix my_h5_input_name --dump-output-as-h5 my_h5_output_name
... running
$ jay@YUNJAY tree .
.
├── my_model.tflite
├── npy_file_name
│ ├── npy_file_name.0.npy
│ ├── npy_file_name.1.npy
│ └── npy_file_name.2.npy
├── my_h5_input_name.h5 (new)
└── my_h5_output_name.h5 (new)
The option names are a little bit long, but they seem reasonable for users to understand.
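For reference, a hedged sketch of the core flow those options describe, using the TensorFlow Lite Python interpreter. This is an illustration only, not the actual one-cmds driver; the function name is mine:

```python
import numpy as np
import tensorflow as tf

def infer_tflite(model_path, input_arrays, output_prefix):
    """Run one inference and dump each output as {output_prefix}.output.{idx}.npy."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    for detail, arr in zip(interpreter.get_input_details(), input_arrays):
        interpreter.set_tensor(detail["index"], arr.astype(detail["dtype"]))
    interpreter.invoke()
    outputs = []
    for idx, detail in enumerate(interpreter.get_output_details()):
        out = interpreter.get_tensor(detail["index"])
        np.save(f"{output_prefix}.output.{idx}.npy", out)
        outputs.append(out)
    return outputs

# e.g. creates my_model.tflite.output.0.npy when no output name is given
# outputs = infer_tflite("my_model.tflite", npy_inputs, "my_model.tflite")
```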
> usage: one-infer-tflite [OPTIONS]

Is `one-` unintended, or was it renamed?
> --dump-input-buffer-as-npy-with-prefix

As you wrote, this is quite long... Anyway, Q) as there is `with-prefix`, is there `without-prefix`?
For me, I would shorten `--dump-input-buffer-as-npy-with-prefix` to just `--dump-input-npy` and `--dump-output-as-npy` to `--dump-output-npy`. There is `help` to show descriptions for shortened option names. But this is just my opinion, so do as you like.
> ├── npy_file_name
> │   ├── npy_file_name.0.npy

Q) Why is there an `npy_file_name` folder with files inside it?
> --input-spec INPUT_SPEC
>     option for input tensor data (generate or import)
> --input-spec npy:npy_file_name

The example shows `npy` + `:` + `npy_file_name`, but the help doesn't show this format.
> Is `one-` unintended, or was it renamed?

A mistake! I fixed it.
> As you wrote, this is quite long... Anyway, Q) as there is `with-prefix`, is there `without-prefix`?
> For me, I would shorten `--dump-input-buffer-as-npy-with-prefix` to just `--dump-input-npy` and `--dump-output-as-npy` to `--dump-output-npy`. There is `help` to show descriptions for shortened option names. But this is just my opinion, so do as you like.

I followed the existing option names from an in-house driver, since some of the option names I came up with didn't look good. `--dump-output-npy` is fine to me. :)
> Q) Why is there an `npy_file_name` folder with files inside it?

I made room for the multiple-input-data case. For example, `tflite-infer [SOME OPTIONS] --run 10` runs the tflite model 10 times with different input data. Then `npy_file_name/npy_file_name.0.{tensor_idx}.npy` will be the input for the first run, and the rest accordingly (see the small sketch below).
From your comment I found a thing to change: for my plan, the npy file name should be changed to `npy_file_name.0.0.npy`.
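(A small sketch, with illustrative names only, of the dump layout described above: `--run N` producing `{prefix}/{prefix}.{data_num}.{tensor_idx}.npy` files inside one sub-directory.)

```python
import os
import numpy as np

def dump_inputs(prefix, data_num, tensors):
    """Save run `data_num`'s inputs as {prefix}/{prefix}.{data_num}.{tensor_idx}.npy."""
    os.makedirs(prefix, exist_ok=True)
    for tensor_idx, arr in enumerate(tensors):
        np.save(os.path.join(prefix, f"{prefix}.{data_num}.{tensor_idx}.npy"), arr)

# e.g. --run 10 with two input tensors per run
for data_num in range(10):
    dump_inputs("npy_file_name", data_num,
                [np.zeros((1, 5, 5, 3), np.float32), np.zeros((2, 2), np.float32)])
```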
> The example shows `npy` + `:` + `npy_file_name`, but the help doesn't show this format.

Right. I'll try to add it.
> I made room for the multiple-input-data case...

I mean, is there any problem if we just put the files in the current folder?

Personally I prefer a clean directory, and I chose this approach since the related `npy` data are located in the same sub-directory. If there is any problem or a better software approach, would you please give me some advice?

> If there is any problem or a better software approach, would you please give me some advice?

I'm not aware of such a case.

> Personally I prefer a clean directory, and I chose this approach since the related `npy` data are located in the same sub-directory.

OK... I'll take this as better organizing the files.
> The example shows `npy` + `:` + `npy_file_name`, but the help doesn't show this format.

There are a few candidates, but I'm not sure which is better for users reading the help message. (Personally I prefer the first one; a small sketch of parsing the `--input-spec` value follows after the candidates.)
- Given in the usage line
In this case, the detailed data spec will be described in the man page.
$ bash ./build/compiler/one-cmds/tflite-infer -h
usage:
tflite-infer [-l/--loadable] [--input-spec=<any|positive|non-zero|npy:{filename}|h5:{filename}>]
[--dump-input-npy] [--dump-input-h5] [--dump-output-npy] [--dump-output-h5]
Command line tool for inferring tflite model.
...
--input-spec INPUT_SPEC
option for input tensor data (generate or import)
...
- Tips given briefly in the option description
$ bash ./build/compiler/one-cmds/tflite-infer -h
usage:
tflite-infer [OPTIONS]
...
--input-spec INPUT_SPEC
option for input tensor data (generate or import)
[any | non-zero | positive] generates random input data with given condition
[npy:<file_name> | h5:<file_name>] can be used as input data
...
- Or tips in the option description, as detailed as possible
$ bash ./build/compiler/one-cmds/tflite-infer -h
usage:
tflite-infer [OPTIONS]
...
--input-spec INPUT_SPEC
option for input tensor data (generate or import)
[Example]
To run model.tflite with some random input data with condition,
$ tflite-infer -l model.tflite --input-spec {any | positive | non-zero} [OPTIONS]
To run model.tflite with .npy files (which are saved as {filename}/{filename}.{data_num}.{tensor_idx}.npy),
$ tflite-infer -l model.tflite --input-spec <npy:{filename}>
To run model.tflite with an .h5 file (whose structure is /input/{data_num}/{tensor_idx}),
$ tflite-infer -l model.tflite --input-spec <h5:{filename}>
...
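Whichever help style is chosen, the `--input-spec` value itself could be parsed roughly as follows. This is a sketch only; the function name is hypothetical:

```python
def parse_input_spec(value):
    """Split an --input-spec value into (kind, filename)."""
    if value in ("any", "positive", "non-zero"):
        return value, None  # generate random input data with the given condition
    if ":" in value:
        kind, filename = value.split(":", 1)
        if kind in ("npy", "h5"):
            return kind, filename  # import input data from the given file(s)
    raise ValueError(f"invalid --input-spec: {value}")

print(parse_input_spec("any"))                # ('any', None)
print(parse_input_spec("npy:npy_file_name"))  # ('npy', 'npy_file_name')
print(parse_input_spec("h5:input_data"))      # ('h5', 'input_data')
```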
I'd like to also discuss the output format of backends other than tflite or onnx.
For example, assume that there is a backend company FOO... :-)
Assume that FOO has a proprietary output format `*.foo`. They might also use the `*.h5` format, but the internal structure could be different from https://github.com/Samsung/ONE/issues/8970#issuecomment-1128479359.
Can `FOO-infer` produce output in `*.foo` or in a FOO-company-specific `*.h5` format?
Or do they have to convert their format into the `h5` format in https://github.com/Samsung/ONE/issues/8970#issuecomment-1128479359 ?
Related to this issue, during the last offline meeting there was a discussion about another tool, such as `one-compare`, that might take two output files as parameters and show the comparison result.
Considering the above assumptions, the following combinations could be possible:

case | output format of FOO-infer could be... | Should FOO need to provide FOO-compare?
---|---|---
case 1 | `*.foo` | yes, because one-compare cannot parse `*.foo`
case 2 | FOO-specific `*.h5` format | yes, because one-compare cannot parse FOO-specific `*.h5`
case 3 | format defined in https://github.com/Samsung/ONE/issues/8970#issuecomment-1128479359 | maybe no
I wonder which ones would be allowed.
IMHO, we need to allow cases 1, 2 and 3. I'd like to hear others' opinions.
Discussed with @seanshpark and @yunjayh today.
The current approach would be:
- Consider case 3 (not case 1 or 2)
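Under the case-3 assumption (every backend's `*-infer` emits the common h5 layout), a `one-compare`-style check could be as small as the sketch below. This is purely illustrative: the function name is hypothetical, and the output layout is assumed to mirror the `/input/{data_num}/{tensor_idx}` structure above.

```python
import h5py
import numpy as np

def compare_h5(ref_path, test_path, group="output", rtol=1e-5, atol=1e-5):
    """Compare every dataset under the given group of two h5 files
    (layout assumed: /{group}/{data_num}/{tensor_idx})."""
    with h5py.File(ref_path, "r") as ref, h5py.File(test_path, "r") as test:
        for data_num in ref[group]:
            for tensor_idx in ref[group][data_num]:
                a = ref[group][data_num][tensor_idx][:]
                b = test[group][data_num][tensor_idx][:]
                if not np.allclose(a, b, rtol=rtol, atol=atol):
                    print(f"mismatch at /{group}/{data_num}/{tensor_idx}")
                    return False
    return True

print(compare_h5("one_output.h5", "foo_output.h5"))
```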