ONE icon indicating copy to clipboard operation
ONE copied to clipboard

Different hdf5 structure between input data for nnpkg and for quantization

Open mhs4670go opened this issue 2 years ago • 7 comments

What

I realized there is hdf5 structure mismatch between input data for nnpkg and the one for quantization.

For example,

nnpkg

$ h5ls -r build/compiler/common-artifacts/Add_000.opt/metadata/tc/input.h5
/                        Group
/name                    Group
/value                   Group
/value/0                 Dataset {1, 4, 4, 3}
/value/1                 Dataset {1, 4, 4, 3}

And, the code level

$ cat tools/nnpackage_tool/gen_golden/gen_golden.py
..
with h5py.File(out_dir + "input.h5", 'w') as hf:
        name_grp = hf.create_group("name")
        val_grp = hf.create_group("value")
        for idx, t in enumerate(input_names):
            dtype = tf.compat.v1.as_dtype(input_dtypes[idx])
            if not dtype.name in supported_dtypes:
                print("ERR: Supported input types are {}".format(supported_dtypes))
                sys.exit(-1)
            val_grp.create_dataset(
                str(idx), data=input_values[idx], dtype=h5dtypes[dtype.name])
            name_grp.attrs[str(idx)] = input_names[idx]

quantizatoin

$ h5ls -r build/compiler/record-minmax-conversion-test/Conv2D_000.tflite.input.h5 
/                        Group
/value                   Group
/value/0                 Group
/value/0/0               Dataset {1, 3, 3, 2}
/value/1                 Group
/value/1/0               Dataset {1, 3, 3, 2}
/value/2                 Group
/value/2/0               Dataset {1, 3, 3, 2}

And the code level

$ cat compiler/rawdata2hdf5/rawdata2hdf5
# Input files
    num_converted = 0 
    for rawdata in datalist:
        with open(rawdata, 'rb') as f:
            sample = group.create_group(str(num_converted))
            num_converted += 1
            filename = os.path.basename(rawdata)
            sample.attrs['desc'] = filename
            raw_data = bytearray(f.read())
            # The target model is DNN with one input data
            sample.create_dataset('0', data=raw_data)


In short, nnpkg = /value/input_idx/Dataset quantization = /value/data_idx/input_idx/Dataset

When I create common-artifacts, I made testDataGenerator create hdf5 input data for nnpkg. So, it is hard to test quantization tools with common-artifacts.

I think two structures should be merged into one.

@seanshpark @chunseoklee @hseok-oh Any comment?

mhs4670go avatar Apr 08 '22 02:04 mhs4670go

Need comment from @jinevening

seanshpark avatar Apr 08 '22 02:04 seanshpark

@mhs4670go, @jinevening

About nnpkg, here is my comment in nnfw repository written 3 years ago. I tried to use same structure from nncc export action at that time.

Could you please take a look into this PR? It writes the output tensors in h5 format (same structure from nncc export action). See h5dump sample output for details.

glistening avatar Apr 08 '22 03:04 glistening

I think nnpkg and quantization have different formats because they have different purposes.

testDataGenerator create hdf5 input data for nnpkg. So, it is hard to test quantization tools with common-artifacts.

Isn't it possible to generate test data for quantization separately from nnpkg?

I think two structures should be merged into one.

I'm not sure we "should" merge them. But if we should, quantization format can express more, so nnpkg format may need to be changed. I don't know it is acceptable. I mean, is there a user of nnpkg (not us) ?

jinevening avatar Apr 11 '22 14:04 jinevening

+1 for I'm not sure we "should" merge them.

I don't have idea how current nnpkg tests are used in CI and each developer's environment. For example, as for me, I have to abandon my personal nnpkg tcs and create them again if the format is changed. It may be okay, but I am not sure for other's situation.

In addition, to change hdf5 structure for nnpkg test, there are several tools to be updated. ( nnpkg_run, gen_golden, sth2nnpkgtcs, run_nnpkgtcs ( ? - I don't remember the exact name ), there may be some resources in use by CI ).

glistening avatar Apr 11 '22 17:04 glistening

@jinevening

I think nnpkg and quantization have different formats because they have different purposes.

I agree with you. I just thought that it would convenient if we have just one format to maintain. But, on second thought with your opinion, it seems natural to have two format for different purposes.

I'm gonna generate inputs for quantization purpose in common-artifacts.

mhs4670go avatar Apr 12 '22 08:04 mhs4670go

I'm gonna generate inputs for quantization purpose in common-artifacts.

Q) Are you going to remove current generated inputs? If so, please check with internal setup CI as this maybe using them.

seanshpark avatar Apr 12 '22 09:04 seanshpark

I'm gonna generate inputs for quantization purpose in common-artifacts.

Q) Are you going to remove current generated inputs?

If so, please check with internal setup CI as this maybe using them.

No, I'm gonna make another input for quantization, which means existing inputs won't disappear.

mhs4670go avatar Apr 12 '22 15:04 mhs4670go