EdgeML

SRNN data preprocessing script

Open • pushkalkatara opened this issue 4 years ago • 9 comments

Hi @harsha-simhadri, this is a quick implementation of the script process_google.py. I have also checked SRNN; the resulting accuracy is reported in the .ipynb included in this PR. Solves issue #122.

pushkalkatara, Aug 21 '19 21:08

@pushkalkatara is there any reason you preferred h5py over numpy.memmap?

metastableB, Aug 21 '19 21:08

@metastableB numpy.memmap does not store the dimensions or dtypes, so we would have to hard-code the train, test, and val shapes and dtypes in SRNN_example.py. Also, I have generally seen h5py or pandas used for this purpose. We can switch to numpy.memmap if the extra dependency is an issue.
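To illustrate the metadata point, here is a minimal sketch using only NumPy. The file names (`train.dat`, `train.npy`) and the helper `memmap_roundtrip` are hypothetical, and the `.npy` comparison is an addition for contrast, not something proposed in this thread. A raw `np.memmap` file holds only bytes, so reopening it without restating dtype and shape falls back to NumPy's defaults (`uint8`, flat 1-D).

```python
import os
import tempfile

import numpy as np


def memmap_roundtrip(arr, tmpdir):
    """Contrast a raw np.memmap file with a .npy file for the same array."""
    raw_path = os.path.join(tmpdir, "train.dat")  # hypothetical file name
    npy_path = os.path.join(tmpdir, "train.npy")  # hypothetical file name

    # Raw memmap: dtype and shape must be supplied to create the file...
    mm = np.memmap(raw_path, dtype=arr.dtype, mode="w+", shape=arr.shape)
    mm[:] = arr
    mm.flush()
    del mm  # release the mapping

    # ...but the file keeps no record of them, so a naive reopen uses the
    # defaults: dtype=uint8 and a flat 1-D view of the raw bytes.
    naive = np.array(np.memmap(raw_path, mode="r"))
    # The reader must restate dtype and shape to recover the array.
    restored = np.array(
        np.memmap(raw_path, dtype=arr.dtype, mode="r", shape=arr.shape)
    )

    # A .npy file stores dtype and shape in its header; nothing to restate.
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)
    return naive, restored, from_npy


with tempfile.TemporaryDirectory() as d:
    train = np.arange(12, dtype=np.float32).reshape(3, 4)
    naive, restored, from_npy = memmap_roundtrip(train, d)
    print(naive.dtype, naive.shape)        # uint8 (48,)
    print(restored.dtype, restored.shape)  # float32 (3, 4)
    print(from_npy.dtype, from_npy.shape)  # float32 (3, 4)
```

In the memmap route, the "restate dtype and shape" step is what would end up hard-coded in the consuming script.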

pushkalkatara, Aug 21 '19 21:08

@pushkalkatara Yes, I am apprehensive about adding an extra dependency just for one script, though I must admit I don't know how complex the code would become with plain numpy. Let's use pandas instead? It's already part of the requirements here.

metastableB, Aug 22 '19 01:08

@metastableB are you able to fix this using pandas?

harsha-simhadri, Aug 22 '19 05:08

@pushkalkatara do you want me to take over or are you working on this?

metastableB, Aug 22 '19 17:08

I can work on it. We would need to save the pandas DataFrame in some format: CSV, pickle, or HDF5 (h5). Which one should I use?

pushkalkatara, Aug 22 '19 18:08

Thanks!

Ah, I did not think this through. CSV will cause file sizes to bloat. Pickle seems to be the best route, as numpy.load also supports loading from pickled files.

We might have to change the scripts to reflect the new file names.
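The CSV-vs-pickle trade-off and the `numpy.load` fallback can be sketched as follows. This is an assumption-laden illustration: it pickles a plain dict of NumPy arrays with the standard `pickle` module rather than a pandas DataFrame, and the helper `write_both` and the file names are hypothetical. `np.load` unpickles a file that is neither `.npy` nor `.npz` only when `allow_pickle=True` (the default has been `False` since NumPy 1.16.3). The same fallback should apply to an uncompressed `DataFrame.to_pickle` file, since that writes an ordinary pickle.

```python
import os
import pickle
import tempfile

import numpy as np


def write_both(splits, tmpdir):
    """Write the same splits as one pickle and as per-split CSV files.

    Returns the pickle path, the pickle size, and the total CSV size in bytes.
    """
    pkl_path = os.path.join(tmpdir, "splits.pkl")  # hypothetical file name
    with open(pkl_path, "wb") as f:
        # Binary pickle: dtypes and shapes travel with the arrays.
        pickle.dump(splits, f, protocol=pickle.HIGHEST_PROTOCOL)

    csv_bytes = 0
    for name, arr in splits.items():
        csv_path = os.path.join(tmpdir, name + ".csv")
        # Text encoding: every value becomes a decimal string ('%.18e').
        np.savetxt(csv_path, arr, delimiter=",")
        csv_bytes += os.path.getsize(csv_path)
    return pkl_path, os.path.getsize(pkl_path), csv_bytes


with tempfile.TemporaryDirectory() as d:
    rng = np.random.default_rng(0)
    splits = {
        "train": rng.standard_normal((1000, 32)).astype(np.float32),
        "val": rng.standard_normal((200, 32)).astype(np.float32),
        "test": rng.standard_normal((200, 32)).astype(np.float32),
    }
    pkl_path, pkl_bytes, csv_bytes = write_both(splits, d)
    print("pickle bytes:", pkl_bytes, "csv bytes:", csv_bytes)

    # np.load falls back to pickle.load for non-.npy/.npz files,
    # but only when explicitly allowed.
    loaded = np.load(pkl_path, allow_pickle=True)
    print(loaded["train"].dtype, loaded["train"].shape)
```

Because shapes and dtypes are restored from the pickle itself, the consuming script would only need the new file names, not hard-coded dimensions.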

metastableB, Aug 23 '19 19:08

@pushkalkatara Any updates?

metastableB, Aug 27 '19 16:08

@metastableB Yes, I'll make the changes today.

pushkalkatara, Aug 28 '19 07:08