EdgeML SRNN Datapreprocessing script

Open pushkalkatara opened this issue 4 years ago • 9 comments

Hi @harsha-simhadri, this is a quick implementation of the script process_google.py. I have also checked SRNN, and the accuracy is in the .ipynb of the PR. Solves issue #122.

Aug 21 '19 21:08 pushkalkatara

@pushkalkatara is there any reason you preferred h5py over numpy.memmap?

Aug 21 '19 21:08 metastableB

@metastableB numpy.memmap does not store the dims or dtypes, so we would have to specify the test, train, and val dims and dtypes in SRNN_example.py. Also, I have generally seen h5py or pandas used for this purpose. We can switch to numpy.memmap if the extra dependency is an issue.
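
To illustrate the point about numpy.memmap, here is a minimal sketch rather than code from the PR. The array contents, shape, and the file name `train.dat` are hypothetical stand-ins for the preprocessed data. It shows that the file on disk is raw bytes with no header, so the reader has to supply both the dtype and the shape.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a preprocessed feature array.
features = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# Hypothetical file name; the real script's output names may differ.
path = os.path.join(tempfile.mkdtemp(), "train.dat")

# Write through a memmap. Only the raw bytes reach the file.
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=features.shape)
writer[:] = features
writer.flush()
del writer

# Without a shape, the data comes back as a flat 1-D array.
flat = np.memmap(path, dtype=np.float32, mode="r")

# The original layout is recovered only if the reader already knows it.
restored = np.memmap(path, dtype=np.float32, mode="r", shape=(2, 3, 4))

print(flat.shape, restored.shape)
```

A consumer such as SRNN_example.py would therefore need the dims and dtypes hard-coded or passed in separately, which is the overhead described above.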

Aug 21 '19 21:08 pushkalkatara

@pushkalkatara Yes, I am apprehensive about adding an extra dependency just for one script, though I must admit I don't know how complex the code will become if we use plain numpy. Let's use pandas instead? It's already part of the requirements here.

Aug 22 '19 01:08 metastableB

@metastableB are you able to fix this using pandas?

Aug 22 '19 05:08 harsha-simhadri

@pushkalkatara do you want me to take over or are you working on this?

Aug 22 '19 17:08 metastableB

I can work on it. We would need to save the pandas DataFrame in some format: csv, pickle, or h5. Which one should I use?

Aug 22 '19 18:08 pushkalkatara

Thanks!

Ah, I did not think this through. CSV will cause file sizes to bloat. It seems pickle is the best route, as numpy.load also supports loading from pickled files.

We might have to change the scripts to reflect the new file names.
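
A rough sketch of the pickle route, not the code that ended up in the PR. The dictionary keys, array shapes, and the file name `train.pkl` are made up for illustration. It relies on `numpy.load` accepting pickled files when `allow_pickle=True` is passed, and it shows that dims and dtypes survive the round trip without being restated by the reader.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical training split; the keys and shapes are illustrative only.
split = {
    "x_train": np.zeros((4, 99, 32), dtype=np.float32),
    "y_train": np.arange(4, dtype=np.int64),
}

# Hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "train.pkl")

# Pickle stores the arrays together with their shapes and dtypes.
with open(path, "wb") as f:
    pickle.dump(split, f)

# numpy.load falls back to unpickling, but only when explicitly allowed.
loaded = np.load(path, allow_pickle=True)

print(loaded["x_train"].shape, loaded["x_train"].dtype)
print(loaded["y_train"].shape, loaded["y_train"].dtype)
```

Since unpickling can execute arbitrary code, `allow_pickle=True` is only appropriate for files the scripts produced themselves. A pandas DataFrame could be written in a similar way with `DataFrame.to_pickle`.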

Aug 23 '19 19:08 metastableB

@pushkalkatara Any updates?

Aug 27 '19 16:08 metastableB

@metastableB Yes, I'll make the changes today.

Aug 28 '19 07:08 pushkalkatara
