EdgeML
SRNN data preprocessing script
Hi @harsha-simhadri,
this is a quick implementation of the script process_google.py.
Also, I have checked SRNN and the accuracy is in the .ipynb of the PR.
Solves issue #122
@pushkalkatara is there any reason you preferred h5py over numpy.memmap?
@metastableB numpy.memmap does not store the dims or dtypes, so we would have to specify the test, train, and val dims and dtypes in SRNN_example.py. Also, I have generally seen h5py or pandas used for this purpose. We can switch to numpy.memmap if the extra dependency is an issue.
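To illustrate the point about numpy.memmap, here is a minimal sketch, not the PR's code. The array contents, shape, and file name are hypothetical stand-ins for a preprocessed split. It shows that the file holds only raw bytes, so the reader has to supply the dtype and shape again.

```python
import os
import tempfile

import numpy as np

# Hypothetical array standing in for a preprocessed training split.
train = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "train.dat")

# Write through a memmap; the file stores raw bytes only, no shape or dtype.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=train.shape)
mm[:] = train
mm.flush()
del mm

# Reopening without metadata gives a flat uint8 view of those bytes.
raw = np.memmap(path, mode="r")

# The reader must repeat dtype and shape to recover the original array.
restored = np.memmap(path, dtype=np.float32, mode="r", shape=(2, 3, 4))
```

This is why the dims and dtypes would have to be hard-coded in the consuming script if plain memmap files were used.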
@pushkalkatara Yes, I am apprehensive about adding an extra dependency just for one script, though I must admit I don't know how complex the code will become if we use plain numpy. Let's use pandas instead? It's already part of the requirements here.
@metastableB are you able to fix this using pandas?
@pushkalkatara do you want me to take over or are you working on this?
I can work on it. We would need to save the pandas DataFrame in a format such as csv, pickle, or h5. Which one should I use?
Thanks!
Ah, I did not think this through. CSV will cause file sizes to bloat. It seems pickle is the best route, as numpy.load (here) also supports loading from pickled files.
We might have to change the scripts to reflect the new file names.
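As a rough sketch of the pickle route, not the final script: the split names, shapes, and file name below are hypothetical, and a plain dict of arrays stands in for whatever object the script ends up pickling. Note that numpy.load only reads pickled files when allow_pickle=True is passed, and pickled files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical splits; a dict of arrays stands in for the preprocessed data.
splits = {
    "train": np.zeros((4, 3), dtype=np.float32),
    "val": np.ones((2, 3), dtype=np.int16),
}

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "splits.pkl")

# Pickle keeps shapes and dtypes with the data, unlike a raw memmap file.
with open(path, "wb") as f:
    pickle.dump(splits, f)

# numpy.load falls back to unpickling when the file is not .npy/.npz,
# but only if allow_pickle=True is given explicitly.
loaded = np.load(path, allow_pickle=True)
```

With this layout the consuming script would not need to know the dims or dtypes ahead of time.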
@pushkalkatara Any updates?
@metastableB Yes, I'll make the changes today.