ingredient-phrase-tagger
Updates for speed and python 3 compatibility
While using this system, I made several changes that I think could be useful to others:
- Updates to some scripts to improve Python 2/3 compatibility
- Fixed a formatting bug in the training file output to support running crf_learn with multiple threads
- Refactored crf file generation to support multithreading
- Updated roundtrip.sh to support providing counts as command line options and to use all system cores when generating data files as well as running crf_learn
On my 8-core system, running roundtrip.sh with the provided dataset took about 7.5x less processing time.
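The parallel data-file generation described in the list above could be sketched roughly as follows. This is only an illustration, not the code from this fork: format_chunk, split_into_chunks, and generate_lines are hypothetical names, the tab-separated token/tag line format is an assumption, and the sketch targets Python 3 only. It splits the rows into contiguous chunks, formats each chunk in a worker, and joins the results in their original order.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os


def format_chunk(rows):
    """Hypothetical worker: turn (token, tag) rows into tab-separated lines."""
    return ["{}\t{}".format(token, tag) for token, tag in rows]


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous, order-preserving chunks."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def generate_lines(rows, workers=None, executor_cls=ProcessPoolExecutor):
    """Format rows in parallel and return the lines in their original order.

    executor_cls defaults to a process pool, which suits CPU-bound work;
    ThreadPoolExecutor can be passed instead for a lighter-weight run.
    """
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(rows, workers)
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in submission order, so the output order
        # matches the input order regardless of which worker finishes first.
        results = pool.map(format_chunk, chunks)
        return [line for chunk in results for line in chunk]
```

Calling generate_lines(rows) with no workers argument uses every core the operating system reports. With the default process pool, the call should sit under an if __name__ == "__main__": guard on platforms that start workers by spawning a fresh interpreter.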
I can confirm this works when set up correctly. On Macs, the code that gets the processor count (line 4 of roundtrip.sh) will fail, but it is easy to hardcode a number instead.
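As an alternative to hardcoding, one portable way to get a core count is to ask Python, which the project already depends on. The sketch below is an assumption about how this could be done, not what roundtrip.sh currently does; worker_count and the CRF_WORKERS environment variable are hypothetical names introduced for illustration.

```python
import multiprocessing
import os


def worker_count(default=1):
    """Return the number of workers to use.

    A positive integer in the hypothetical CRF_WORKERS environment variable
    takes priority; otherwise fall back to the detected core count.
    """
    override = os.environ.get("CRF_WORKERS", "")
    if override.isdigit() and int(override) > 0:
        return int(override)
    try:
        # Available in both Python 2 and Python 3, on Linux and macOS.
        return multiprocessing.cpu_count()
    except NotImplementedError:
        # Raised when the platform cannot report a core count.
        return default
```

From a shell script, the same value can be obtained with python -c "import multiprocessing; print(multiprocessing.cpu_count())".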
There is an error on line 42 of roundtrip.sh: it should read input_file instead of iput_file.
It also does not seem to generate test data, because _generate_data_worker is never called. Tested on Ubuntu 16.04.