ingredient-phrase-tagger

Updates for speed and python 3 compatibility

Open dexteradeus opened this issue 8 years ago • 2 comments

As I started to use this system, I started making changes which I think could be useful to others.

  1. Updates to some scripts to improve Python 2/3 compatibility
  2. Fixed a formatting bug in the training file output to support running crf_learn with multiple threads
  3. Refactored crf file generation to support multithreading
  4. Updated roundtrip.sh to accept counts as command line options and to use all system cores both when generating data files and when running crf_learn

On my system with 8 cores, I noticed a 7.5x reduction in processing time when running roundtrip.sh with the provided dataset.
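For readers who want a picture of what spreading the data-file generation across cores can look like, here is a minimal sketch. It is not the code from this change: format_row, format_chunk, chunk and generate_parallel are hypothetical names, and the "LABEL" column is a placeholder for whatever the real per-row formatting produces. It only assumes that rows can be formatted independently and that output order should match input order.

```python
import multiprocessing
import os


def format_row(row):
    """Hypothetical stand-in for the per-row work: emit one
    tab-separated line per token of an ingredient phrase."""
    return "\n".join(f"{token}\tLABEL" for token in row.split())


def format_chunk(rows):
    """Format a slice of rows; this is the unit of work sent to a worker."""
    return [format_row(r) for r in rows]


def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous, order-preserving slices."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def generate_parallel(rows, workers=None):
    """Format all rows, using one process per chunk when workers > 1.

    Call this from under `if __name__ == "__main__":` so that worker
    processes can import the module safely on every platform.
    """
    workers = workers or os.cpu_count() or 1
    parts = chunk(rows, workers)
    if workers == 1:
        results = [format_chunk(p) for p in parts]  # serial fallback
    else:
        with multiprocessing.Pool(workers) as pool:
            results = pool.map(format_chunk, parts)  # map keeps chunk order
    # Flatten the per-chunk lists back into one list in input order.
    return [block for part in results for block in part]
```

Because pool.map returns results in the order of its inputs, the combined output matches what a single-process run over the same rows would produce. The actual speedup depends on how expensive the per-row work is relative to the cost of starting processes and passing data between them.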

dexteradeus, Jul 24 '16 05:07

I can confirm this works when set up correctly. On Macs, the code that gets the processor count fails (line 4 of roundtrip.sh), but it is easy to hardcode a number.
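One way to sidestep platform differences in core detection is to ask Python rather than the shell. The sketch below is an illustration only, not what roundtrip.sh does: worker_count and its override parameter are hypothetical names. It relies on os.cpu_count(), which is part of the standard library on Python 3.4+ and works on macOS as well as Linux, and it still allows a hardcoded number to be passed in.

```python
import os


def worker_count(override=None):
    """Return how many worker processes to use.

    override: an explicit count (hypothetical parameter), for example a
    hardcoded number or a command line value; it is clamped to at least 1.
    When no override is given, fall back to os.cpu_count(), and to 1 if
    the platform cannot report a count (os.cpu_count() returns None).
    """
    if override is not None:
        return max(1, int(override))
    return os.cpu_count() or 1
```

The resulting value could then be handed to the shell script as an argument, which keeps the detection logic in one place.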

walkerdb, Mar 10 '17 04:03

There is an error on line 42 of roundtrip.sh: it should be input_file instead of iput_file.

Also, it does not seem to generate test data: def _generate_data_worker is never called. Tested on Ubuntu 16.04.

maugch, Jul 22 '17 14:07