wapiti command line 2x faster than the grobid integration

Open jribault opened this issue 4 years ago • 10 comments

Hi,

I tried the wapiti command line (after installing and compiling it on my Linux machine) and found it is 2 times faster (27s instead of 50s per loop) than the one used by grobid directly. Did I miss something?

btw, I really enjoy grobid, thank you very much!

jribault avatar Aug 08 '19 06:08 jribault

Dear @jribault, could you please provide more details? I assume you used the same data? Which platform? Could you provide the file you used for training? Feel free to also share the log files for both cases if you still have them.

Thank you

lfoppiano avatar Aug 09 '19 07:08 lfoppiano

Same data, on Linux, with grobid 0.5.4. I uncommented the println that displays the corresponding wapiti command line; for example, with only 5 files in the corpus I got:


    epsilon: 1.0E-5
    window: 20
    nb max iterations: 150
    nb threads: 16

Training with equivalent command line: wapiti train --nthread 16 --algo rprop+ -e 0.000010 -w 20 -i 150 -p /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-patent/../grobid-patent/resources/dataset/patent/crfpp-templates/astro.template /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-home/tmp/patent6631759912743468091.train /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-patent/../grobid-home/models/patent/model.wapiti.new

  • Load patterns
  • Load training data
  • Initialize the model
  • Summary
      nb train:    5
      nb labels:   11
      nb blocks:   95260
      nb features: 1047970
  • Train the model with rprop+
    [   1] obj=49576.48 act=0      err=99.94%/100.00% time=0.49s/0.49s
    [   2] obj=49576.48 act=232140 err= 0.23%/100.00% time=0.59s/1.08s
    [   3] obj=14223.32 act=1396   err=99.75%/100.00% time=0.49s/1.57s

Then ../../wapiti-1.5.0/wapiti train --nthread 16 --algo rprop+ -e 0.000010 -w 20 -i 150 -p /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-patent/../grobid-patent/resources/dataset/patent/crfpp-templates/astro.template /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-home/tmp/patent6631759912743468091.train /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-patent/../grobid-home/models/patent/model.wapiti.new

  • Load patterns
  • Load training data
  • Initialize the model
  • Summary
      nb train:    5
      nb labels:   11
      nb blocks:   95260
      nb features: 1047970
  • Train the model with rprop+
    [   1] obj=49576.48 act=0      err=99.94%/100.00% time=0.26s/0.26s
    [   2] obj=49576.48 act=232140 err= 0.23%/100.00% time=0.22s/0.48s
    [   3] obj=14223.32 act=1396   err=99.75%/100.00% time=0.22s/0.69s

I don't think the wapiti command line produces a log file? Here is the grobid log file:

09 Aug 2019 14:19.52 [WARN ] GrobidHomeFinder - No Grobid property was provided. Attempting to find Grobid home in the current directory...
09 Aug 2019 14:19.52 [WARN ] GrobidHomeFinder - ***************************************************************
09 Aug 2019 14:19.52 [WARN ] GrobidHomeFinder - *** USING GROBID HOME: /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-patent/../grobid-home
09 Aug 2019 14:19.52 [WARN ] GrobidHomeFinder - ***************************************************************
09 Aug 2019 14:19.52 [DEBUG] GrobidProperties - synchronized getNewInstance
09 Aug 2019 14:19.52 [DEBUG] GrobidProperties - loading grobid.properties
09 Aug 2019 14:19.52 [WARN ] GrobidHomeFinder - Grobid property file location was not explicitly set via 'org.grobid.property' system variable, defaulting to: /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-home/config/grobid.properties
09 Aug 2019 14:19.52 [DEBUG] GrobidProperties - Checking Properties
09 Aug 2019 14:19.52 [DEBUG] GrobidProperties - loading pdf2xml path
09 Aug 2019 14:19.52 [DEBUG] GrobidProperties - pdf2xml home directory set to /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-home/pdf2xml/lin-64
09 Aug 2019 14:19.52 [INFO ] LibraryLoader - Loading external native sequence labelling library
09 Aug 2019 14:19.52 [DEBUG] LibraryLoader - /home/osboxes/grobid-0.5.4-testPatentR4R/grobid-home/lib/lin-64
09 Aug 2019 14:19.52 [INFO ] LibraryLoader - Loading Wapiti native library...
09 Aug 2019 14:19.52 [INFO ] LibraryLoader - Native library for sequence labelling loaded
09 Aug 2019 14:19.52 [DEBUG] Lexicon - Get new instance of Lexicon
09 Aug 2019 14:19.52 [INFO ] Lexicon - Initiating dictionary
09 Aug 2019 14:19.52 [INFO ] Lexicon - End of Initialization of dictionary
09 Aug 2019 14:19.52 [INFO ] Lexicon - Initiating names
09 Aug 2019 14:19.52 [INFO ] Lexicon - End of initialization of names
09 Aug 2019 14:19.53 [INFO ] Lexicon - Initiating country codes
09 Aug 2019 14:19.53 [INFO ] Lexicon - End of initialization of country codes

jribault avatar Aug 09 '19 12:08 jribault

My guess is that my local wapiti installation, compiled by myself, is better optimized for my hardware.

Is there any way I can build the .so from the wapiti sources to replace the one in grobid?

jribault avatar Aug 09 '19 12:08 jribault

@jribault yes, if you follow the instructions in the readme at https://github.com/kermitt2/Wapiti/, you can build the library yourself and copy it under grobid-home/lib/your_architecture.
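Roughly, the steps look like this (a minimal sketch, assuming a 64-bit Linux machine and grobid checked out under ~/grobid; please check the fork's readme for the exact build instructions):

    git clone https://github.com/kermitt2/Wapiti.git
    cd Wapiti
    mkdir build && cd build
    cmake ..
    make
    # copy the freshly built native library over the one shipped with GROBID
    cp libwapiti.so ~/grobid/grobid-home/lib/lin-64/libwapiti.so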

lfoppiano avatar Aug 09 '19 15:08 lfoppiano

It's interesting, thanks @jribault !

On my workstation, the one I used to build the Wapiti library embedded natively in GROBID, I see a minimal difference, likely due to the Java overhead, but nothing like a 2x difference. Here is a comparison with the header model (the largest one), with the Wapiti fork that we are using (https://github.com/kermitt2/Wapiti - it should be the same as the Wapiti 1.5 distribution in terms of performance, our fork is just more robust).

GROBID's embedded Wapiti:

Training with equivalent command line: 
wapiti train --nthread 8 -e 0.000010 -w 20 -i 2000 -p /home/lopez/grobid/grobid-trainer/../grobid-home/../grobid-trainer/resources/dataset/header/crfpp-templates/header.template /home/lopez/grobid/grobid-home/tmp/header7689040464581701931.train /home/lopez/grobid/grobid-trainer/../grobid-home/models/header/model.wapiti.new
* Load patterns
* Load training data
   1000 sequences loaded
   2000 sequences loaded
* Initialize the model
* Summary
    nb train:    2537
    nb labels:   42
    nb blocks:   1927555
    nb features: 80959032
* Train the model with l-bfgs
  [   1] obj=1531979.05 act=4784104  err=50.94%/99.92% time=50.46s/50.46s
  [   2] obj=1124879.37 act=5090441  err=50.94%/99.92% time=32.95s/83.41s
  [   3] obj=1088305.43 act=4800777  err=50.94%/99.92% time=33.51s/116.91s
...

locally built Wapiti:

lopez@work:~/grobid$ ../Wapiti/build/wapiti_runner train --nthread 8 -e 0.000010 -w 20 -i 2000 -p /home/lopez/grobid/grobid-trainer/../grobid-home/../grobid-trainer/resources/dataset/header/crfpp-templates/header.template /home/lopez/grobid/grobid-home/tmp/header7689040464581701931.train /home/lopez/grobid/grobid-trainer/../grobid-home/models/header/model.wapiti.new
* Load patterns
* Load training data
   1000 sequences loaded
   2000 sequences loaded
* Initialize the model
* Summary
    nb train:    2537
    nb labels:   42
    nb blocks:   1927555
    nb features: 80959032
* Train the model with l-bfgs
  [   1] obj=1531979.05 act=4784102  err=50.94%/99.92% time=48.28s/48.28s
  [   2] obj=1124879.37 act=5090443  err=50.94%/99.92% time=29.02s/77.30s
  [   3] obj=1088305.43 act=4800777  err=50.94%/99.92% time=29.89s/107.19s
...

Note that we are using the l-bfgs training algorithm, and not rprop+ (which should be faster but less accurate).
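For reference, the algorithm is selected with the --algo option of wapiti train; when it is omitted, as in the commands above, Wapiti falls back to its default, l-bfgs. A schematic invocation (the file names here are placeholders, not actual paths from this thread):

    wapiti train --nthread 8 --algo l-bfgs -e 0.000010 -w 20 -i 2000 -p header.template header.train model.wapiti.new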

So indeed the difference might come from the locally-optimized build.

For replacing the .so in GROBID, it's very easy with our fork. Just compile as indicated, and you will have both the .so and the wapiti_runner executable under Wapiti/build/. Copy the .so into the corresponding GROBID home like this:

cp ~/Wapiti/build/libwapiti.so ~/grobid/grobid-home/lib/lin-64/libwapiti.so

You can remove the other .so files under grobid-home/lib/lin-64/, as the dynamic linking on your system will be fine, or replace them with the .so files obtained with the collect-dependencies.sh script from the Wapiti fork.
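One simple way to check that the replaced library resolves all its dependencies on your system (assuming the same paths as in the cp command above) is to list what it links against with ldd:

    ldd ~/grobid/grobid-home/lib/lin-64/libwapiti.so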

kermitt2 avatar Aug 09 '19 15:08 kermitt2

Hi,

So I just compiled your fork of wapiti and replaced the .so, but I still have the same running time: the wapiti command line is 2x faster.

I tried to directly run your fork using wapiti_runner, and the original wapiti is also 2x faster!

So it seems there is something in your fork that makes wapiti slower. I can run some tests or provide more information :)

jribault avatar Aug 18 '19 08:08 jribault

For example :

../../Wapiti-master/build/wapiti_runner train --nthread 16 --algo l-bfgs -e 0.000010 -w 20 -i 2000 -p [...]

  • Load patterns
  • Load training data
  • Initialize the model
  • Summary
      nb train:    46
      nb labels:   13
      nb blocks:   334558
      nb features: 4349410
  • Train the model with l-bfgs
    [   1] obj=49210.60 act=950988 err= 1.80%/100.00% time=8.71s/8.71s
    [   2] obj=26402.82 act=608858 err= 1.80%/100.00% time=4.97s/13.68s
    [   3] obj=22590.62 act=412838 err= 1.80%/100.00% time=5.09s/18.77s
    [   4] obj=20417.67 act=256103 err= 1.80%/100.00% time=5.14s/23.91s

../../wapiti-1.5.0/wapiti train --nthread 16 --algo l-bfgs -e 0.000010 -w 20 -i 2000 -p [...]

  • Load patterns
  • Load training data
  • Initialize the model
  • Summary
      nb train:    46
      nb labels:   13
      nb blocks:   334558
      nb features: 4349410
  • Train the model with l-bfgs
    [   1] obj=49210.60 act=950988 err= 1.80%/100.00% time=3.84s/3.84s
    [   2] obj=26402.82 act=608858 err= 1.80%/100.00% time=2.13s/5.97s
    [   3] obj=22590.62 act=412838 err= 1.80%/100.00% time=2.21s/8.18s
    [   4] obj=20417.67 act=256103 err= 1.80%/100.00% time=2.20s/10.38s

jribault avatar Aug 18 '19 08:08 jribault

wapiti seems generally abandoned over at https://github.com/Jekub/Wapiti -- I'd imagine there are some faster alternatives around.

Downchuck avatar Feb 10 '20 08:02 Downchuck

Hi @Downchuck! Yes, it is abandoned I think, but we are using a fork that we maintain, https://github.com/kermitt2/wapiti - the fork includes quite a few fixes (in particular portable models) and some additions like a JNI interface and better error handling. So the original implementation/project is not a blocker, although the speed difference might come from the JNI compilation of this fork.

Wapiti is the fastest CRF implementation I know for training, and it scales super well for prediction (I reach 100,000 tokens per second on an average 4-core machine with a large model). CRFSuite is also very comprehensive (it supports non-discrete feature values), but it lacks multi-threaded training, which makes it not really usable for us. So Wapiti is still the best choice for CRF today, I think (we have also been exploring DL alternatives since the end of 2018).

kermitt2 avatar Feb 12 '20 01:02 kermitt2

@kermitt2 Hi, I recently migrated to an M1 MacBook and have found that I may need to compile libwapiti.dylib for arm64 (as it's not offered in the release). But I've run into an error while compiling:

CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


CMake Error at /opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find JNI (missing: JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2
  JAVA_AWT_INCLUDE_PATH)
Call Stack (most recent call first):
  /opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindJNI.cmake:367 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  CMakeLists.txt:15 (find_package)

Wondering if you have any thoughts on this? I've added the env vars to the proper .rc files and it doesn't seem to make a difference.

jacksongoode avatar Jun 06 '22 07:06 jacksongoode

Just bumping this once more - I'm still stumped.

jacksongoode avatar Nov 29 '22 08:11 jacksongoode

@lfoppiano Hi Luca! Did you try to recompile wapiti for the ARM architecture? I guess rebuilding Wapiti to avoid emulation is part of the M1 support, but it seems that the standard Java paths have changed on macOS?

kermitt2 avatar Nov 29 '22 10:11 kermitt2

Late last night I was able to recompile by deleting the CMakeCache.txt within the build folder and trying again - just posting for others who might need the solution. It seems users on Apple ARM may need to recompile, as the mac64 lib fails to register in Java.
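A minimal sketch of that workaround, assuming the fork is checked out in ~/Wapiti (the exact grobid-home/lib/ subdirectory to copy the arm64 library into isn't covered in this thread):

    cd ~/Wapiti/build
    rm CMakeCache.txt    # drop the stale cache that had failed JNI detection
    cmake ..
    make
    # then copy the resulting libwapiti.dylib over the one GROBID loads from grobid-home/lib/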

jacksongoode avatar Nov 29 '22 16:11 jacksongoode

@jribault I had forgotten about the Wapiti performance issue, I am sorry, but I looked again this morning following the latest comments here, and it's simply that our fork was built without the CMAKE_BUILD_TYPE="Release" config. So it was built in debug mode (missing -DNDEBUG) and without optimization flags (missing -O3). This was very obvious and I should have looked at it 3 years ago :(
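Concretely, an optimized build of the fork looks roughly like this (a sketch assuming a fresh checkout; with the usual GCC/Clang toolchains the Release type is what adds -O3 and -DNDEBUG):

    cd Wapiti
    mkdir -p build && cd build
    cmake -DCMAKE_BUILD_TYPE=Release ..
    make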

I've updated the Wapiti fork and the Linux 64 JNI binaries in grobid, and there is now no difference in speed for training as compared to the original Wapiti binary (which is indeed now more than 2 times faster than before).

Before:

* Load patterns
* Load training data
* Initialize the model
* Summary
    nb train:    716
    nb labels:   40
    nb blocks:   1057715
    nb features: 42310160
* Train the model with l-bfgs
  [   1] obj=820496.36  act=2553891  err=50.54%/100.00% time=40.41s/40.41s
  [   2] obj=619222.62  act=2726615  err=50.54%/100.00% time=31.90s/72.31s
  [   3] obj=595121.80  act=2467295  err=50.54%/100.00% time=32.84s/105.15s
  [   4] obj=543026.13  act=2075768  err=46.57%/100.00% time=17.89s/123.05s
  [   5] obj=536035.18  act=2382857  err=50.54%/100.00% time=34.73s/157.78s
  [   6] obj=497845.11  act=2236987  err=46.38%/100.00% time=41.46s/199.24s
  [   7] obj=488193.81  act=2265320  err=50.53%/100.00% time=35.38s/234.63s
  [   8] obj=471675.35  act=2169798  err=48.37%/100.00% time=35.28s/269.91s
  [   9] obj=422947.93  act=2048831  err=42.42%/100.00% time=18.94s/288.84s
  [  10] obj=368704.47  act=1970766  err=38.23%/100.00% time=24.30s/313.15s
  [  11] obj=329680.45  act=1990813  err=37.26%/100.00% time=25.57s/338.72s
  [  12] obj=311675.59  act=1915082  err=36.12%/100.00% time=25.80s/364.52s
...

After Wapiti update:

* Load patterns
* Load training data
* Initialize the model
* Summary
    nb train:    716
    nb labels:   40
    nb blocks:   1057715
    nb features: 42310160
* Train the model with l-bfgs
  [   1] obj=820496.36  act=2553887  err=50.54%/100.00% time=11.49s/11.49s
  [   2] obj=619222.62  act=2726615  err=50.54%/100.00% time=7.04s/18.53s
  [   3] obj=595121.80  act=2467295  err=50.54%/100.00% time=7.27s/25.81s
  [   4] obj=543026.13  act=2075768  err=46.57%/100.00% time=7.51s/33.32s
  [   5] obj=536035.18  act=2382857  err=50.54%/100.00% time=7.70s/41.02s
  [   6] obj=497845.11  act=2236987  err=46.38%/100.00% time=12.85s/53.87s
  [   7] obj=488193.81  act=2265320  err=50.53%/100.00% time=7.86s/61.73s
  [   8] obj=471675.35  act=2169798  err=48.37%/100.00% time=7.91s/69.64s
  [   9] obj=422947.93  act=2048831  err=42.42%/100.00% time=8.23s/77.87s
  [  10] obj=368704.47  act=1970766  err=38.23%/100.00% time=8.74s/86.61s
  [  11] obj=329680.45  act=1990813  err=37.26%/100.00% time=8.66s/95.27s
  [  12] obj=311675.59  act=1915082  err=36.12%/100.00% time=9.51s/104.78s
...

kermitt2 avatar Nov 30 '22 12:11 kermitt2

It has some positive impact on the runtime too, although most of the computation takes place outside Wapiti:

Before:

PDF processing 100% │████████████████│ 1943/1943 (0:08:03 / 0:00:00) 

After Wapiti update:

PDF processing 100% │████████████████│ 1943/1943 (0:06:50 / 0:00:00) 

So 1943 PDFs processed in 410 seconds (including header consolidation with biblio-glutton), with 8 threads on an outdated Intel Core i7-7700K CPU @ 4.20GHz, instead of 483 seconds before.
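In other words, throughput goes from about 1943/483 ≈ 4.0 PDFs per second to 1943/410 ≈ 4.7 PDFs per second, roughly a 15% reduction in end-to-end processing time.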

kermitt2 avatar Nov 30 '22 13:11 kermitt2

@kermitt2 great job! I will build the lib for the ARM architecture

lfoppiano avatar Dec 01 '22 06:12 lfoppiano

@lfoppiano Will that be committed to Grobid as well?

jacksongoode avatar Dec 01 '22 06:12 jacksongoode

@jacksongoode yes, updates will be in #937. I have an Apple M1, so I cannot avoid that 😄

lfoppiano avatar Dec 01 '22 06:12 lfoppiano