
make compiled files mmap-compatible

Open • mr-martian opened this pull request 2 years ago • 4 comments

This PR adds a new binary format for transducers that is compatible with memory mapping, and gives lt-proc the ability to load files in that format via mmap.
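As a rough illustration of what "compatible with memory mapping" means: the file stores fixed-width arrays that can be used directly from the mapped pages, so loading becomes an mmap call plus a few bounds checks rather than a full deserialisation pass. The layout here (a uint32_t count followed by that many uint32_t values) and the names `MappedArray`, `write_u32_array`, `map_u32_array` and `unmap` are assumptions for the example, not the actual lttoolbox format. It uses POSIX calls, so it only builds on Unix-like systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical on-disk layout: uint32_t count, then uint32_t values[count],
// in the host's byte order. The reader uses the values straight from the
// mapped pages; nothing is copied into heap memory.
struct MappedArray {
  void* base = nullptr;                   // start of the mapping
  std::size_t length = 0;                 // size of the mapping in bytes
  const std::uint32_t* values = nullptr;  // points into the mapping
  std::uint32_t count = 0;
};

// Write the layout above with stdio; returns false on any I/O error.
bool write_u32_array(const char* path, const std::uint32_t* values, std::uint32_t count) {
  std::FILE* f = std::fopen(path, "wb");
  if (!f) return false;
  bool ok = std::fwrite(&count, sizeof count, 1, f) == 1 &&
            std::fwrite(values, sizeof *values, count, f) == count;
  return std::fclose(f) == 0 && ok;
}

// Map `path` read-only and validate its header. On failure `out` is untouched.
bool map_u32_array(const char* path, MappedArray& out) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size < static_cast<off_t>(sizeof(std::uint32_t))) {
    close(fd);
    return false;
  }
  const std::size_t len = static_cast<std::size_t>(st.st_size);
  void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  if (p == MAP_FAILED) return false;

  std::uint32_t count;
  std::memcpy(&count, p, sizeof count);
  if ((len - sizeof count) / sizeof(std::uint32_t) < count) {  // truncated file
    munmap(p, len);
    return false;
  }
  out.base = p;
  out.length = len;
  out.count = count;
  // mmap returns page-aligned memory, so offset 4 is suitably aligned.
  out.values = reinterpret_cast<const std::uint32_t*>(static_cast<const char*>(p) + sizeof count);
  return true;
}

// Release the mapping and reset the view.
void unmap(MappedArray& a) {
  if (a.base) munmap(a.base, a.length);
  a = MappedArray{};
}
```

A production format would likely also need a magic number, a version field and a fixed byte order so that files stay portable between machines.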

It also makes the Python bindings link against the shared library (.so) instead of recompiling the repository's sources.

TODO before merging:

  • [ ] finish https://github.com/apertium/apertium/pull/130
  • [x] finish https://github.com/apertium/apertium-lex-tools/pull/79
  • [ ] make the appropriate changes to apertium-recursive
  • [x] test https://github.com/apertium/apertium-separable/pull/41
  • [ ] make sure apertium-anaphora and lexd are still ok
  • [ ] drop old transducer execution code in favor of updated versions
    • [ ] trans_exe.h/cc superseded by transducer_exe.h/cc
    • [ ] node.h/cc and match_node.h/cc replaced by flat arrays
    • [ ] delete match_state.h/cc and rename match_state2.h/cc
    • [ ] match_exe.h/cc functionality is now part of transducer_exe.h/cc
  • [ ] drop serialiser.h and deserialiser.h and related functions
    • only used by apertium-tagger, and will live in a single file in apertium going forward
  • [ ] drop compression.h/cc write functions and mark read functions as deprecated
  • [ ] move pattern_list.h/cc to apertium
  • [x] lt-proc -e nno-nob.automorf.bin is currently segfaulting
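The item about replacing node.h/cc and match_node.h/cc with flat arrays can be illustrated with a compressed-sparse-row (CSR) layout, one common way to flatten a per-state transition table so that it can be written once and mapped back without any pointer fix-ups. This is a sketch under assumptions: `FlatTransducer`, `flatten` and `step` are hypothetical names, and the real layout in transducer_exe.h/cc may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Pointer-based table of the kind discussed in this thread:
// state -> (input symbol -> (target state, weight)).
using TransitionMap = std::map<int, std::multimap<int, std::pair<int, double>>>;

// Hypothetical flat layout: the outgoing transitions of state s occupy
// indices [offsets[s], offsets[s + 1]) of the three parallel arrays,
// sorted by input symbol. Plain arrays like these contain no pointers.
struct FlatTransducer {
  std::vector<std::uint32_t> offsets;  // size = state count + 1
  std::vector<std::int32_t> symbols;
  std::vector<std::uint32_t> targets;
  std::vector<double> weights;
};

FlatTransducer flatten(const TransitionMap& transitions, int state_count) {
  FlatTransducer flat;
  flat.offsets.push_back(0);
  for (int s = 0; s < state_count; ++s) {
    auto it = transitions.find(s);
    if (it != transitions.end()) {
      // multimap iterates in key order, so symbols come out sorted.
      for (const auto& tr : it->second) {
        flat.symbols.push_back(tr.first);
        flat.targets.push_back(static_cast<std::uint32_t>(tr.second.first));
        flat.weights.push_back(tr.second.second);
      }
    }
    flat.offsets.push_back(static_cast<std::uint32_t>(flat.symbols.size()));
  }
  return flat;
}

// Collect the targets reachable from `state` on `symbol` by binary search.
std::vector<std::uint32_t> step(const FlatTransducer& flat, int state, int symbol) {
  assert(state >= 0 && static_cast<std::size_t>(state) + 1 < flat.offsets.size());
  auto first = flat.symbols.begin() + flat.offsets[state];
  auto last = flat.symbols.begin() + flat.offsets[state + 1];
  auto range = std::equal_range(first, last, symbol);
  std::vector<std::uint32_t> out;
  for (auto it = range.first; it != range.second; ++it)
    out.push_back(flat.targets[static_cast<std::size_t>(it - flat.symbols.begin())]);
  return out;
}
```

Because every state's transitions are contiguous and sorted, a lookup touches one small slice of memory instead of chasing node pointers.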

mr-martian • Jul 26 '21 17:07

For C++14-and-a-bit support, string_view needs to be included the same way as in apertium: see https://github.com/apertium/apertium/blob/master/configure.ac#L81 and https://github.com/apertium/apertium/blob/master/apertium/string_view.h
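The linked apertium files are not reproduced here. As a sketch of the general idea only, a header can pick whichever string_view the standard library provides by testing __cplusplus and __has_include. The namespace `compat` and the helper `starts_with` are hypothetical names, and apertium's actual configure check and header may work differently.

```cpp
#include <cassert>

// Hypothetical shim: use std::string_view when compiling as C++17 or later,
// otherwise fall back to the Library Fundamentals TS version that C++14-era
// standard libraries ship under std::experimental.
#ifdef __has_include
#  if __cplusplus >= 201703L && __has_include(<string_view>)
#    include <string_view>
namespace compat { using std::string_view; }
#  elif __has_include(<experimental/string_view>)
#    include <experimental/string_view>
namespace compat { using std::experimental::string_view; }
#  else
#    error "no string_view implementation found"
#  endif
#else
#  error "__has_include is not supported by this compiler"
#endif

// Code written against compat::string_view compiles in either mode,
// since both types provide size(), substr() and operator==.
bool starts_with(compat::string_view text, compat::string_view prefix) {
  return text.size() >= prefix.size() &&
         text.substr(0, prefix.size()) == prefix;
}
```

Callers only ever name compat::string_view, so dropping the fallback once C++17 becomes a hard requirement is a one-line change.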

TinoDidriksen • Jul 27 '21 11:07

While we're in the business of speeding things up and breaking internal backwards compatibility, it would probably be a good idea to switch the data type of Transducer from map<int, multimap<int, pair<int, double>>> to vector<multimap<int, pair<int, double>>>. Since states are always added sequentially from 0 anyway, a state's number can serve directly as its index into the vector, so only code that spells out the full type signature rather than using auto would need to change.
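A minimal sketch of the proposed representation, relying on the comment's premise that states are numbered densely from 0. `SketchTransducer`, `newState`, `addTransition` and `countTransitions` are hypothetical names chosen for illustration; they are not lttoolbox's Transducer API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Proposed representation: state i's outgoing transitions live at index i.
// Each entry maps an input symbol to (target state, weight), as before.
using Transitions = std::vector<std::multimap<int, std::pair<int, double>>>;

// Hypothetical minimal wrapper for illustration only.
struct SketchTransducer {
  Transitions transitions;

  // States are only ever appended, so the new state's number is the old size.
  int newState() {
    transitions.emplace_back();
    return static_cast<int>(transitions.size()) - 1;
  }

  void addTransition(int source, int symbol, int target, double weight = 0.0) {
    // With map, transitions[source] would silently create a missing state;
    // with vector, an out-of-range index is undefined behaviour, hence the check.
    assert(source >= 0 && static_cast<std::size_t>(source) < transitions.size());
    transitions[source].emplace(symbol, std::make_pair(target, weight));
  }

  std::size_t countTransitions() const {
    std::size_t n = 0;
    for (const auto& outgoing : transitions) n += outgoing.size();
    return n;
  }
};
```

Besides avoiding a tree lookup per state, the vector keeps the per-state entries contiguous, which fits the flat-array direction of this PR.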

mr-martian • Jul 13 '22 22:07

...states are always added sequentially from 0 anyway.

And they are never removed out-of-order, leaving holes?

TinoDidriksen • Jul 13 '22 23:07

...states are always added sequentially from 0 anyway.

And they are never removed out-of-order, leaving holes?

There isn't any mechanism for removing states. Any operation that reduces the number of states actually builds a copy and then swaps it in.
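To illustrate the copy-and-swap pattern described above, here is a sketch of one state-reducing operation (removing states unreachable from the initial state) on the vector representation proposed earlier in the thread. Survivors are renumbered densely in breadth-first order, so state numbers stay 0..n-1 with no holes. The function name `trimUnreachable` is hypothetical, and final states, which lttoolbox also tracks, are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <utility>
#include <vector>

using Transitions = std::vector<std::multimap<int, std::pair<int, double>>>;

// Drop states unreachable from `initial`, build a renumbered copy, and swap
// it in. After the call the initial state is 0.
void trimUnreachable(Transitions& transitions, int& initial) {
  const int n = static_cast<int>(transitions.size());
  assert(initial >= 0 && initial < n);
  std::vector<int> newId(static_cast<std::size_t>(n), -1);  // -1 = not reached
  std::vector<int> order;                                   // reached, in BFS order
  std::queue<int> todo;
  newId[initial] = 0;
  order.push_back(initial);
  todo.push(initial);
  while (!todo.empty()) {
    int s = todo.front();
    todo.pop();
    for (const auto& tr : transitions[s]) {
      int target = tr.second.first;
      if (newId[target] == -1) {
        newId[target] = static_cast<int>(order.size());
        order.push_back(target);
        todo.push(target);
      }
    }
  }
  // Build the trimmed copy with renumbered targets...
  Transitions trimmed(order.size());
  for (std::size_t i = 0; i < order.size(); ++i)
    for (const auto& tr : transitions[order[i]])
      trimmed[i].emplace(tr.first,
                         std::make_pair(newId[tr.second.first], tr.second.second));
  // ...then swap it in; the old table is freed when `trimmed` goes out of scope.
  transitions.swap(trimmed);
  initial = 0;
}
```

Since no operation ever erases a state in place, the dense-numbering invariant that the vector representation depends on holds by construction.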

mr-martian • Jul 13 '22 23:07