lttoolbox
make compiled files mmap-compatible
This PR adds a new binary format for transducers that is compatible with memory mapping, and gives `lt-proc` the ability to load it via `mmap`.

It also makes the Python bindings link to the `.so` rather than recompiling the repo.
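
To illustrate what "compatible with memory mapping" can mean in practice, here is a minimal sketch, not the format this PR actually defines. It assumes a POSIX system and a made-up layout: a small header followed by fixed-width records with no pointers. `FlatHeader`, `FlatTransition`, `write_flat` and `MappedTransducer` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical on-disk layout: one header, then fixed-width records.
struct FlatHeader {
  std::uint64_t transition_count;
};

struct FlatTransition {
  std::int32_t source;
  std::int32_t symbol;
  std::int32_t target;
  std::int32_t reserved;  // explicit padding keeps `weight` 8-byte aligned
  double weight;
};
static_assert(sizeof(FlatTransition) == 24, "record layout must be fixed");

// Hypothetical writer used only to produce a file for the demo.
void write_flat(const std::string& path, const std::vector<FlatTransition>& ts) {
  std::FILE* f = std::fopen(path.c_str(), "wb");
  if (!f) throw std::runtime_error("cannot open file for writing");
  const FlatHeader header{static_cast<std::uint64_t>(ts.size())};
  bool ok = std::fwrite(&header, sizeof header, 1, f) == 1;
  if (ok && !ts.empty()) {
    ok = std::fwrite(ts.data(), sizeof(FlatTransition), ts.size(), f) == ts.size();
  }
  ok = (std::fclose(f) == 0) && ok;
  if (!ok) throw std::runtime_error("write failed");
}

// Read-only view over the file: nothing is parsed or copied, the records
// are used directly from the mapped pages.
class MappedTransducer {
 public:
  explicit MappedTransducer(const std::string& path) {
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open file");
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); throw std::runtime_error("fstat failed"); }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ < sizeof(FlatHeader)) { close(fd); throw std::runtime_error("file too small"); }
    void* base = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) throw std::runtime_error("mmap failed");
    base_ = base;
    count_ = static_cast<std::size_t>(static_cast<const FlatHeader*>(base_)->transition_count);
    if ((size_ - sizeof(FlatHeader)) / sizeof(FlatTransition) < count_) {
      munmap(base_, size_);
      throw std::runtime_error("truncated file");
    }
    transitions_ = reinterpret_cast<const FlatTransition*>(
        static_cast<const char*>(base_) + sizeof(FlatHeader));
  }
  ~MappedTransducer() { munmap(base_, size_); }
  MappedTransducer(const MappedTransducer&) = delete;
  MappedTransducer& operator=(const MappedTransducer&) = delete;

  std::size_t size() const { return count_; }
  const FlatTransition& operator[](std::size_t i) const {
    assert(i < count_);
    return transitions_[i];
  }

 private:
  void* base_ = nullptr;
  std::size_t size_ = 0;
  std::size_t count_ = 0;
  const FlatTransition* transitions_ = nullptr;
};
```

The sketch ignores endianness and versioning, which a real format would have to pin down.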
TODO before merging:
- [ ] finish https://github.com/apertium/apertium/pull/130
- [x] finish https://github.com/apertium/apertium-lex-tools/pull/79
- [ ] make the appropriate changes to apertium-recursive
- [x] test https://github.com/apertium/apertium-separable/pull/41
- [ ] make sure apertium-anaphora and lexd are still ok
- [ ] drop old transducer execution code in favor of updated versions
  - [ ] `trans_exe.h/cc` superseded by `transducer_exe.h/cc`
  - [ ] `node.h/cc` and `match_node.h/cc` replaced by flat arrays
  - [ ] delete `match_state.h/cc` and rename `match_state2.h/cc`
  - [ ] `match_exe.h/cc` functionality is now part of `transducer_exe.h/cc`
- [ ] drop `serialiser.h` and `deserialiser.h` and related functions
  - only used by `apertium-tagger` and will be contained in a single file in apertium going forward
- [ ] drop `compression.h/cc` write functions and mark read functions as deprecated
- [ ] move `pattern_list.h/cc` to apertium
- [x] `lt-proc -e nno-nob.automorf.bin` is currently segfaulting
For C++14-and-a-bit support, `string_view` needs to be included akin to https://github.com/apertium/apertium/blob/master/configure.ac#L81 + https://github.com/apertium/apertium/blob/master/apertium/string_view.h
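
For readers unfamiliar with this kind of shim, the following is a rough sketch of a header-selection fallback. It is not a copy of the linked apertium header, and the `compat` namespace, the `COMPAT_HAS_STRING_VIEW` macro and the `skip` helper are names invented for the example. It assumes a compiler that either supports C++17 or provides `__has_include` and `<experimental/string_view>`.

```cpp
#include <cassert>
#include <cstddef>

// Prefer the standard header when compiling as C++17 or later.
#if __cplusplus >= 201703L
#  include <string_view>
namespace compat {
using std::string_view;
}
#  define COMPAT_HAS_STRING_VIEW 1
// Otherwise fall back to the Library Fundamentals TS version if present.
#elif defined(__has_include)
#  if __has_include(<experimental/string_view>)
#    include <experimental/string_view>
namespace compat {
using std::experimental::string_view;
}
#    define COMPAT_HAS_STRING_VIEW 1
#  endif
#endif

#ifndef COMPAT_HAS_STRING_VIEW
#  error "no string_view implementation found"
#endif

// Hypothetical helper: the part of `text` after its first `n` characters,
// returned as a view into the same buffer (no copy).
inline compat::string_view skip(compat::string_view text, std::size_t n) {
  assert(n <= text.size());
  return text.substr(n);
}
```

Code written against `compat::string_view` then compiles unchanged in either mode. A configure-time check, as in the linked `configure.ac`, is the alternative to detecting the header in the preprocessor.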
While we're in the business of speeding things up and breaking internal backwards compatibility, it would probably be a good idea to switch the datatype of `Transducer` from `map<int, multimap<int, pair<int, double>>>` to `vector<multimap<int, pair<int, double>>>`. Only code that wrote out the full type signature rather than using `auto` would have to change, since states are always added sequentially from `0` anyway.
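
A small sketch of why the two types are interchangeable when state ids are dense: the vector index plays the role of the map key. The aliases and the helpers `add_state`, `add_arc` and `to_vector` are hypothetical and are not taken from the lttoolbox sources.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// symbol -> (target state, weight), matching the inner type quoted above
using Arcs = std::multimap<int, std::pair<int, double>>;
using MapTransitions = std::map<int, Arcs>;
using VecTransitions = std::vector<Arcs>;

// With a vector, the id of a new state is simply the previous size.
int add_state(VecTransitions& t) {
  t.emplace_back();
  return static_cast<int>(t.size()) - 1;
}

void add_arc(VecTransitions& t, int source, int symbol, int target, double weight) {
  assert(source >= 0 && static_cast<std::size_t>(source) < t.size());
  t[source].emplace(symbol, std::make_pair(target, weight));
}

// Conversion from the map form; only valid if the keys are exactly 0..n-1.
VecTransitions to_vector(const MapTransitions& m) {
  VecTransitions v;
  v.reserve(m.size());
  for (const auto& entry : m) {
    assert(entry.first == static_cast<int>(v.size()));  // no holes in the ids
    v.push_back(entry.second);
  }
  return v;
}
```

Lookup by state id becomes an array index instead of a tree search, which is where the speedup would be expected to come from.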
> ...states are always added sequentially from `0` anyway.
And they are never removed out-of-order, leaving holes?
There isn't any mechanism for removing states. Any operation that decreases the number of states is actually creating a copy and then swapping.
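
The copy-then-swap pattern described here can be sketched as follows. This is an illustration under assumptions, not the actual lttoolbox code: `keep_states` is a hypothetical helper, and the transition table uses the `vector` form proposed above. Because survivors are renumbered into a fresh table, the ids stay dense from `0` and no holes can appear.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Arcs = std::multimap<int, std::pair<int, double>>;
using Transitions = std::vector<Arcs>;

// Keep only the states flagged in `keep`. Rather than erasing in place,
// build a new table with survivors renumbered from 0, then swap it in.
void keep_states(Transitions& t, const std::vector<bool>& keep) {
  assert(keep.size() == t.size());

  // Assign each surviving state its new, dense id.
  std::vector<int> new_id(t.size(), -1);
  int next = 0;
  for (std::size_t s = 0; s < t.size(); ++s) {
    if (keep[s]) new_id[s] = next++;
  }

  // Copy the arcs of surviving states, rewriting targets to the new ids.
  Transitions fresh(static_cast<std::size_t>(next));
  for (std::size_t s = 0; s < t.size(); ++s) {
    if (!keep[s]) continue;
    for (const auto& arc : t[s]) {
      const int target = arc.second.first;
      assert(target >= 0 && static_cast<std::size_t>(target) < t.size());
      if (new_id[target] < 0) continue;  // arc led into a dropped state
      fresh[new_id[s]].emplace(
          arc.first, std::make_pair(new_id[target], arc.second.second));
    }
  }

  t.swap(fresh);  // the old table is destroyed when `fresh` goes out of scope
}
```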