BlackLab
Switch default XML parser to Saxon?
BlackLab uses the XML library VTD-XML by default for processing documents while indexing. This only supports XPath 1.0.
@eduarddrenth made it possible to use Saxon, a more feature-rich (supports XPath 3) and potentially faster alternative, but it does use more memory while indexing. This may not be a problem in most cases, however.
We should consider changing the default to Saxon, while keeping VTD-XML available for those who want it. If we decide to do this, we should be careful about breaking backwards compatibility.
One solution would be to version .blf.yaml files, e.g. if the file starts with
version: 2
# What element starts a new document?
documentPath: //document
...
it automatically defaults to Saxon instead of VTD-XML. We should clearly document the change as well, of course.
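To make the proposal concrete, here is a minimal sketch of how the default could be derived from the format version. The class, enum, and method names are hypothetical and are not taken from BlackLab's code base. The sketch also assumes that a file without a version key counts as version 1, and that an explicitly configured processor always wins over the version-based default.

```java
import java.util.Optional;

// Hypothetical sketch, not BlackLab's actual implementation.
public class XmlProcessorDefaults {

    // The two XML processors discussed in this issue.
    enum XmlProcessor { VTD, SAXON }

    // Assumption: a .blf.yaml file without a "version" key is treated as version 1.
    static final int LEGACY_VERSION = 1;

    /**
     * Pick the XML processor for an input format.
     *
     * @param formatVersion value of the (proposed) "version" key, or LEGACY_VERSION if absent
     * @param explicit      processor explicitly requested in the config, if any
     */
    static XmlProcessor choose(int formatVersion, Optional<XmlProcessor> explicit) {
        // An explicit choice always overrides the version-based default,
        // so VTD-XML stays available for those who want it.
        if (explicit.isPresent()) {
            return explicit.get();
        }
        // Version 2 and later default to Saxon; older files keep VTD-XML.
        return formatVersion >= 2 ? XmlProcessor.SAXON : XmlProcessor.VTD;
    }

    public static void main(String[] args) {
        System.out.println(choose(LEGACY_VERSION, Optional.empty()));
        System.out.println(choose(2, Optional.empty()));
        System.out.println(choose(2, Optional.of(XmlProcessor.VTD)));
    }
}
```

With this rule, existing configs keep their current behaviour, and only files that opt in to the new version get the new default.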
Some older (and, dare I say, janky) features could be deprecated if Saxon's better XPath support obviates the need for them.
Multiple values are now supported, see #393 and #394. Using processing steps on annotations or standoffAnnotations produces an error. Those can likely be done in XPath 3 and therefore wouldn't need a special feature anymore. We still need to test this more before we think about switching the default parser, though.