gnfinder
Add an option that would relax token formation
Some documents have no space between a name and one of the following characters: []{},.<>. It makes sense to add an option that would treat these characters as token separators.
Another case that comes up quite often is names like <i>Aus bus</i> Linn. It would be good to ignore the <i> and </i> tags, or even use them as indicators of the canonical form of a scientific name.
See also
https://github.com/gnames/gnfinder/issues/150
https://github.com/gnames/gnfinder/issues/53