Add an option that would relax token formation

Open dimus opened this issue 1 year ago • 0 comments

Some documents have no space between a name and one of the following characters: []{},.<>. It makes sense to add an option that would treat these characters as token separators.
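A minimal sketch of what such an option could look like, assuming a standalone tokenizer. The names `tokenize`, `relaxed`, and `relaxedSeparators` are hypothetical and are not part of gnfinder's code or API; the separator set is the one listed above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// relaxedSeparators holds the characters named in this issue. They act as
// token separators only when the (hypothetical) relaxed option is on.
const relaxedSeparators = "[]{},.<>"

// tokenize splits text on whitespace. When relaxed is true it also splits
// on every character in relaxedSeparators. This is an illustration only,
// not gnfinder's tokenizer.
func tokenize(text string, relaxed bool) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		if unicode.IsSpace(r) {
			return true
		}
		return relaxed && strings.ContainsRune(relaxedSeparators, r)
	})
}

func main() {
	text := "found [Aus bus]in{Cus dus},here"
	fmt.Println(tokenize(text, false)) // default: whitespace only
	fmt.Println(tokenize(text, true))  // relaxed: brackets and commas also split
}
```

One trade-off to keep in mind: treating `.` as a separator also splits abbreviated genera and authorships such as "A. bus" or "L.", which is probably why this should stay an opt-in setting rather than the default.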

Another case that comes up quite often is names like <i>Aus bus</i> Linn. It would be good to ignore <i> and </i>, or even use them as indicators of the canonical form of a scientific name.
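A rough sketch of both ideas, assuming the input is plain text with literal <i> and </i> tags. The helpers `stripItalicTags` and `italicSpans` are hypothetical names used for illustration; nothing here reflects how gnfinder handles markup today.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// italicTag matches an opening or closing <i> tag, case-insensitively.
var italicTag = regexp.MustCompile(`(?i)</?i>`)

// italicSpan captures the text between <i> and the nearest </i>.
var italicSpan = regexp.MustCompile(`(?is)<i>(.*?)</i>`)

// stripItalicTags removes the tags and keeps the text they enclose.
func stripItalicTags(text string) string {
	return italicTag.ReplaceAllString(text, "")
}

// italicSpans returns the italicized substrings, which could serve as
// hints for the canonical form of a name.
func italicSpans(text string) []string {
	var spans []string
	for _, m := range italicSpan.FindAllStringSubmatch(text, -1) {
		spans = append(spans, strings.TrimSpace(m[1]))
	}
	return spans
}

func main() {
	text := "<i>Aus bus</i> Linn."
	fmt.Println(stripItalicTags(text))
	fmt.Println(italicSpans(text))
}
```

Stripping the tags covers the "ignore" variant. The extracted spans would only be a hint, since italics in real documents are also used for emphasis and other non-name text.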

See also

https://github.com/gnames/gnfinder/issues/150

https://github.com/gnames/gnfinder/issues/53

dimus, Jan 11 '24 12:01