Sylvester Keil
Echoing @a-fent's observations above, in my experience the biggest challenge was the format of the footnotes themselves. Extracting the footnotes from the page is also a challenge (mainly because it's...
Marking up full-text documents is a lot of work, because you need to check each line. The best way to start is to use an existing model to create an initial...
@amkelly I'll try to come back to this when I have more time. Just briefly: the .ttx format is clearly the result of my personal workflow: I was working in...
My first guess is that there's a syntax error in the .ttx somewhere. If you can share a file where this happens I can take a quick look.
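For a quick first pass before sharing the file, something like the sketch below could flag suspicious lines. It assumes the `.ttx` layout is one `label | text` pair per line, which may not cover every case; `malformed_ttx_lines` and the sample text are made up for illustration and are not part of AnyStyle.

```ruby
# Hypothetical helper (not part of AnyStyle): report the lines of a .ttx
# string that lack the "|" separator. The "label | text" layout is an
# assumption about the format.
def malformed_ttx_lines(ttx)
  ttx.each_line.with_index(1)
     .reject { |line, _| line.include?('|') }
     .map { |line, number| [number, line.chomp] }
end

# Made-up sample; the third line is missing its separator.
sample = <<~TTX
  title         | A Study of Footnotes
                | in Early Modern Print
  text          This line is missing its separator
  ref           | Doe, J. (1999). Some Book. City: Press.
TTX

malformed_ttx_lines(sample).each do |number, line|
  puts "line #{number}: #{line}"
end
```

This only catches a missing separator; other problems in the file would still need a manual look.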
Can you try this, e.g., in `irb`:

```ruby
require 'anystyle'
doc = AnyStyle::Document.open './path/to/file.ttx'
```

And then just to see if the file was opened without problems, e.g.:

```ruby
doc.pages...
```
I doubt that a finer-than-line granularity of the finder model would be practical. References in a paper or book make up only a small part of the text. Yet, for...
To get started, I'd save the .ttx output and inspect it. If no references were found, it means that either no reference lines were detected at all or that...
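A rough way to do that inspection is to tally the labels in the saved output. The sketch below assumes `label | text` lines where a blank label means "same as the previous line"; `ttx_label_counts` and the sample are hypothetical and only cover the first case (no `ref` lines at all).

```ruby
# Hypothetical helper (not part of AnyStyle): count lines per label in a
# .ttx string. Assumes "label | text" lines where a blank label repeats
# the label of the previous line.
def ttx_label_counts(ttx)
  current = nil
  ttx.each_line.each_with_object(Hash.new(0)) do |line, counts|
    label, text = line.chomp.split(/\s*\|/, 2)
    next if text.nil?                    # skip lines without a separator
    label = label.strip
    current = label unless label.empty?  # blank label: keep previous one
    counts[current] += 1 if current
  end
end

# Made-up finder output without any reference lines.
output = <<~TTX
  title         | A Study of Footnotes
                | in Early Modern Print
  text          | Body text that is not a reference.
                | More body text.
TTX

counts = ttx_label_counts(output)
counts.each { |label, n| puts "#{label}: #{n}" }
puts 'no reference lines detected' if counts['ref'].zero?
```

If `ref` does show up in the tally, the finder did detect reference lines and the problem is more likely in a later step.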
The default model is trained using the 'core' dataset. The 'gold' dataset is used mainly to ensure quality when we update the model, but in general all references in core...
Our name parser interprets a single word as a given name. For references in Western languages, using the family name is arguably the better choice, but it's not an easy call to...
The finder model also deals with sequences and tokens. It's just that a line represents a token and a document a sequence (and a dataset is a set of sequences)....
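To make that mapping concrete, here is a minimal sketch in plain Ruby. `LineToken`, `DocumentSequence` and `document_to_sequence` are hypothetical names chosen for illustration, not the classes the finder model actually uses, and the documents are made up.

```ruby
# Illustrative structs only: one token per line, one sequence per document.
LineToken = Struct.new(:text)
DocumentSequence = Struct.new(:tokens)

# Turn a document's text into a sequence of line tokens.
def document_to_sequence(text)
  DocumentSequence.new(text.lines.map { |line| LineToken.new(line.chomp) })
end

# Two made-up documents; the dataset is simply the set of their sequences.
documents = [
  "Title\nBody text\nDoe, J. (1999). Some Book.",
  "Another title\nMore text"
]
dataset = documents.map { |text| document_to_sequence(text) }

dataset.each_with_index do |sequence, i|
  puts "document #{i}: #{sequence.tokens.size} line tokens"
end
```

Seen this way, labelling a document means assigning one label to each line token in its sequence.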