Arne Köhn

Results 82 comments of Arne Köhn

I really like it, especially the speed! There is a `display:none` span containing the text "bib" in the bibtex block inside the `acl-paper-link-block` block. When using a text browser, this...

Great! We need plaintext abstracts for the papers that have no abstract in the XML file. For example, I could provide you with a list of pdf urls that need...

To re-create this file, run this command in the `data/xml` directory: `xmlstarlet sel -t -m '//paper[not(abstract)]' -v $'concat(url, "\n")' *xml | sed '/http/! s|\(.*\)|http://www.aclweb.org/anthology/\1.pdf|' > no-abstract.txt` These are about 40k...
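For readers without `xmlstarlet`, the same selection can be sketched in Python. This is only an illustration of what the command does, not part of the Anthology tooling: the function name, the `SAMPLE` document, and its `<volume>` wrapper are hypothetical, and the paper IDs are made up for the example. It assumes `<abstract>` and `<url>` are direct children of `<paper>`, as the XPath in the command implies. Unlike the shell pipeline, it skips papers that have no `<url>` at all.

```python
import xml.etree.ElementTree as ET

# Prefix taken from the sed expression in the command above.
ANTHOLOGY_PREFIX = "http://www.aclweb.org/anthology/"


def pdf_urls_without_abstract(xml_text):
    """Return PDF URLs for every <paper> that has no <abstract> child."""
    root = ET.fromstring(xml_text)
    urls = []
    for paper in root.iter("paper"):
        # Mirrors the XPath predicate //paper[not(abstract)].
        if paper.find("abstract") is not None:
            continue
        url = (paper.findtext("url") or "").strip()
        if not url:
            # Assumption: papers without a <url> are skipped here.
            continue
        # Mirrors the sed step: bare IDs become full PDF URLs,
        # values that already contain "http" are left untouched.
        if "http" not in url:
            url = f"{ANTHOLOGY_PREFIX}{url}.pdf"
        urls.append(url)
    return urls


# Hypothetical input; real files in data/xml may be structured differently.
SAMPLE = """<volume>
  <paper><title>A</title><url>P19-1001</url></paper>
  <paper><title>B</title><abstract>Text.</abstract><url>P19-1002</url></paper>
  <paper><title>C</title><url>https://example.org/c.pdf</url></paper>
</volume>"""

if __name__ == "__main__":
    for u in pdf_urls_without_abstract(SAMPLE):
        print(u)
```

To cover a whole directory, the function could be called once per file and the results written to `no-abstract.txt`, one URL per line.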

Let's also adjust the schema to catch this kind of mistake in the future.

No, but those are valid permanent identifiers. Maybe we should treat the field that way instead of supporting only DOIs.

@mjpost: Approve means that you would like to keep the XML data and not the PDF one, correct?

> For example, the same corrective principle might change Koehn →...

Short cross-link: https://github.com/acl-org/acl-anthology/issues/295#issuecomment-494909877 for a discussion of how to mirror PDFs in bulk. Should take ~5 minutes to implement.

Can we discuss that further in #295 (the mirroring issue)? I can write the script & create a pull request later today; I am currently on a train with limited...

Do we want to recheck here? All files that are referenced in the XML are correct as of now (see #598). This issue might still have some PDFs that should...

I think most of the work is in proper versioning and releases. Right now code and data are automatically synchronized because they are in the same repository, but we cannot guarantee that...