Marcel Bollmann

33 issues by Marcel Bollmann

With the addition of preformatted citation strings (#1390), we're now using citeproc to generate a reference string in ACL bibliography format. We'd like to keep citeproc for now in order...

enhancement

In paper titles with multiple sentences, the first word of non-initial sentences should probably be `<fixed-case>`d. This happens e.g. here: [How Good is Your Tokenizer? On the Monolingual Performance of...

> I realise I have no power here, but it seems unintuitive for all datasets a paper uses & introduces to be listed on the anthology without distinction. Would a...

enhancement

This thread is intended to collect all feedback, suggestions, bug reports, etc. for the new Anthology website in the [`static-rewrite`](https://github.com/acl-org/acl-anthology/tree/static-rewrite) branch. (Edit: [live demo here at http://aclweb.org/anthology](http://aclweb.org/anthology)) **If you do...

help wanted

Many PACLIC proceedings have URLs in their `<doi>` entry in the XML, not DOIs. This fixes that. Technically, the current entries are _Handle_ URLs, not _DOI_ URLs, but from spot-checking...

The [data/yaml/joint.yaml](https://github.com/acl-org/acl-anthology/blob/master/data/yaml/joint.yaml) file is a repeated source of confusion. While @mjpost recently started a [wiki page](https://github.com/acl-org/acl-anthology/wiki/Venues,-Volumes,-and-Events) that (also) describes how it's used, I wonder if we shouldn't refactor this to...

enhancement

**TL;DR:** If name variants are defined in the XML, whether a variant is considered part of the "canonical name" depends on the order in which the XML files are read....

bug

There was some discussion on whether we should make [our `anthology` library](https://github.com/acl-org/acl-anthology/tree/master/bin/anthology) into a PyPI package. This would make it easier for people to use our Python interface to the...

enhancement
triaged

I have crosschecked a full file list from the aclweb.org server (created by @mjpost on 29.03.2019) with what would be expected after parsing the Anthology XML. The result is a...

bug

When computing `logsum_alt`, the frequency of a removed piece is re-assigned to alternatives: https://github.com/google/sentencepiece/blob/ba7e11a17f606327d0652528d58d2dd8cd265c6f/src/unigram_model_trainer.cc#L389-L394 But the code uses `alternatives.size()` which, if I'm not mistaken, is always equal to `sentencepieces.size()`. Don't...

bug