Marcel Bollmann
> Another question of is-it-the-data-or-the-site:
>
> SIGMORPHON has workshops listed [through 2014](http://aclweb.org/anthology/sigs/sigmorphon/), but one of the 2014 ones is really 2016: W16-20. On top, [their 2018 workshop](http://aclweb.org/anthology/events/ws-2018/#w18-58) isn't listed...
True, the association of volumes to SIGs works quite similarly to `joint.yaml`, so the information from those YAML files could be merged into the XML as well.
> I like the extended XML and even removing joint.xml, but I don’t like the idea of WS as a pseudo venue. Why not add a type field to venues.yaml...
Thanks for your thoughts, Matt! I've been shying away from the refactoring effort this might take, but it probably is the right way forward, as it might also allow us...
Just a note that if we're going to refactor this as described by Matt, I'd also be in favor of prioritizing this now. Lately I've been thinking about both the...
**Here's a [file with automatically extracted abstracts](https://marcel.bollmann.me/misc/acl_arc_abstracts.txt).** I thought I'd try extracting abstracts from the ACL Anthology Reference Corpus. Concretely, I used the [March 2016 version of the ParsCit XML](https://acl-arc.comp.nus.edu.sg/archives/acl-arc-160301-parscit/)...
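For illustration only, pulling abstract text out of ParsCit-style XML could look roughly like the sketch below. It is not the script used to produce the file linked above. The `<abstract>` element name, the inline sample document, and the helper `extract_abstracts` are all assumptions made for the example, so the real ParsCit output should be checked before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one ParsCit XML file.
# The real files are assumed (not verified here) to wrap header fields
# such as <abstract> inside <algorithm>/<variant> elements.
SAMPLE = """<algorithms>
  <algorithm name="ParsHed">
    <variant no="0">
      <title>A Sample Paper</title>
      <abstract>We present a sample abstract
spanning two lines.</abstract>
    </variant>
  </algorithm>
</algorithms>"""


def extract_abstracts(xml_text):
    """Return the whitespace-normalized text of every <abstract> element."""
    root = ET.fromstring(xml_text)
    abstracts = []
    for node in root.iter("abstract"):
        # itertext() also picks up text nested in child elements;
        # split()/join collapses line breaks and repeated spaces.
        text = " ".join("".join(node.itertext()).split())
        if text:  # skip empty <abstract/> elements
            abstracts.append(text)
    return abstracts


if __name__ == "__main__":
    for abstract in extract_abstracts(SAMPLE):
        print(abstract)
```

This sketch only collapses whitespace; it does not attempt to repair hyphenation or to filter out text that the parser labeled as an abstract by mistake.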
I don't; maybe @knmnyn knows? I briefly tried Tika on a couple of cases that my extraction process got wrong, and it handled them better. Maybe we could combine pipelines...
First of all, thanks for offering your help @abhinavkashyap! That SciWING pipeline looks really cool. I've clicked through a few of the abstracts and observed that several of those looked...
@abhinavkashyap, do you just run the PDFs through SciWing to get the abstracts or is there more pre-/post-processing involved? I'm asking because I have a simple pipeline now of manually...
I can look at the abstracts sometime later this week, and will also compare them to what my simple pipeline produces. For dealing with hyphenation, I currently have a simply...