anserini
Add prebuilt index for ACL Anthology
Lucene inverted BM25 index for the ACL Anthology. To ensure that every field is indexed and none is flagged as 'empty', missing abstracts and author lists are filled with the placeholder 'n/a'. The indexed fields are title, contents (the abstract), authors, venue (the name of the conference or journal), and year. In the authors field, each author's name is followed by any variations of that name, separated by commas; the author's institution comes after the name, following a semicolon; and individual authors are separated by periods. Document IDs are the Anthology 'bibkey', which is guaranteed to be unique across the Anthology.
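The authors encoding is compact but easy to misread, so here is a minimal Python sketch of how a consumer of this index might decode it. It assumes each author entry looks like `Name, Variation; Institution` and that entries are joined with periods, which is one reading of the description. `Author`, `parse_authors`, and `PLACEHOLDER` are hypothetical names for illustration, not part of Anserini or Pyserini. Splitting literally on periods would break names that contain initials (e.g. "J. Smith"), so treat this as an illustration of the stated delimiters rather than a robust parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder stored for empty abstracts and author lists (per the description).
PLACEHOLDER = "n/a"


@dataclass
class Author:
    name: str
    variations: List[str] = field(default_factory=list)
    institution: Optional[str] = None


def parse_authors(field_value: str) -> List[Author]:
    """Decode an 'authors' field under the assumed encoding:
    'Name, Var1, Var2; Institution. Name2; Institution2'.

    Caveat (assumption): names containing periods, such as initials,
    will be mis-split because periods are treated as author separators.
    """
    if field_value.strip() == PLACEHOLDER:
        return []  # the placeholder means "no authors recorded"

    authors = []
    for entry in field_value.split("."):  # authors are separated by periods
        entry = entry.strip()
        if not entry:
            continue  # skip empty fragments, e.g. from a trailing period
        # The institution, if any, follows the name after a semicolon.
        names_part, _, institution = entry.partition(";")
        # The canonical name comes first, followed by comma-separated variations.
        names = [n.strip() for n in names_part.split(",") if n.strip()]
        if not names:
            continue
        authors.append(
            Author(
                name=names[0],
                variations=names[1:],
                institution=institution.strip() or None,
            )
        )
    return authors


if __name__ == "__main__":
    for author in parse_authors("Jane Doe, Doe Jane; University of Waterloo. John Roe"):
        print(author)
```

Returning an empty list for the 'n/a' placeholder keeps the indexing workaround out of downstream code, so callers never mistake the placeholder for an author name.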