Tom Morris
The interface that we're currently using for Clojure is an internal interface, not the public API. We should migrate to the official public API, which is documented here: https://clojure.github.io/clojure/javadoc/ ###...
For users with many projects, an easy way to narrow down the list would be to provide a search function which filters the list to those with project names, descriptions,...
In investigating #5354, I realized that this is just the tip of the iceberg and there are a large number of places (~100) in the code where a naked `e.printStackTrace()`...
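As a rough illustration of one common alternative to a naked `e.printStackTrace()`, the sketch below routes the exception through a logger so that it carries a severity level and some context. It uses `java.util.logging` only to stay self-contained; the class name `LoggingSketch` and the method `parseOrDefault` are hypothetical and not taken from the codebase, and the project may well prefer a different logging framework.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; not part of the project's code.
final class LoggingSketch {
    private static final Logger LOGGER = Logger.getLogger(LoggingSketch.class.getName());

    // Parses an integer, falling back to a default when the input is malformed.
    // A null input is not handled here and would throw a NullPointerException.
    static int parseOrDefault(String raw, int fallback) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Instead of e.printStackTrace(): log at an explicit level, with
            // context, and pass the exception so the stack trace is preserved.
            LOGGER.log(Level.WARNING, "Could not parse '" + raw + "', using " + fallback, e);
            return fallback;
        }
    }
}
```

Passing the exception as the last argument keeps the stack trace available to whatever handler is configured, while letting deployments filter or redirect the output.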
The N-gram tokenizer in Simile Vicino does not emit tokens that are shorter than the given N-gram size (i.e., N). While this has the nice property of keeping all the...
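To make the described behavior concrete, here is a minimal sketch of a character N-gram function. It is not Vicino's actual code: the class `NGramSketch`, the method `ngrams`, and the `keepShort` flag are all hypothetical, and treating the input as character N-grams is an assumption. With `keepShort` set to `false`, inputs shorter than N produce no tokens, as described above; with `true`, the short input is emitted whole, which is only one possible alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Simile Vicino implementation.
final class NGramSketch {
    // Returns the character n-grams of text. When text is shorter than n,
    // nothing is emitted unless keepShort is true, in which case the whole
    // (non-empty) text is emitted as a single token.
    static List<String> ngrams(String text, int n, boolean keepShort) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> out = new ArrayList<>();
        if (text.length() < n) {
            if (keepShort && !text.isEmpty()) {
                out.add(text);
            }
            return out;
        }
        // Slide a window of width n across the text.
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }
}
```

For example, `ngrams("ab", 3, false)` yields an empty list, so a two-character value would contribute nothing to a trigram-based key, whereas `ngrams("ab", 3, true)` keeps `"ab"` as a single token.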
MARC records, Wikisource metadata, and perhaps other sources of metadata often include various strong identifiers for authors (LCCN, VIAF, Wikidata, ISNI, etc) which should a) be imported and b) used...
### Proposal The OpenLibrary data is a public resource which represents decades of investment by many volunteers and must be preserved. Data dumps are currently only archived on archive.org which...
### Problem This author: https://openlibrary.org/authors/OL9912016A.json was imported from Amazon in Nov 2021 with the obviously conflated name of "Rachel Kushner Suat Ertuzun." While there are thousands upon thousands of conflated...
### Problem When investigating edition records with no publishers for #2119, I noticed cases where the `source_records` lists MARC records which contain publishers in the MARC 260, but it's not...
Currently we have a single, fixed, English-only list of stop words. https://github.com/OpenRefine/OpenRefine/blob/89a9e8d5bd8a97374e6a07bdfd1ff960169a78e2/modules/core/src/main/java/com/google/refine/model/recon/StandardReconConfig.java#L744-L752 It would be desirable to improve this so that the list is some combination of: - visible -...