Jimmy Lin

Results 211 issues of Jimmy Lin

It would be nice to demonstrate the generality of our platform. A couple of obvious ideas come to mind:
+ Generic doc store "source" interface, swappable between Solr and ES.
+ Generic graph...

Currently, our schema for relations and facts looks something like this: There's an asymmetry here, as relations are reified with an explicit relation node. We should refactor to make more...

From @r-clancy: The ecosystem of Python-based NLP tools is much larger than what's available on the JVM, so we want to look into rewriting DSTLR in Python...

`batch_search` for RM3 doesn't appear to work, as reported in https://github.com/castorini/pyserini/issues/831. I tracked down the issue to the fact that analyzers have state, and so concurrent calls to analyze the...
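
The diagnosis above (analyzers hold state, so concurrent calls interfere) describes a common class of bug. As a generic illustration only, and not the fix adopted in Pyserini or Anserini, one way to avoid sharing a stateful object across threads is to give each thread its own instance. `StatefulAnalyzer`, `get_analyzer`, and `analyze_batch` are hypothetical names invented for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StatefulAnalyzer:
    """Hypothetical stand-in for an analyzer that reuses an internal buffer."""

    def __init__(self):
        self._buffer = []

    def analyze(self, text):
        # The buffer is reset and refilled on every call, so two threads
        # sharing one instance could overwrite each other's tokens.
        self._buffer.clear()
        for token in text.lower().split():
            self._buffer.append(token)
        return list(self._buffer)


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_analyzer():
    """Return the calling thread's private analyzer, creating it on first use."""
    if not hasattr(_local, "analyzer"):
        _local.analyzer = StatefulAnalyzer()
    return _local.analyzer


def analyze_batch(texts, threads=4):
    """Analyze texts concurrently without sharing analyzer state across threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda t: get_analyzer().analyze(t), texts))
```

For example, `analyze_batch(["Hello World", "RM3 query expansion"])` tokenizes each string on a worker thread, and results come back in input order because `ThreadPoolExecutor.map` preserves ordering.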

Noted by @dragomirradev: The "year" column is based on the earliest year in the citation count histogram, which in fact is *not* the earliest year in terms of publications. For...

Inspired by the recent UDF on image processing in Warcbase: if we had a UDF for computing the MD5 checksum of arbitrary data, we could apply it to all images and...

feature
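
To make the MD5 idea above concrete, here is a minimal sketch of computing the checksum of arbitrary bytes. It is written in Python with the standard `hashlib` module purely for illustration; it is not a Warcbase UDF, and the function name `md5_hex` is an assumption.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of arbitrary bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()
```

Identical byte sequences always produce the same digest, so `md5_hex(a) == md5_hex(b)` whenever `a` and `b` hold the same content.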

Check out this neat trick: http://stackoverflow.com/questions/8499633/how-to-display-base64-images-in-html. This means that, if we go with my suggestion in #177, we could do something like `Base64Encode(r.getContentBytes)`, and then use the HTML inject trick...

feature
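
The Base64 trick above embeds image bytes directly in an `<img>` tag through a `data:` URI. A minimal sketch in Python, assuming the raw bytes and the MIME type are already known; the function name `data_uri_img` and the default MIME type are assumptions for illustration, not part of Warcbase.

```python
import base64


def data_uri_img(content_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build an HTML <img> tag whose source is the Base64-encoded image bytes."""
    encoded = base64.b64encode(content_bytes).decode("ascii")
    return f'<img src="data:{mime_type};base64,{encoded}" />'
```

The returned string can be written into any HTML page, and the browser renders the image without fetching a separate file.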

Google Cloud Dataproc supports Spark: http://googlecloudplatform.blogspot.com/2015/09/Google-Cloud-Dataproc-Making-Spark-and-Hadoop-Easier-Faster-and-Cheaper.html. Might want to give it a try...

Google Cloud Bigtable uses the HBase API, which means that it should work with Warcbase: https://cloud.google.com/bigtable/docs/bigtable-and-hbase. Might want to give it a try...