Stephen Merity
cc-warc-examples
Common Crawl WARC/WET/WAT examples and processing code for Java + Hadoop
cs205_ga
How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce
gzipstream
gzipstream allows Python to process multi-part gzip files from a streaming source
keras_qa
Keras solution to the bAbI tasks using recurrent neural networks - merged as an example into Keras mainline
keras_snli
Simple Keras model that tackles the Stanford Natural Language Inference (SNLI) corpus using summation and/or recurrent neural networks
montelight-cpp
Faster raytracing through importance sampling, rejection sampling, and variance reduction
pubcrawl
*Deprecated* A short and sweet Python web crawler using Redis as the process queue, seen set, and Memcache-style rate limiter for robots.txt