dstlr
Complete end-to-end documentation for single-node dstlr
We need complete end-to-end documentation for a single-node dstlr:
- Ingesting Washington Post into Solr.
- Running extraction on a subset of the docs. (I understand that extraction over the entire corpus might be unrealistic on a single node.)
- Running enrichment.
- Running sample data cleaning queries.
We have parts here and there already, but I'd like documentation down to the level of "copy and paste these commands" into a shell... and it should just work.
I've started a branch here for the updated documentation. I've added the instructions to build dstlr, fixed an issue with CoreNLP 3.8 and Spark, added the Anserini/Solrini instructions, and updated some neo4j docs.
@x389liu Are you able to flesh out more of the Running section? It might be good to point out what needs changing in each of the scripts (e.g., the neo4j password, the amount of memory, the number of executors and cores, etc.)
@r-clancy yeah, I'll add more details to that branch.
@lintool Ryan and I have added detailed instructions on running single-node dstlr (#26). I think this issue can be closed?
Talked to @r-clancy. Before we close this issue, we'd like to run dstlr on a single himrod node following these instructions, to check whether more details are needed.
Bumping this - @x389liu you should work on this. The Core18 instructions in the README can now just be replaced by the Solrini docs in Anserini.