
Complete end-to-end documentation for single-node dstlr

Open lintool opened this issue 6 years ago • 5 comments

We need complete end-to-end documentation for a single-node dstlr:

  • Ingesting Washington Post into Solr.
  • Running extraction on a subset of the docs. (I understand that extraction over the entire corpus might be unrealistic on a single node.)
  • Running enrichment.
  • Running sample data cleaning queries.

We have parts here and there already, but I'd like documentation down to the level of "copy and paste these commands" into a shell... and it should just work.

Oct 02 '19 22:10 lintool

I've started a branch here for the updated documentation. I've added the instructions to build dstlr, fix an issue with CoreNLP 3.8 and Spark, added the Anserini/Solrini instructions, and updated some neo4j docs.

@x389liu Are you able to flesh out more of the Running section? It might be good to point out what needs changing in each of the scripts (e.g., the neo4j password, amount of memory, number of executors and cores, etc.).

Oct 02 '19 22:10 ryan-clancy

@r-clancy Yeah, I'll add more details to that branch.

Oct 02 '19 22:10 x389liu

@lintool Ryan and I have added detailed instructions on running single-node dstlr (#26). I think this issue can be closed?

Oct 09 '19 21:10 x389liu

Talked to @r-clancy. Before we close this issue, we'd like to run dstlr on a single himrod node following these instructions and check whether more details are needed.

Oct 09 '19 21:10 x389liu

Bumping this - @x389liu you should work on this. The Core18 instructions in the README can now just be replaced by the Solrini docs in Anserini.

Feb 11 '20 15:02 lintool