Bazaar
A collection of tools to generate input for DeepDive.
Parser
Parser is a wrapper around Stanford CoreNLP that takes a simple JSON format as input and generates a TSV file that can be loaded directly into a database.
The parser package can be used in five ways:
- parser/run.sh runs the parser as a single process.
- parser/run_parallel.sh runs multiple instances of the parser on a single machine.
- Distribute runs multiple instances of the parser on multiple machines.
- Condor contains instructions on how to run the parser on the Condor cluster.
- parser/run.sh -p 8080 runs the parser as a REST service.
XML
Many external datasets are in an XML format. To consume these datasets with DeepDive, the XML has to be parsed into the simple JSON representation that the Parser package uses as input.
An example of using an XML parser is contained in the dd-genomics project.
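As a rough illustration of the XML-to-JSON step described above, the following Python sketch turns a small XML document into one JSON object per line. It is not taken from dd-genomics or from the Parser package: the element names (article, body), the id attribute, the JSON keys id and text, and the one-object-per-line layout are all assumptions made for this example, so adapt them to your dataset and to whatever keys your parser setup expects.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML layout, invented for this example only.
SAMPLE_XML = """<articles>
  <article id="doc1"><body>BRCA1 is linked to breast cancer.</body></article>
  <article id="doc2"><body>TP53 is a tumor suppressor gene.</body></article>
</articles>"""


def xml_to_json_lines(xml_text, id_key="id", text_key="text"):
    """Yield one JSON string per <article> element.

    The key names are assumptions; set them to whatever the parser
    is configured to read.
    """
    root = ET.fromstring(xml_text)
    for article in root.iter("article"):
        body = article.find("body")
        # itertext() also collects text nested inside child markup.
        text = "".join(body.itertext()).strip() if body is not None else ""
        yield json.dumps({id_key: article.get("id"), text_key: text})


lines = list(xml_to_json_lines(SAMPLE_XML))
for line in lines:
    print(line)
```

Writing one document per line keeps the output easy to split across processes or machines, which suits the parallel and distributed modes listed in the Parser section.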
Distribute
It is often desirable to run the parser on multiple machines on EC2 or Azure. Distribute contains tools to automatically provision machines, distribute data, perform parsing, and collect results.