dig-etl-engine
Download DIG to run on your laptop or server.
In the first phase, we only need a binary classifier for entity tables.
Develop the table classifier. It should read the sample vectors as a resource so that they can be read at startup of the pipeline, and then it should tag each...
The embedding code will read from a Kafka topic to compute the embeddings. It will read as many docs as it needs from the topic and then write the embedding...
We need a way to compute word embeddings for the ETK output. Having the ETK output go to a Kafka topic would enable implementing the embedding as a consumer of...
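The "read as many docs as it needs" step can be sketched without committing to a particular Kafka client. This is only an illustration: the function name `read_docs`, the `needed` parameter, and the assumption that each message payload is a JSON-encoded ETK document are hypothetical and do not come from the project.

```python
import json
from itertools import islice
from typing import Iterable, List


def read_docs(messages: Iterable[bytes], needed: int) -> List[dict]:
    """Read at most `needed` JSON documents from a message stream.

    `messages` is any iterable of raw payloads (hypothetical interface).
    islice stops pulling from the stream once `needed` items were read,
    so the rest of the topic is left for later calls.
    """
    return [json.loads(payload) for payload in islice(messages, needed)]


# Stand-in for a topic: three JSON payloads, of which only two are needed.
sample_stream = iter([b'{"id": 1}', b'{"id": 2}', b'{"id": 3}'])
batch = read_docs(sample_stream, needed=2)
```

With a real consumer, the payloads would be extracted from the consumed records before being passed in; the embedding computation and the write-back step are left out because the text above does not specify them.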
The documentation should include a pointer to an example class that implements a custom extractor.
The tab should list all the custom extractors with the following columns: - name: as defined by the method to get the name of the extractor, from code or annotations,...
Create an example for end-to-end testing, in the back end and GUI. It should be included somewhere on GitHub so that it can be used as documentation.
The import should automatically create doc-ids and URLs using the names of the files. If the name of the file parses as a URL, it should be used as a...
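A minimal sketch of how doc-ids and URLs might be derived from file names follows. Several things here are assumptions rather than the project's behavior: that a file name which parses as a URL is used as the document URL (the sentence above is cut off), that such names are percent-encoded (since `/` cannot appear in a file name), that other files get a URL under a hypothetical `base_url`, and that the doc-id is a SHA-256 hash of the URL.

```python
import hashlib
from pathlib import Path
from typing import Tuple
from urllib.parse import unquote, urlparse


def doc_id_and_url(file_path: str,
                   base_url: str = "http://localhost/imported/") -> Tuple[str, str]:
    """Derive (doc_id, url) from a file name. All rules here are assumptions."""
    name = Path(file_path).name
    # Assumption: URL-like file names are percent-encoded.
    candidate = unquote(name)
    parsed = urlparse(candidate)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        url = candidate          # the name itself parses as a URL
    else:
        url = base_url + name    # hypothetical fallback URL
    # Hypothetical doc-id scheme: hex digest of the URL.
    doc_id = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return doc_id, url


url_id, url_from_name = doc_id_and_url("/data/http%3A%2F%2Fexample.com%2Fpage.html")
plain_id, plain_url = doc_id_and_url("report.json")
```

Hashing the URL keeps the doc-id stable across repeated imports of the same file, which is one reason to prefer it over a random identifier in a sketch like this.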