dig-etl-engine issues

Develop module to label entity tables

In the first phase, we only need a binary classifier for entity tables.

enhancement

Classify tables

Develop the table classifier. It should read the sample vectors as a resource so that they can be read at startup of the pipeline, and then it should tag each...

szeke

enhancement
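
A rough sketch of how sample vectors might be loaded once at pipeline startup and then used to tag tables. The issue does not say which classifier is intended, so 1-nearest-neighbour by cosine similarity is used here purely as a stand-in. The JSON resource layout, the `TableClassifier` class, and the `is_entity` field are all hypothetical names chosen for illustration.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TableClassifier:
    """Binary entity-table tagger backed by labelled sample vectors."""

    def __init__(self, resource_text):
        # Assumed resource format (hypothetical): a JSON list of
        # {"vector": [...], "is_entity": true/false} objects, parsed once
        # so the samples are in memory before any table is processed.
        self.samples = [
            (entry["vector"], bool(entry["is_entity"]))
            for entry in json.loads(resource_text)
        ]

    def tag(self, table_vector):
        """Return the label of the most similar sample (stand-in 1-NN rule)."""
        best_label, best_score = False, -2.0
        for vector, is_entity in self.samples:
            score = cosine(table_vector, vector)
            if score > best_score:
                best_label, best_score = is_entity, score
        return best_label
```

In a real pipeline the resource text would come from a file read at startup; passing it in as a string keeps the sketch independent of any particular resource-loading mechanism.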

Compute embeddings

The embedding code will read from a Kafka topic to compute the embeddings. It will read as many docs as it needs from the topic and then write the embedding...

szeke

enhancement

Add option to put ETK output in a Kafka topic

We need a way to compute word embeddings for the ETK output. Having the ETK output go to a Kafka topic would enable implementing the embedding as a consumer of...

szeke

enhancement

Write documentation for adding a custom extractor to a project

The documentation should include a pointer to an example class that implements a custom extractor.

szeke

documentation

Create tab in myDIG GUI to show list of custom extractors

The tab should list all the custom extractors with the following columns: - name: as defined by the method to get the name of the extractor, from code or annotations,...

szeke

enhancement

Create example custom extractor for a project

Create an example for end-to-end testing in both the back end and the GUI. It should be included somewhere on GitHub so that it can be used as documentation.

szeke

enhancement

Add ability to import a zip file of HTML documents

The import should automatically create doc-ids and URLs using the names of the files. If the name of the file parses as a URL, it should be used as a...

szeke

enhancement
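
A minimal sketch of what such an import could look like, under several assumptions that are not stated in the issue: file names that are meant to be URLs are percent-encoded inside the archive, the doc-id is a SHA-256 hash of the archive path, and files whose names do not parse as URLs get a placeholder `file:///` URL built from their path. The function names `name_as_url` and `import_html_zip` and the record keys are hypothetical.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def name_as_url(file_name):
    """Return the decoded name if it parses as an http(s) URL, else None."""
    candidate = unquote(file_name)  # assumption: URL names are percent-encoded
    parsed = urlparse(candidate)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return candidate
    return None


def import_html_zip(zip_bytes):
    """Yield one record per file in the archive, skipping directory entries."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            base_name = PurePosixPath(info.filename).name
            # Placeholder URL scheme for non-URL names (an assumption).
            url = name_as_url(base_name) or "file:///" + info.filename
            yield {
                # Assumption: doc-id derived by hashing the archive path.
                "doc_id": hashlib.sha256(info.filename.encode("utf-8")).hexdigest(),
                "url": url,
                "raw_content": archive.read(info).decode("utf-8", errors="replace"),
            }
```

Hashing the archive path keeps doc-ids stable across repeated imports of the same zip file, which is one reason to prefer it over a counter or random identifier.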
