
Integration with Rubrix

Open nickchomey opened this issue 3 years ago • 1 comments

Annotating documents is an important task. There's the Haystack Annotation tool, but it seems rather basic. Conversely, Rubrix appears to be a tool solely focused on NLP annotation.

Despite Rubrix perhaps not being fit for Q&A annotation, members of both teams acknowledged the large potential for synergies between the two tools in this discussion.

I'm just leaving this as a reminder/placeholder for future discussion and work.

nickchomey, Sep 02 '22 03:09

There's also this related issue for creating a Rubrix integration for Obsei. https://github.com/obsei/obsei/issues/194

nickchomey, Sep 02 '22 03:09

Training or evaluation datasets annotated with other tools should work as long as they follow a standard format. For example, the SQuAD format is common for QA datasets and also for document retrieval datasets (DPR). We have the SQuADData class here in Haystack to simplify loading, manipulating and saving SQuAD datasets. As long as Rubrix or any other annotation tool uses that format, no extra integration work is needed, so we can close this issue.

julian-risch, Mar 14 '23 13:03
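
For readers unfamiliar with the SQuAD format mentioned in the last comment, here is a minimal sketch of what converting an annotation export into that layout might look like. The input record shape (`title`, `context`, `question`, `answer` keys) and the `to_squad` helper are hypothetical and made up for illustration; they are not the export format of Rubrix or any other tool, and the code uses only the standard library rather than any Haystack API.

```python
import json


def to_squad(records, version="v2.0"):
    """Convert flat annotation records into a SQuAD-style dictionary.

    `records` is a hypothetical export: a list of dicts with the keys
    "title", "context", "question" and "answer". A real annotation
    tool will likely use different field names.
    """
    data = []
    for i, rec in enumerate(records):
        context = rec["context"]
        answer = rec["answer"]
        # SQuAD stores the character offset of the answer in the context.
        start = context.find(answer)
        if start == -1:
            raise ValueError(f"answer not found in context for record {i}")
        data.append(
            {
                "title": rec.get("title", ""),
                "paragraphs": [
                    {
                        "context": context,
                        "qas": [
                            {
                                "id": str(i),
                                "question": rec["question"],
                                "answers": [
                                    {"text": answer, "answer_start": start}
                                ],
                                "is_impossible": False,
                            }
                        ],
                    }
                ],
            }
        )
    return {"version": version, "data": data}


# Illustrative record only, not real annotation data.
records = [
    {
        "title": "Haystack",
        "context": "Haystack is an open source NLP framework.",
        "question": "What is Haystack?",
        "answer": "an open source NLP framework",
    }
]

squad = to_squad(records)
print(json.dumps(squad, indent=2))
```

Writing the resulting dictionary to a JSON file should give the kind of dataset that SQuAD-aware loaders expect. Taking the first occurrence of the answer text as `answer_start` is a simplification; an annotation tool that records exact character spans would supply the offset directly.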