CSV output should use simple field values for labels (not convoluted JSON)

Open tomasohara opened this issue 4 years ago • 4 comments

Unfortunately, the CSV output exported by Label Studio uses JSON for the label. See dog-example-project-35-at-2021-10-07-18-48-22cb3c67.csv. This makes it hard to review the data in spreadsheets.

Instead, the label should be extracted as a simple string value, as with the other converters (e.g., CONLL). In addition, each annotation should be on a separate line. For example, 15 distinct annotations are packed into a single line in the above example!

For the expected output see the attached desired-dog-example-project-35-at-2021-10-07-18-49-22cb3c67.csv.
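
A minimal sketch of the flattening requested above, not the converter's actual code. It assumes the exported label cell holds a JSON list of span objects with `start`, `end`, `text`, and `labels` keys; the function names, the `span_*` column names, and the `|` separator are hypothetical choices made for illustration.

```python
import csv
import io
import json


def flatten_rows(rows, label_field="label"):
    """Yield one flat dict per annotation span found in each row's JSON label cell."""
    for row in rows:
        # Every column except the JSON cell is copied onto each output row.
        base = {key: value for key, value in row.items() if key != label_field}
        # Assumption: the cell is a JSON list of span dicts (empty cell -> no spans).
        spans = json.loads(row.get(label_field) or "[]")
        for span in spans:
            flat = dict(base)
            flat["span_start"] = span.get("start")
            flat["span_end"] = span.get("end")
            flat["span_text"] = span.get("text")
            # Join multiple labels with "|" so the cell stays a plain string.
            flat["label"] = "|".join(span.get("labels", []))
            yield flat


def flatten_csv(source_text, label_field="label"):
    """Return CSV text with one line per annotation instead of one per task."""
    reader = csv.DictReader(io.StringIO(source_text))
    flat_rows = list(flatten_rows(reader, label_field))
    out = io.StringIO()
    if flat_rows:
        writer = csv.DictWriter(out, fieldnames=list(flat_rows[0].keys()))
        writer.writeheader()
        writer.writerows(flat_rows)
    return out.getvalue()
```

With this shape, a task carrying 15 annotations becomes 15 spreadsheet rows, each with a plain-string label. Tasks with no spans produce no rows in this sketch.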

Note that this is not a feature request: I was baffled when I found out about this behavior. For example, why bother having a CSV format if the important part must be processed with a JSON utility?!

tomasohara avatar Oct 07 '21 23:10 tomasohara

@tomasohara Originally we implemented CSV export for Choices, not for NER labels. The CSV with labels is produced automatically, without any preprocessing (unlike Choices). Yes, maybe it's better to disable CSV export altogether for everything that is not Choices. Or we should add preprocessing for labels too.

makseq avatar Oct 08 '21 17:10 makseq

OK, thanks for the clarification. The changes are minimal, as shown in the following comparison of the existing convert_to_csv vs. my convert_to_flat_csv: _convert_to_csv_flat-diff-8oct21.

Here are the original and revised functions: _convert_to_csv.txt and _convert_to_csv_flat.txt.

Should I make a pull request? I would implement both in the same function, with the new behavior governed by an environment variable (e.g., FLATTENED_CSV_ANNOTATIONS).
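
As a rough sketch of that toggle (only the variable name FLATTENED_CSV_ANNOTATIONS comes from the proposal; the helper name and the set of accepted truthy values are assumptions):

```python
import os


def use_flat_csv(environ=None):
    """Return True when the flattened CSV output is requested via the environment."""
    # Accepting an explicit mapping keeps the helper easy to test.
    environ = os.environ if environ is None else environ
    value = environ.get("FLATTENED_CSV_ANNOTATIONS", "")
    # Assumption: these spellings count as "enabled"; anything else keeps the old output.
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

Leaving the variable unset keeps the existing behavior, so current users of the CSV export would see no change.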

tomasohara avatar Oct 08 '21 20:10 tomasohara

Sorry, I closed it by accident when adding the diff listing. Therefore, I re-opened it.

tomasohara avatar Oct 08 '21 20:10 tomasohara

Yep, pull request would be great!

makseq avatar Oct 08 '21 21:10 makseq