
Fix "TSV" generation

Open CBalsier opened this issue 5 years ago • 0 comments

When exporting data by clicking on "download the set as a .tsv file", the columns of the resulting file are shifted wherever a field has no data. Compare, in the following screenshots, the export from the dataset Apriles-ODAS with the "preview" in OpenScraper:

  • in the preview, there is no "date" and no "données économiques": those columns are blank
  • in the TSV:
    • "date" contains the "partenaires" value (the column right after "date"), shifted one column to the left
    • "partenaires" contains the "résumé" value (if there is no "données économiques") or the "données économiques" value, shifted one or two columns to the left
    • "données économiques" contains the "tags" or "résumé" value, shifted one or two columns, since there is no other empty column between them
    • "résumé" contains the "tags" or "website" value (which, in fact, also contains e-mails…)
    • "tags" contains the "website" or "adresse" value
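The shift can be reproduced with a minimal sketch. The column names, the sample item and the helper `build_row_skipping_missing` are hypothetical and not taken from OpenScraper; only the skip-when-missing pattern mirrors the export code, and the sketch assumes each field value is a list of strings.

```python
# Hypothetical column order and item (assumption: each value is a list of strings).
fields = ["titre", "date", "partenaires", "tags"]
item = {"titre": ["Projet A"], "partenaires": ["Ville X"], "tags": ["social"]}

def build_row_skipping_missing(item, fields):
    """Build a row the way the export seems to: absent fields add no cell."""
    row = []
    for id_field in fields:
        if id_field in item:
            row.append(" ".join(item[id_field]))
    return row

row = build_row_skipping_missing(item, fields)
print("\t".join(fields))
print("\t".join(row))  # one cell short: "Ville X" lands under "date"
```

Because no cell is emitted for the missing "date", every later value moves one column to the left, which matches the pattern described above.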

Screenshot from 2019-05-03 17-27-56 Screenshot from 2019-05-03 17-28-24

Suggested fix: similarly to what is done at https://github.com/entrepreneur-interet-general/OpenScraper/blob/c96c6d85a3e54b90d4f81e06541ab619d8f149f2/openscraper/controller.py#L1362, replace lines 1367-1368

if id_field in item.keys() :
    item_list.append( " ".join(item[ id_field ]) )

by

if id_field in item.keys() :
    item_list.append( " ".join(item[ id_field ]) )
else :
    item_list.append("")

so that the write at line 1369 puts an empty cell in the resulting file for each missing value.
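As a standalone illustration of the same idea, here is a minimal sketch in Python 3. The names `build_row` and `to_tsv`, the column list and the sample item are assumptions made for the example, not OpenScraper code, and the use of the standard `csv` module is a substitution: the issue does not say how the project writes the file.

```python
import csv
import io

def build_row(item, fields):
    """Emit exactly one cell per field; a missing field becomes ""."""
    # " ".join([]) is "", so .get(..., []) plays the role of the else branch.
    return [" ".join(item.get(id_field, [])) for id_field in fields]

def to_tsv(items, fields):
    """Serialize a header line plus one tab-separated line per item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for item in items:
        writer.writerow(build_row(item, fields))
    return buffer.getvalue()

# Hypothetical data: "date" is absent from the item.
fields = ["titre", "date", "partenaires", "tags"]
items = [{"titre": ["Projet A"], "partenaires": ["Ville X"], "tags": ["social"]}]
tsv = to_tsv(items, fields)
print(tsv)
```

Each row then has as many cells as the header, so "partenaires" stays under its own column even when "date" is missing.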

CBalsier · May 03 '19 15:05