OpenScraper
Fix "TSV" generation
When exporting data by clicking on "download the set as a .tsv file", the columns of the resulting file are shifted whenever a field has no data. See for example the following screenshots: the export from the Apriles-ODAS dataset and the "preview" from OpenScraper:
- in the preview, there is no "date" and no "données économiques"; those columns are blank
- in the TSV,
  - "date" contains "partenaires" (the column right after "date"), shifted one column to the left,
  - "partenaires" contains "résumé" (if there is no "données économiques") or "données économiques", shifted one or two columns to the left,
  - "données économiques" contains "tags" or "résumé", shifted one or two columns since no other empty column sits between them,
  - "résumé" contains "tags" or "website" (which, in fact, also contains e-mails…),
  - "tags" contains "website" or "adresse"
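The shift described above can be reproduced with a minimal sketch. The field names, the sample item, and the helper `build_row_skipping_missing` are hypothetical and only mimic the reported behaviour; they are not taken from the OpenScraper code base. The sketch assumes each field value is a list of strings, as the `" ".join(...)` call in the controller suggests.

```python
# Hypothetical column order and scraped item (assumption: values are lists of strings).
fields = ["title", "date", "partenaires", "résumé"]
item = {
    "title": ["Projet A"],
    # no "date" key: this field was not scraped
    "partenaires": ["ODAS"],
    "résumé": ["Un résumé"],
}


def build_row_skipping_missing(item, fields):
    """Mimic the reported bug: a missing field adds no cell at all."""
    row = []
    for field in fields:
        if field in item:
            row.append(" ".join(item[field]))
    return row


row = build_row_skipping_missing(item, fields)
print("\t".join(fields))
print("\t".join(row))
```

The row ends up with three cells for four columns, so the value of "partenaires" lands under the "date" header.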
Suggested fix: similarly to what is done at https://github.com/entrepreneur-interet-general/OpenScraper/blob/c96c6d85a3e54b90d4f81e06541ab619d8f149f2/openscraper/controller.py#L1362, replace lines 1367-1368

    if id_field in item.keys() :
        item_list.append( " ".join(item[ id_field ]) )

by

    if id_field in item.keys() :
        item_list.append( " ".join(item[ id_field ]) )
    else :
        item_list.append("")

so that an empty value is written for each missing field when the row is written to the resulting file at line 1369, which keeps the columns aligned.
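As a self-contained illustration of the suggested fix, here is a minimal sketch of a row builder that always emits one cell per column. The function name `build_row` and the sample data are hypothetical; the real code in `controller.py` may be structured differently.

```python
def build_row(item, fields):
    """Return one cell per field, using an empty string when the field is missing."""
    row = []
    for field in fields:
        if field in item:
            # Assumption: each value is a list of strings, joined with spaces.
            row.append(" ".join(item[field]))
        else:
            # Placeholder cell so that later columns are not shifted left.
            row.append("")
    return row


fields = ["title", "date", "partenaires"]
item = {"title": ["Projet A"], "partenaires": ["ODAS"]}

fixed_row = build_row(item, fields)
tsv_line = "\t".join(fixed_row)
print(tsv_line)
```

With the placeholder in place, the line contains two consecutive tab characters where "date" is missing, so "partenaires" stays in its own column.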