modular-file-renderer
modular-file-renderer copied to clipboard
[SVCS-531] Improve tabular renderer to handle more TSV cases
Ticket
Purpose
[SVCS-531] Improve tabular renderer to handle more TSV cases
Changes
mfr/extensions/tabular/libs/init.py mfr/extensions/tabular/libs/stdlib_tools.py mfr/extensions/tabular/settings.py
- separate tsv_stdlib from csv_stdlib
- improve exceptions
tests/extensions/tabular/files/invalid_null.csv tests/extensions/tabular/test_stdlib_tools.py
- add tests
tests/extensions/ipynb/files/no_metadata.ipynb - quiet test (incidental)
tests/extensions/zip/test_renderer.py - quiet test (incidental)
Side effects
Possible increase in system resource use due to full file being sniffed
QA Notes
In addition to the tsv files found in https://osf.io/pexrv/?view_only=af771fef666049a9aefa85642230e80a also test json_test.csv attached to ticket. This ticket only effects .tsv and .csv files rendered in MFR.
Deployment Notes
None
Coverage increased (+0.5%) to 68.501% when pulling 3048f77d3f319b5f93e688c53e7a60aa4895e337 on TomBaxter:feature/SVCS-531 into 8bb2dd4d35f8c1d89a72fc8b9e09db317f3497f3 on CenterForOpenScience:develop.
@felliott The previous PR was already Phase 2. Do you want to review this or should I do another round of phase 1?
I'll take it. Phase 2 is good.
Sounds good! Added Phase 2 label to the PR.
hey guys,
i think you may have some issues with this:
dialect = csv.Sniffer().sniff(fp.read(), ',')
https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR13
i remember having issues with the jamovi csv importer when the data returned didn't end with a complete line. i worked around it thus:
some_data = csvfile.read(4096)
if len(some_data) == 4096: # csv sniffer doesn't like partial lines
some_data = trim_after_last_newline(some_data)
dialect = csv.Sniffer().sniff(some_data, ', \t;')
and
def trim_after_last_newline(text):
index = text.rfind('\r\n')
if index == -1:
index = text.rfind('\n')
if index == -1:
index = text.rfind('\r')
if index != -1:
text = text[:index]
return text
https://github.com/jamovi/jamovi/blob/master/server/jamovi/server/formatio/csv.py#L85
i'd also take exception to this:
# CSVs are always values seperated by commas
https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR11
i don't think excel saves .csv files separated by commas by default