modular-file-renderer icon indicating copy to clipboard operation
modular-file-renderer copied to clipboard

[SVCS-531] Improve tabular renderer to handle more TSV cases

Open TomBaxter opened this issue 7 years ago • 5 comments

Ticket

SVCS-531

Purpose

[SVCS-531] Improve tabular renderer to handle more TSV cases

Changes

mfr/extensions/tabular/libs/init.py mfr/extensions/tabular/libs/stdlib_tools.py mfr/extensions/tabular/settings.py

  • separate tsv_stdlib from csv_stdlib
  • improve exceptions

tests/extensions/tabular/files/invalid_null.csv tests/extensions/tabular/test_stdlib_tools.py

  • add tests

tests/extensions/ipynb/files/no_metadata.ipynb - quiet test (incidental)

tests/extensions/zip/test_renderer.py - quiet test (incidental)

Side effects

Possible increase in system resource use due to full file being sniffed

QA Notes

In addition to the tsv files found in https://osf.io/pexrv/?view_only=af771fef666049a9aefa85642230e80a also test json_test.csv attached to ticket. This ticket only effects .tsv and .csv files rendered in MFR.

Deployment Notes

None

TomBaxter avatar Dec 21 '17 19:12 TomBaxter

Coverage Status

Coverage increased (+0.5%) to 68.501% when pulling 3048f77d3f319b5f93e688c53e7a60aa4895e337 on TomBaxter:feature/SVCS-531 into 8bb2dd4d35f8c1d89a72fc8b9e09db317f3497f3 on CenterForOpenScience:develop.

coveralls avatar Dec 21 '17 19:12 coveralls

@felliott The previous PR was already Phase 2. Do you want to review this or should I do another round of phase 1?

cslzchen avatar Jan 05 '18 19:01 cslzchen

I'll take it. Phase 2 is good.

felliott avatar Jan 08 '18 14:01 felliott

Sounds good! Added Phase 2 label to the PR.

cslzchen avatar Jan 08 '18 14:01 cslzchen

hey guys,

i think you may have some issues with this:

dialect = csv.Sniffer().sniff(fp.read(), ',')

https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR13

i remember having issues with the jamovi csv importer when the data returned didn't end with a complete line. i worked around it thus:

some_data = csvfile.read(4096)
if len(some_data) == 4096:  # csv sniffer doesn't like partial lines
    some_data = trim_after_last_newline(some_data)
    dialect = csv.Sniffer().sniff(some_data, ', \t;')

and

def trim_after_last_newline(text):

    index = text.rfind('\r\n')
    if index == -1:
        index = text.rfind('\n')
        if index == -1:
            index = text.rfind('\r')

    if index != -1:
        text = text[:index]

return text

https://github.com/jamovi/jamovi/blob/master/server/jamovi/server/formatio/csv.py#L85

i'd also take exception to this:

# CSVs are always values seperated by commas

https://github.com/CenterForOpenScience/modular-file-renderer/pull/308/files#diff-875eb910328d4d6758865a7db8ef38cdR11

i don't think excel saves .csv files separated by commas by default

jonathon-love avatar Jan 25 '18 07:01 jonathon-love