csvkit icon indicating copy to clipboard operation
csvkit copied to clipboard

in2csv: multiple issues in 1.0.6

Open EML-github opened this issue 3 years ago • 2 comments

xls[x] conversion in 1.0.6 has changed fundamentally from a previous version, which I believe is 0.9.1 (installed ~2015, no '-V' switch). There are multiple issues, but the most obvious ones are that the conversion of times (format [m]:ss.00) has broken, and integer values are now (sometimes?) converted to floats. Is there any point supplying a test case? Is in2csv actively maintained? In my case, the csv output is parsed by downstream C++ code, and the parsing requirements have changed so much that I'm now looking at (Linux) command-line conversion with ssconvert or LibreOffice/calc.

Obvious issues are:

1 - '1' being converted as '1.0'

2 - [m]:ss.00 times are generally converted as hh:mm:ss.000000. I could live with this, but it's inconsistent; sometimes the '.000000' is omitted entirely. I could also live with conversion to floating-point, but this is inconsistent, and sometimes ambiguous (is '5.1' converted from 5.1 seconds, or 122.4 hours??)

3 - the last column of a row of times is converted differently; however, '--no-inference' fixes this

4 - '--datetime-format' appears to do nothing. It's difficult to be sure without documentation, but '--datetime-format "%M:%S.%f"' makes no difference on my test spreadsheets, and does not produce mm:ss.00

EML-github avatar Apr 28 '22 13:04 EML-github

Are you sure that this is a limitation of in2csv? Excel shows a representation of the data, which sometimes do not match with what is actually stored. Have you tried to convert your Excel spreadsheet to CSV to check if you get the same inconsistencies?

frafra avatar May 26 '22 16:05 frafra

1.0 is essentially a frontend to the agate Python package, to which most of the logic has moved. In 0.9, many things were much less robust and/or opinionated. For data representation specifically, 0.9 was more aggressive in casting values as integers, even if the underlying XLS(X) data was a float or decimal. The reason we have a change in the major number is precisely because there are breaking changes like these.

Having said that, a difference across major versions is not a bug.

the conversion of times (format [m]:ss.00) has broken

Do you mean the inconsistency of '.000000' appearing? Can you provide input files that reproduce the different outputs?

the last column of a row of times is converted differently

I haven't seen this myself. I would need an input file with which to reproduce the behavior.

'--datetime-format' appears to do nothing

This is only relevant for CSV files. Excel does not store formatted times (it stores times and formats separately).

jpmckinney avatar Sep 06 '22 17:09 jpmckinney

Closing as no response.

jpmckinney avatar Dec 20 '22 21:12 jpmckinney