
Handle data lines with fewer than the expected number of elements

Status: Open. aleksandervines opened this issue 8 years ago · 3 comments

The CTD software SD200W does not always produce standard-compliant csv files. Well, the main problem with csv, I guess, is that there is no real standard to follow.

The files it produces sometimes contain no data other than the pressure for some measurements, and they do not say N/A or anything similar for the missing data.

Example:

Press;Sal.;Temp;Ox %;mg/l;Density;S. vel.;
0,10
0,20
0,30;32,21;8,651;370,23;34,98;24,989;1481,47;
0,40;32,21;8,655;370,23;34,98;24,987;1481,49;
0,50;32,21;8,659;370,23;34,98;24,986;1481,50;
0,60;32,20;8,663;370,23;34,97;24,984;1481,51;
0,70;32,20;8,667;370,23;34,97;24,983;1481,53;
0,80;32,20;8,671;370,23;34,97;24,981;1481,54;

The first two data lines in this file cause an error because they contain only one element.
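To make the failure concrete: once the trailing `;` is dropped, the header and the complete rows split into seven fields, while `0,10` and `0,20` split into one. The Python sketch below is purely illustrative and is not Rosetta's code. The helper name `find_incomplete_rows` is hypothetical, and the sketch assumes semicolon-delimited text with the header on the first line. Because it scans the whole file rather than only the top, it would also report short rows in the middle or at the end.

```python
def find_incomplete_rows(text, delimiter=";"):
    """Return (line_number, field_count) for each data row with fewer
    fields than the header. Hypothetical helper, for illustration only."""
    lines = text.splitlines()
    # The SD200W header and full rows end with a trailing delimiter;
    # strip trailing delimiters so they don't count as an extra empty field.
    expected = len(lines[0].rstrip(delimiter).split(delimiter))
    incomplete = []
    # Line numbers are 1-based and count the header as line 1.
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip(delimiter).split(delimiter)
        if len(fields) < expected:
            incomplete.append((number, len(fields)))
    return incomplete
```

For the sample above this returns `[(2, 1), (3, 1)]`: file lines 2 and 3, each with a single field.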

I'd suggest adding an option to ignore lines that do not have the expected number of elements, or to assume the first elements are in the correct place and add NaN for the rest. I don't know which would be preferred in this case.
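The two behaviors suggested here (skip incomplete rows, or keep the leading values in place and pad the rest with NaN), plus failing outright for users who want strict input, could be exposed as a single option. This is a minimal Python sketch that assumes the SD200W layout shown above: `;` delimiter, decimal comma, optional trailing `;`. The function `parse_ctd_csv` and its `on_short` parameter are hypothetical names, not part of Rosetta.

```python
import math

def parse_ctd_csv(text, on_short="pad", delimiter=";"):
    """Parse semicolon-delimited CTD text that uses decimal commas.

    on_short (hypothetical option) controls rows with fewer fields than the header:
      "pad"   -> keep leading values in place, fill the rest with NaN
      "skip"  -> drop the row
      "error" -> raise ValueError
    """
    if on_short not in ("pad", "skip", "error"):
        raise ValueError(f"unknown on_short mode: {on_short!r}")
    lines = text.splitlines()
    header = lines[0].rstrip(delimiter).split(delimiter)
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip(delimiter).split(delimiter)
        missing = len(header) - len(fields)
        if missing > 0:
            if on_short == "skip":
                continue
            if on_short == "error":
                raise ValueError(
                    f"line {number}: expected {len(header)} fields, got {len(fields)}")
            fields += [""] * missing  # "pad": empty fields become NaN below
        # Convert decimal commas to points; empty fields become NaN.
        rows.append([float(f.replace(",", ".")) if f.strip() else math.nan
                     for f in fields])
    return header, rows
```

Padding assumes the missing values are always the trailing columns. That matches the sample, where only pressure is present, but it would misplace data if the instrument ever dropped a column from the middle of a row.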

aleksandervines, May 24 '16 12:05

I think the best option is to keep it an option. Currently, if you mark those two lines as "header" lines in the wizard interface, they should be ignored. We should also give the option of including them, padded out with missing values where needed.

lesserwhirls, May 30 '16 13:05

Currently, if you mark those two lines as "header" lines in the wizard interface, they should be ignored.

This works for each specific case, but it requires checking every file. I wouldn't put it past the silly program that creates these csv files to also produce lines like that in the middle or at the end of a file.

And yes, I think it should be an option, as different users will need different solutions. Some would want this to fail, since they would then need to quality-check input files that should be in the correct format.

aleksandervines, May 31 '16 11:05

Ah yes, you would need to do that for each file... and, as you say, if those lines happen to be deep in the data block, then all bets are off. A checkbox, enabled by default, called something like "Insert missing values into incomplete data rows" should do the trick. What do you think? Does that text make sense?

lesserwhirls, May 31 '16 20:05