CSV header/data length mismatch 5 != 3 on row that does not exist

Open vera-rykalina opened this issue 1 year ago • 2 comments

Hi! I am joining 4 CSV tables with the same number of rows. The mlr command runs inside a Nextflow process.

Command:

    script:
    """
    mlr --csv join -u --ul --ur -j SequenceName -f ${stanford} ${comet} |
    mlr --csv join -u --ul --ur -j SequenceName -f ${g2p} |
    mlr --csv join -u --ul --ur -j SequenceName -f ${rega} > joint_${comet.getSimpleName().split('comet_')[1]}.csv
    """

comet, stanford, rega, and g2p are my CSV tables.

The join was working without any problem last week, but since today I have been getting this error:

CSV header/data length mismatch 5 != 3 at filename (stdin) row 1176.

The thing is that row 1176 does not exist in any of my CSV tables. All my tables have 3 columns and 1175 rows each.

Any idea what is going on here?

Thanks, Vera

vera-rykalina, Feb 15 '24 17:02

@vera-rykalina is it possible for you to share your data files, e.g. at gist.github.com?

johnkerl, Feb 15 '24 18:02

Also, I suspect that the output of mlr --csv join -u --ul --ur -j SequenceName -f ${g2p} is intermediate data which does have 1176 rows (which can happen if there is a duplicate value of SequenceName) ...
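One way to test the duplicate-key hypothesis is to list every SequenceName value that occurs more than once in each input file. The following is only a sketch using standard shell tools: `dup_keys` is a hypothetical helper name, and it assumes a simple CSV (comma-separated, header on the first line, no quoted fields containing commas or newlines).

```shell
# dup_keys FILE COLUMN
# Print each value of COLUMN that appears on more than one data row of FILE.
# Assumption: simple CSV, header on line 1, no quoted commas or embedded newlines.
dup_keys() {
  awk -F',' -v col="$2" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { count[$idx]++ }                       # tally each key value
    END { for (k in count) if (count[k] > 1) print k }
  ' "$1" | sort
}
```

For example, `dup_keys g2p.csv SequenceName` (the file name is illustrative) prints nothing if every key is unique. With Miller itself, something like `mlr --csv count-distinct -f SequenceName then filter '$count > 1' g2p.csv` should report the same keys along with their counts.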

johnkerl, Feb 15 '24 18:02

Closing as I believe this is resolved -- if this is in error please re-open and I'm happy to discuss further -- thank you!

johnkerl, Jun 08 '24 17:06