tools-devteam
Enhance "Convert Characters" to include option to ignore chars in quotes
Some common CSV formatted files have content in double quotes. This can include commas. Conversion to tabular format is then difficult.
The primary request is to add an option on the tool form to ignore characters enclosed in double quotes. This has limited but specific utility.
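A minimal sketch of what "ignore characters in double quotes" could mean in practice, assuming a simple per-line pass. The function name `convert_outside_quotes` and its parameters are hypothetical and are not part of the existing tool:

```python
def convert_outside_quotes(line, old=",", new="\t", quote='"'):
    """Replace `old` with `new`, except inside double-quoted spans.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    out = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle on every quote character. A doubled quote ("") inside
            # a quoted field toggles twice, so the state stays correct.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == old and not in_quotes:
            out.append(new)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(convert_outside_quotes('a,"b,c",d'))
```

Note that this keeps the quote characters in the output; stripping them would be a separate decision.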
Another option is to expand the tool to convert any of the characters to any of the other characters, instead of limiting it to converting the listed characters to tabs only. If this were done, multiple passes through the tool could eventually convert any CSV to tabular. This enhancement is desirable outside of this particular request, since fixing and converting other formats would then be possible within Galaxy.
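The any-to-any idea could be sketched roughly as follows, assuming the tool works on plain text and single-character delimiters. The name `convert_chars` and its arguments are hypothetical:

```python
def convert_chars(text, sources, target):
    """Replace every character in `sources` with `target`.

    Hypothetical sketch of an "any character to any character" option.
    """
    table = str.maketrans({ch: target for ch in sources})
    return text.translate(table)


if __name__ == "__main__":
    # Two passes: first normalize semicolons to commas, then commas and
    # pipes to tabs.
    step1 = convert_chars("a;b|c", ";", ",")
    print(convert_chars(step1, ",|", "\t"))
```

This does not handle quoted content on its own; it only shows the multi-pass conversion described above.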
And yet another option is to wrap a tool like OpenCSV (http://opencsv.sourceforge.net/) to convert CSV to tabular specifically.
Example of current behavior:
original file:

after converting commas to tabs, this is the result (expected with current rules):

CSV can be complex. I think writing a specific tool makes sense, maybe based on the pandas CSV reader?
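As a rough illustration of a dedicated converter, here is a sketch that uses Python's standard-library `csv` module rather than pandas, purely to stay dependency-free. The function name `csv_to_tabular` is hypothetical, and a real tool would read and write files instead of strings:

```python
import csv
import io


def csv_to_tabular(csv_text):
    """Parse CSV text (honoring double quotes) and re-emit it tab-delimited.

    Hypothetical sketch; uses the stdlib csv module as a stand-in for
    whatever parser a real tool would wrap.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter="\t",
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that still need it
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    print(csv_to_tabular('name,note\nx,"a, b"\n'), end="")
```

Fields that themselves contain tabs or newlines would still be quoted in the output, so strictly tabular consumers may need extra handling for those cases.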
I agree, a specific tool is the most helpful for users. And I like the idea of adding in a py lib rather than an entirely new tool package. Could be useful in other cases (functions built into tools, etc).
If we go that way, I'd still like to see the convert tool have the expanded option to convert any of the chars handled by the tool to any of the other chars (instead of only moving to tabs). I run into wanting this functionality every so often - generally when correcting poor formats. So, maybe we consider both?
Thanks @bgruening !
@jennaj can you send me a bunch of input files that you want to have converted?
Yes, later tonight or tomorrow. Thanks! Assigned this to myself so that I remember where it is :)
Up