galaxy icon indicating copy to clipboard operation
galaxy copied to clipboard

Enhance relabel_from_file to work with any column pair in mapping file

Open wm75 opened this issue 1 year ago • 1 comments

License

  • [x] I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

This can save a cut step in WFs. Will provide tests if you think it's useful.

wm75 avatar Oct 17 '24 22:10 wm75

I don't use Galaxy or follow the latest trends in tool design - so feel free to just dismiss this suggestion if it doesn't reflect how tools are actually used. My gut is though I would implement this as changing:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using a two column table.</option>

to:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using the first two columns of a simple table.</option>
                <option value="tabular_configured">Map original identifiers to new ones using arbitrary columns from a table.</option>

And then adding more advanced option in that new selection when - including like comment line skipping and table type options (stuff like break on commas instead of tabs). But that might be overkill.

Either way though - this seems solid and I'm a +1 on it after it has tests and the linting issue is fixed.

jmchilton avatar Oct 18 '24 15:10 jmchilton

And then adding more advanced option in that new selection when - including like comment line skipping and table type options (stuff like break on commas instead of tabs). But that might be overkill.

I hadn't considered such advanced options because, so far, I've only encountered the column issue during WF development. Now your comment made me think of https://github.com/galaxyproject/iwc/tree/main/workflows/data-fetching/sra-manifest-to-concatenated-fastqs. @lldelisle I seem to remember there was an issue with that one initially about the SRA run data file being csv instead of tab-separated? I think the current patch would already allow simplification of the current WF, but maybe check if there is a missed opportunity for even more here.

wm75 avatar Oct 21 '24 12:10 wm75

Hi everyone, Not sure this would help in this workflow as I need to create a new column anyway... But this is generally speaking a really good idea.

lldelisle avatar Oct 21 '24 12:10 lldelisle

So going to try and get this mergeable today for 24.2, but coming back to the versioning question: is adding the updated version as a copy like in #18385 the way to go here instead of updating the tool in place?

wm75 avatar Nov 11 '24 09:11 wm75

So going to try and get this mergeable today for 24.2, but coming back to the versioning question: is adding the updated version as a copy like in #18385 the way to go here instead of updating the tool in place?

Yes, this is currently the way for breaking changes.

nsoranzo avatar Nov 11 '24 09:11 nsoranzo

I don't use Galaxy or follow the latest trends in tool design - so feel free to just dismiss this suggestion if it doesn't reflect how tools are actually used. My gut is though I would implement this as changing:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using a two column table.</option>

to:

                <option value="txt">Using lines in a simple text file.</option>
                <option value="tabular">Map original identifiers to new ones using the first two columns of a simple table.</option>
                <option value="tabular_configured">Map original identifiers to new ones using arbitrary columns from a table.</option>

So if I go with this part of @jmchilton's suggestion (i.e. hide the new behavior behind a new third select option), that would qualify for a WORKFLOW_SAFE_TOOL_VERSION_UPDATE and not require a new file then, correct? Because all old workflows could then also run with the new tool version? (If the python code handles all three options correctly, of course)

wm75 avatar Nov 11 '24 15:11 wm75

So, I rewrote this to remain compatibility with the old version of the tool and WFs written with the old version can now be executed without issues also when only the new tool version is installed.

@jmchilton I considered your fancier configuration ideas, but did not implement them here because:

  • specifying header lines to ignore is covered by non-strict mode (although explicit header handling would still be a bit stricter than that)
  • different column separators could be interesting for some use cases, but most other separators are also valid chars in element identifiers, making support for them a questionable general concept. Users who know what they are doing can still run the Cut tool first, which has support for all kinds of input separators and will produce tab-separated output.

wm75 avatar Nov 12 '24 11:11 wm75

This PR was merged without a "kind/" label, please correct.

github-actions[bot] avatar Nov 12 '24 15:11 github-actions[bot]