galaxy
List creation lacks an option to remove file extensions
The checkbox "Remove extensions ..." would be nice to have when creating lists. It already exists for paired list creation, so adding it would also improve consistency.
While working on the rewrite to Vue, I attempted to start working on this issue. However, there are a few problems with doing this that took this out of scope of the rewrite.
- Not all datasets have extensions.
- While there is a field for file type/extension on the object, there is no feasible way to keep track of which elements had the extension name on the object and which did not.
- This not only applies to the ListCollectionCreator, but also to list of pairs, since, right now, the extension is only being removed from the name of the pair and not the datasets themselves.
- Either when toggling remove extensions back on, we add the extensions back on to ALL the datasets or we need to think of a creative (non-wasteful) way of keeping track of which elements had the extensions in their names already and which did not.
How does extension removal work at the moment? Does it just split at the last "."? One could change this to remove the longest common suffix instead. The change would then be consistent across all datasets, and there would be no need to store which elements had extensions and which did not. It would also solve the problem that currently only .gz is removed from .fastq.gz files.
Right now, list of pairs only removes the file extension from the pair name. It gets that file extension from the pairs themselves. Since pairs are renamed forward/reverse once you click create, it doesn't matter if the actual files have extensions or not.
But yes, right now it removes the extension from the last period, which is a problem for .fastq.gz files.
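To make the .fastq.gz problem concrete, here is a minimal Python sketch of "split at the last period" as described above. It is an illustration only, not the actual Galaxy client code; the function name `strip_last_extension` is hypothetical.

```python
def strip_last_extension(name: str) -> str:
    """Drop everything from the last period onward (hypothetical helper)."""
    stem, dot, _ext = name.rpartition(".")
    # rpartition returns ("", "", name) when there is no period at all.
    # A leading-dot name such as ".hidden" has an empty stem, so keep it whole.
    return stem if dot and stem else name


print(strip_last_extension("sample_1.fastq.gz"))  # sample_1.fastq
print(strip_last_extension("aFileBob_1"))         # aFileBob_1
```

Only the final `.gz` component is removed from the double extension, while a name without a period passes through unchanged.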
However, removing the longest extension seems like overkill. It could make the names of some files indecipherable. For example, a file named "aFileBob_1" with no extension would be renamed to "a" if 9 characters were stripped to remove a whole .fastq.gz extension.
There is definitely a way to resolve this. I just wanted to capture my quick thoughts from the rewrite so that when I come back to this later, I remember what I've already tried.
Good point, but assuming that a collection has more than one file seems reasonable, and in that case the longest common suffix should work.
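The longest-common-suffix idea could be sketched roughly as follows in Python. This is not an existing Galaxy implementation, and all function names are hypothetical. One extra assumption that is not part of the suggestion above: the shared suffix is trimmed to start at its first period, so that shared trailing characters such as `_1` stay in the names.

```python
import os


def longest_common_suffix(names):
    """Longest string that every name ends with."""
    if not names:
        return ""
    # os.path.commonprefix compares character by character,
    # so reversing each name turns it into a common-suffix search.
    return os.path.commonprefix([n[::-1] for n in names])[::-1]


def common_extension(names):
    """Shared suffix trimmed to begin at its first period (assumption)."""
    suffix = longest_common_suffix(names)
    dot = suffix.find(".")
    return suffix[dot:] if dot != -1 else ""


def strip_common_extension(names):
    ext = common_extension(names)
    if not ext:
        return list(names)  # nothing shared, leave the names untouched
    return [n[: -len(ext)] for n in names]


print(strip_common_extension(["sampleA_1.fastq.gz", "sampleB_1.fastq.gz"]))
# ['sampleA_1', 'sampleB_1']
print(strip_common_extension(["aFileBob_1", "reads.fastq.gz"]))
# ['aFileBob_1', 'reads.fastq.gz']
```

Because the suffix must be shared by every name, a single name without an extension makes the common suffix empty, so "aFileBob_1" is never truncated.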
One could also think of a text input where users can specify what should be removed. The default could be the longest common suffix (or everything up to the last . in case of only one file in the collection).
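A rough Python sketch of that text-input idea, assuming the default described above: the longest common suffix for several files, and the part from the last period onward for a single file. The names `default_suffix` and `remove_suffix` are hypothetical and do not refer to anything in Galaxy.

```python
import os
from typing import List, Optional, Sequence


def default_suffix(names: Sequence[str]) -> str:
    """Value that could pre-fill the text input (hypothetical)."""
    if not names:
        return ""
    if len(names) == 1:
        # Single file: fall back to the part from the last period onward.
        stem, dot, ext = names[0].rpartition(".")
        return dot + ext if dot and stem else ""
    # Several files: longest common suffix, via common prefix of reversed names.
    return os.path.commonprefix([n[::-1] for n in names])[::-1]


def remove_suffix(names: Sequence[str], suffix: Optional[str] = None) -> List[str]:
    """Strip the user-supplied suffix, or the default when none is given."""
    chosen = default_suffix(names) if suffix is None else suffix
    if not chosen:
        return list(names)
    # Names that do not end with the suffix are left unchanged.
    return [n[: -len(chosen)] if n.endswith(chosen) else n for n in names]


names = ["sampleA_1.fastq.gz", "sampleB_1.fastq.gz"]
print(remove_suffix(names))               # ['sampleA', 'sampleB']
print(remove_suffix(names, ".fastq.gz"))  # ['sampleA_1', 'sampleB_1']
print(remove_suffix(["aFileBob_1"]))      # ['aFileBob_1']
```

Note that the plain longest common suffix here is `_1.fastq.gz`, which also removes the shared `_1`; an editable field would let the user shorten it to `.fastq.gz`.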
Still a special treatment of 1 file would
But as you mentioned a special case for file names that do not contain a dot would be good.