poreCov
Catch identical filenames
I suggest changing simpleName to baseName here:
https://github.com/replikation/poreCov/blob/9ba98fe38d666508fa7dd0bd16d4accc5fe36a4b/poreCov.nf#L183
(and potentially elsewhere) to avoid problems with file names that contain more than one dot (.).
Alternatively, or in addition, a sanity check for identical file names would be good.
Context: https://www.nextflow.io/docs/latest/script.html#check-file-attributes
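To illustrate the collision, here is a minimal plain-Groovy sketch. The two helpers are hypothetical stand-ins, not Nextflow code: they assume that simpleName cuts an ordinary file name at its first dot and baseName at its last dot, which matches the file.tar.gz example in the linked docs. The file names are made up.

```groovy
// Hypothetical approximations of the two Nextflow file attributes for
// ordinary file names (assumption: simpleName cuts at the first dot,
// baseName cuts at the last dot).
def simpleNameOf(String fileName) {
    int i = fileName.indexOf('.')
    return i > 0 ? fileName.substring(0, i) : fileName
}

def baseNameOf(String fileName) {
    int i = fileName.lastIndexOf('.')
    return i > 0 ? fileName.substring(0, i) : fileName
}

// Made-up input names that differ only after the first dot
def files = ['run.1.fastq', 'run.2.fastq']

// Both files collapse to the same sample name with the first-dot rule
assert files.collect { simpleNameOf(it) } == ['run', 'run']

// The last-dot rule keeps them apart
assert files.collect { baseNameOf(it) } == ['run.1', 'run.2']
```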
Maybe there is a way to just remove the ".fastq.gz" or ".fastq"? With baseName, the .fastq remains in the sample names.
https://stackoverflow.com/questions/17676562/get-file-extension-for-special-cases-like-tar-gz
But then we should also cover .fq, .fq.gz, and so on. On the other hand, it's not the worst outcome if the sample names still carry the .fq extension, as long as the pipeline still runs through ;) That would only happen if we miss some weird file ending.
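One possible way to do this, as a rough sketch in plain Groovy: strip a known FASTQ suffix and leave everything else untouched. The helper name stripFastqSuffix and the suffix list are assumptions for illustration, not something that exists in poreCov.

```groovy
// Hypothetical helper (not part of poreCov): remove a known FASTQ suffix
// from a file name. The suffix list is an assumption and may need extending.
def stripFastqSuffix(String fileName) {
    def suffixes = ['.fastq.gz', '.fq.gz', '.fastq', '.fq']
    def lower = fileName.toLowerCase()          // match case-insensitively
    def match = suffixes.find { lower.endsWith(it) }
    // Unknown endings are returned unchanged, so the pipeline still runs
    return match ? fileName.substring(0, fileName.length() - match.length()) : fileName
}

assert stripFastqSuffix('sample.1.fastq.gz') == 'sample.1'
assert stripFastqSuffix('sample.fq') == 'sample'
assert stripFastqSuffix('sample.bam') == 'sample.bam'
```

In the pipeline this would presumably be applied to the file's name attribute when the sample name is built. Names with an unexpected ending simply keep their extension.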
"because with basename the .fastq remains in the sample names"
True, I hadn't thought about that.
Here is a code snippet for the sanity check:
Channel
    .from('Hello', 'Hola', 'Ciao')
    .tap { all } // to conserve the original channel
    .collect()
    .map { it -> [it.size(), it.unique().size()] }
    .subscribe onNext: {
        assert it[0] == it[1]
    }
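If the check should also say which names collide, something along these lines might work on the collected list. This is a plain-Groovy sketch; findDuplicates is a hypothetical helper name and the sample names are made up. It uses countBy rather than unique(), since Groovy's unique() modifies the list it is called on.

```groovy
// Hypothetical helper: return the names that occur more than once.
// countBy builds a name -> count map without modifying the input list.
def findDuplicates(List<String> names) {
    return names.countBy { it }
                .findAll { name, count -> count > 1 }
                .keySet()
                .toList()
}

def sampleNames = ['Hello', 'Hola', 'Ciao', 'Hola']
def duplicates = findDuplicates(sampleNames)

assert duplicates == ['Hola']
assert sampleNames.size() == 4   // input list is left untouched

// For a run, one could fail early with a readable message, for example:
// assert duplicates.isEmpty() : "Identical sample names found: ${duplicates}"
```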
ping @DataSpott