poreCov

Catch identical filenames

Status: Open. MarieLataretu opened this issue 3 years ago • 5 comments

I suggest changing simpleName to baseName here: https://github.com/replikation/poreCov/blob/9ba98fe38d666508fa7dd0bd16d4accc5fe36a4b/poreCov.nf#L183 (and possibly elsewhere) to avoid problems with file names that contain more than one dot (.). simpleName drops everything after the first dot, so different files can end up with the same name.

Alternatively, or in addition, a sanity check for identical file names would be good.


Context: https://www.nextflow.io/docs/latest/script.html#check-file-attributes
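To make the difference concrete, here is a plain-Groovy sketch that approximates Nextflow's baseName (drops only the last extension) and simpleName (drops everything from the first dot), based on the examples in the linked docs. The helper names baseNameOf and simpleNameOf and the barcode01.run*.fastq.gz file names are hypothetical; inside a Nextflow script you would read file.baseName and file.simpleName directly.

```groovy
// Hypothetical helpers approximating Nextflow's Path.baseName and
// Path.simpleName on a plain file-name string (not the real implementation).

// baseName-like: drop only the last extension ("a.b.fastq.gz" -> "a.b.fastq")
String baseNameOf(String fileName) {
    int dot = fileName.lastIndexOf('.')
    return dot > 0 ? fileName.substring(0, dot) : fileName
}

// simpleName-like: drop everything from the first dot ("a.b.fastq.gz" -> "a")
String simpleNameOf(String fileName) {
    int dot = fileName.indexOf('.')
    return dot > 0 ? fileName.substring(0, dot) : fileName
}

// Two distinct runs of the same barcode (hypothetical file names)
def files = ['barcode01.run1.fastq.gz', 'barcode01.run2.fastq.gz']

files.each { f ->
    println "${f}: baseName=${baseNameOf(f)}, simpleName=${simpleNameOf(f)}"
}
```

Under the simpleName-like rule both runs collapse to barcode01, while the baseName-like rule keeps them apart but leaves .fastq attached, which is exactly the trade-off discussed below.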

MarieLataretu, Jan 13 '22 13:01

Maybe there is a way to just remove ".fastq.gz" or ".fastq"? With baseName, the .fastq remains in the sample names.

replikation, Jan 13 '22 14:01

https://stackoverflow.com/questions/17676562/get-file-extension-for-special-cases-like-tar-gz

replikation, Jan 13 '22 14:01

But then we should also cover .fq, .fq.gz, and so on. On the other hand, it's not the worst if the sample names still have the .fq extension, as long as the pipeline still runs through ;) That would only happen if we miss some weird file ending.
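A minimal sketch of the extension-stripping idea, assuming the set of FASTQ suffixes mentioned in this thread (.fastq.gz, .fq.gz, .fastq, .fq). sampleNameOf is a hypothetical helper, not part of poreCov or Nextflow; in a pipeline it would be applied to a file's name (e.g. file.name). Unknown endings are returned unchanged, so an unexpected extension only leaves a slightly odd sample name instead of breaking the run.

```groovy
// Hypothetical helper: strip one known FASTQ extension (case-insensitive)
// and leave any other file name untouched.
String sampleNameOf(String fileName) {
    // Assumed list of suffixes; extend as needed
    def extensions = ['.fastq.gz', '.fq.gz', '.fastq', '.fq']
    def lower = fileName.toLowerCase()
    def ext = extensions.find { lower.endsWith(it) }
    // Only the matched suffix is removed; inner dots are preserved
    return ext ? fileName.substring(0, fileName.length() - ext.length()) : fileName
}

['barcode01.run1.fastq.gz', 'sample_A.fq', 'reads.FASTQ', 'odd.txt'].each {
    println "${it} -> ${sampleNameOf(it)}"
}
```

Unlike simpleName, this keeps inner dots (barcode01.run1 stays distinct from barcode01.run2), and unlike baseName it does not leave .fastq behind.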

hoelzer, Jan 13 '22 15:01

> because with basename the .fastq remains in the sample names

True, I hadn't thought about that.


Here is a code snippet for the sanity check:

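```groovy
Channel
    .of('Hello', 'Hola', 'Ciao')
    .tap { all } // to conserve the original channel
    .collect()   // gather every name into a single list
    .map { names -> [names.size(), names.toSet().size()] } // [total, distinct]
    .subscribe onNext: {
        // fails if any name occurs more than once
        assert it[0] == it[1] : "Duplicate file names detected"
    }
```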
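The snippet above only tells you that some names collide. A slightly more helpful variant, sketched below, also reports which names are duplicated and how often, using Groovy's countBy. The names duplicatesOf and assertUniqueNames are hypothetical; the idea is that it could be called on the list produced by .collect(), but that wiring is an untested assumption.

```groovy
// Hypothetical helper: map each name that occurs more than once to its count.
Map<String, Integer> duplicatesOf(List<String> names) {
    names.countBy { it }.findAll { name, count -> count > 1 }
}

// Hypothetical check: throw with a message listing the offending names.
void assertUniqueNames(List<String> names) {
    def dupes = duplicatesOf(names)
    if (dupes) {
        throw new IllegalArgumentException("Duplicate sample names: ${dupes}".toString())
    }
}

assertUniqueNames(['barcode01', 'barcode02'])  // unique, so nothing is thrown
try {
    assertUniqueNames(['barcode01', 'barcode02', 'barcode01'])
} catch (IllegalArgumentException e) {
    println e.message
}
```

Naming the duplicates in the error makes it easier to find which input files need renaming, rather than only learning that the total and distinct counts differ.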

MarieLataretu, Jan 13 '22 15:01

ping @DataSpott

replikation, Oct 19 '22 11:10