Include file grouping information in fqstats

Open unode opened this issue 7 years ago • 0 comments

Using only the information contained in a fqstats file, it is currently impossible to tell whether pair.1, pair.2 and singles were processed together using pairing information, paired(..., singles=...), or whether each file was treated independently with fastq(...).

Adding file grouping information could alleviate this issue. Example:

                              SAMPLE
        0:file           pair.1.fq.gz
    0:encoding     Sanger (33 offset)
     0:numSeqs                 737216
0:numBasepairs               73654175
   0:minSeqLen                     50
   0:maxSeqLen                    101
   0:gcContent       0.41184101240696
   0:filegroup                      0   <---
        1:file           pair.2.fq.gz
           ...                    ...
   1:filegroup                      0   <---
        2:file          singles.fq.gz
           ...                    ...
   2:filegroup                      0   <--- all above = paired(..., singles=...)
        3:file processed.pair.1.fq.gz
           ...                    ...
   3:filegroup                      1   <--- new group
           ...                    ...
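
As a rough sketch of how a downstream script could use this, the snippet below rebuilds the groups from the row labels. Note the assumptions: `filegroup` is only the field proposed here (it does not exist in current output), `group_files` is a hypothetical helper, and the stats file is assumed to have already been read into (label, value) pairs.

```python
from collections import defaultdict


def group_files(rows):
    """Map each filegroup id to the files that belong to it.

    `rows` is an iterable of (label, value) pairs, e.g.
    ("0:file", "pair.1.fq.gz") or ("0:filegroup", "0").
    `filegroup` is the field proposed in this issue, not an existing one.
    """
    files = {}
    groups = {}
    for label, value in rows:
        # Labels look like "<index>:<field>"; split on the first colon only.
        index, _, field = label.partition(":")
        if field == "file":
            files[int(index)] = value
        elif field == "filegroup":
            groups[int(index)] = int(value)

    grouped = defaultdict(list)
    for index in sorted(files):
        # Files without a filegroup row end up under the key None.
        grouped[groups.get(index)].append(files[index])
    return dict(grouped)


# Rows mirroring the example above (other fields omitted).
rows = [
    ("0:file", "pair.1.fq.gz"), ("0:filegroup", "0"),
    ("1:file", "pair.2.fq.gz"), ("1:filegroup", "0"),
    ("2:file", "singles.fq.gz"), ("2:filegroup", "0"),
    ("3:file", "processed.pair.1.fq.gz"), ("3:filegroup", "1"),
]

for group, members in group_files(rows).items():
    print(group, members)
```

With something like this, group 0 would hold the three files loaded via paired(..., singles=...), and group 1 would hold the file loaded on its own.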

A similar situation is seen when using load_mocat_sample(...) on a folder that includes multiple pairs/lanes. Here, a variable number of inputs makes parsing the stats file non-trivial.

In this case, and related to https://github.com/ngless-toolkit/ngless/issues/55#issuecomment-358085413, we could treat all the inputs of a sample as the same filegroup.

                              SAMPLE
        0:file  SAMPLE/pairA.1.fq.gz
           ...                   ...
   0:filegroup                     0   <---
        1:file  SAMPLE/pairA.2.fq.gz
           ...                   ...
   1:filegroup                     0   <---
        2:file SAMPLE/singlesA.fq.gz
           ...                   ...
   2:filegroup                     0   <---
        3:file  SAMPLE/pairB.1.fq.gz
           ...                   ...
   3:filegroup                     0   <---
        4:file  SAMPLE/pairB.2.fq.gz
           ...                   ...
   4:filegroup                     0   <---
        5:file SAMPLE/singlesB.fq.gz
           ...                   ...
   5:filegroup                     0   <---
           ...                   ...

Jun 19 '18 20:06 unode