dna-seq-varlociraptor Adjust get_read_group for multi sample config.


Open christopher-schroeder opened this issue 4 years ago • 3 comments

I have projects in which the same sample has to be used in multiple groups. For example, I have many single-case samples, but the parents are sequenced chunkwise in pools. In that case I write a config that looks like this:

CSDN21	index	CSDN21	ILLUMINA	NA
CSDN47	motherpool	CSDN21	ILLUMINA	NA
CSDN52	fatherpool	CSDN21	ILLUMINA	NA
CSDN22	index	CSDN22	ILLUMINA	NA
CSDN47	motherpool	CSDN22	ILLUMINA	NA
CSDN52	fatherpool	CSDN22	ILLUMINA	NA
CSDN23	index	CSDN23	ILLUMINA	NA
CSDN47	motherpool	CSDN23	ILLUMINA	NA
CSDN52	fatherpool	CSDN23	ILLUMINA	NA

This seems to work just fine for the calling, but for the mapping we have to slightly modify how get_read_group generates the read group string.
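To illustrate the mapping side: once a sample name appears on several rows, a lookup keyed by sample name no longer yields a single platform value. Below is a minimal sketch of a read group helper that tolerates repeated rows. The column names (the sheet above has no header), the `load_rows` helper, and the exact `-R '@RG...'` string are assumptions for illustration, not the workflow's actual code.

```python
import csv
import io

# Hypothetical column names: the sample sheet in this issue is shown without a header.
COLUMNS = ["sample_name", "alias", "group", "platform", "other"]

# A shortened copy of the sample sheet from this issue (tab-separated).
SAMPLES_TSV = (
    "CSDN21\tindex\tCSDN21\tILLUMINA\tNA\n"
    "CSDN47\tmotherpool\tCSDN21\tILLUMINA\tNA\n"
    "CSDN52\tfatherpool\tCSDN21\tILLUMINA\tNA\n"
    "CSDN22\tindex\tCSDN22\tILLUMINA\tNA\n"
    "CSDN47\tmotherpool\tCSDN22\tILLUMINA\tNA\n"
    "CSDN52\tfatherpool\tCSDN22\tILLUMINA\tNA\n"
)


def load_rows(text):
    """Parse the headerless sample sheet into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS, delimiter="\t")
    return list(reader)


def get_read_group(rows, sample):
    """Build one read group string per sample, even if the sample has several rows."""
    # Collapse all rows of this sample to the set of distinct platforms.
    platforms = {row["platform"] for row in rows if row["sample_name"] == sample}
    if not platforms:
        raise KeyError(f"unknown sample: {sample}")
    if len(platforms) > 1:
        raise ValueError(f"conflicting platforms for sample {sample}: {sorted(platforms)}")
    # The raw string keeps the literal backslash-t that the aligner expands itself.
    return r"-R '@RG\tID:{sample}\tSM:{sample}\tPL:{platform}'".format(
        sample=sample, platform=platforms.pop()
    )


rows = load_rows(SAMPLES_TSV)
print(get_read_group(rows, "CSDN47"))
```

The read group depends only on the sample and its platform, so collapsing the repeated rows is safe as long as they agree. The sketch raises an error if they do not.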

Nov 20 '20 13:11 christopher-schroeder

Yes, I've also thought about a comma-separated list. But a single sample might have a different role in different groups. A comma-separated list would not be enough in that case; you would also need a comma-separated alias list. ... I don't know, I don't know ...

Nov 25 '20 16:11 christopher-schroeder

> Yes, I've also thought about a comma-separated list. But a single sample might have a different role in different groups. A comma-separated list would not be enough in that case; you would also need a comma-separated alias list. ... I don't know, I don't know ...

That's a very good point.

Dec 15 '20 14:12 johanneskoester

What if we instead add another file, groups.tsv, for group assignment (while removing the alias and group columns from samples.tsv)?

group	sample_name	alias
CSDN21	CSDN21	index
CSDN21	CSDN47	motherpool
CSDN21	CSDN52	fatherpool
CSDN22	CSDN22	index
CSDN22	CSDN47	motherpool
CSDN22	CSDN52	fatherpool

I think that would better capture the relational nature of such constructs, and maybe also be cleaner, because the tables become less crowded and redundant.
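As a rough sketch of how such a split could be consumed: samples.tsv would list each sample exactly once, and groups.tsv would carry the group membership and alias. The trimmed samples.tsv columns (`sample_name`, `platform`), the `read_tsv` helper, and the `group_members` structure are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical trimmed samples.tsv: one row per sample, no alias or group column.
SAMPLES_TSV = (
    "sample_name\tplatform\n"
    "CSDN21\tILLUMINA\n"
    "CSDN22\tILLUMINA\n"
    "CSDN47\tILLUMINA\n"
    "CSDN52\tILLUMINA\n"
)

# groups.tsv as proposed above.
GROUPS_TSV = (
    "group\tsample_name\talias\n"
    "CSDN21\tCSDN21\tindex\n"
    "CSDN21\tCSDN47\tmotherpool\n"
    "CSDN21\tCSDN52\tfatherpool\n"
    "CSDN22\tCSDN22\tindex\n"
    "CSDN22\tCSDN47\tmotherpool\n"
    "CSDN22\tCSDN52\tfatherpool\n"
)


def read_tsv(text):
    """Parse a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Sample names are unique here, so per-sample lookups (e.g. platform) are unambiguous.
samples = {row["sample_name"]: row for row in read_tsv(SAMPLES_TSV)}

# group -> alias -> sample_name; the same sample may appear in several groups.
group_members = defaultdict(dict)
for row in read_tsv(GROUPS_TSV):
    if row["sample_name"] not in samples:
        raise KeyError(f"groups.tsv references unknown sample: {row['sample_name']}")
    group_members[row["group"]][row["alias"]] = row["sample_name"]

for group, members in group_members.items():
    print(group, members)
```

With this layout, mapping only needs `samples`, while calling iterates over `group_members`. A sample could also take a different alias in each group without any extra columns.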

Dec 15 '20 14:12 johanneskoester
