Best practices for creating a WGS pooled reference
Following the Read the Docs guide on using autobin, coverage, and reference to build a pooled reference, I run into the issue that different samples end up with different coverage bins.
This is expected, since each WGS sample has a slightly different coverage profile. Targets were generated with autobin using the following parameters:
cnvkit.py autobin -f hg38.fa -m wgs -b 50000 -g access-10kb.hg38.bed --annotate refFlate_hg38.txt *.cram
A minor issue with the guide is that passing *.cram only processes the first CRAM in the list, so I instead ran autobin separately on each CRAM.
Coverage was then calculated for CRAMs 1 to n:
cnvkit.py coverage 1.cram 1.targets.bed -f hg38.fa -o 1.targets.cnn
and the reference was built with:
cnvkit.py reference *.targets.cnn -f hg38.fa -o unaffected.reference.cnn
Since cnvkit.py reference requires the same bins in every sample, would it be better to skip autobin and calculate coverage for each unaffected sample directly from the access-10kb BED file?
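
For comparison, here is a minimal sketch of a shared-bin variant of the workflow above: autobin is run once, on a single CRAM, and the resulting target BED is reused for every sample so that all .cnn files have identical bins. This is an illustration under assumptions, not a confirmed recommendation. The output name shared.targets.bed, the choice of the first CRAM as the binning sample, and the DRY_RUN switch (which only prints the commands) are hypothetical; the other file names and options are copied from the commands above, and --target-output-bed is the autobin option for naming the target BED.

```shell
#!/bin/sh
# Sketch: derive ONE bin set, then reuse it for every unaffected sample.
# Hypothetical names: shared.targets.bed, DRY_RUN, pooled_reference.
set -eu

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pooled_reference() {
    # Arguments: the unaffected CRAM files.
    targets=shared.targets.bed
    first=$1

    # Bin once (here from the first CRAM, an arbitrary choice) so that
    # every sample is measured over the same bins.
    run cnvkit.py autobin -f hg38.fa -m wgs -b 50000 \
        -g access-10kb.hg38.bed --annotate refFlate_hg38.txt \
        --target-output-bed "$targets" "$first"

    # Coverage for each CRAM against the shared target BED. Each pass
    # swaps one CRAM name for its .cnn name in the argument list.
    for cram in "$@"; do
        sample=$(basename "$cram" .cram)
        run cnvkit.py coverage "$cram" "$targets" -f hg38.fa \
            -o "$sample.targets.cnn"
        set -- "$@" "$sample.targets.cnn"
        shift
    done

    # "$@" now holds only the .cnn files, in the original sample order.
    run cnvkit.py reference "$@" -f hg38.fa -o unaffected.reference.cnn
}
```

Calling `DRY_RUN=1; pooled_reference 1.cram 2.cram` prints the four commands without running them, which makes it easy to check that every coverage call points at the same BED file before committing to a long WGS run.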