Somalier relate with ONT WGS data

Open MartinezRuiz-Carlos opened this issue 1 year ago • 2 comments

Hi all,

I am using Somalier on Nanopore WGS BAMs aligned to the T2T reference, with samples from the same families. We know these samples should show relatedness, but Somalier does not seem to pick up the family relationships. Here is how I am running Somalier:

somalier extract --sites ${SITES_PATH}/sites.chm13v2.T2T.vcf.gz \
                 --fasta ${REF_PATH}/t2t/hs1.fa \
                 "${SAMPLE}.bam"

Followed by:

somalier relate --infer --groups "somalier_groups_in.tsv" "./*.somalier"
somalier relate --ped "somalier.samples.tsv" --groups "somalier_groups_in.tsv" "./*.somalier"

Here is the output I get (somalier.samples.tsv):

#family_id      sample_id       paternal_id     maternal_id     sex     phenotype       original_pedigree_sex   gt_depth_mean   gt_depth_sd     depth_mean      depth_sd        ab_mean ab_std  n_hom_ref       n_het   n_hom_alt       n_unknown       p_middling_ab   X_depth_mean    X_n     X_hom_ref       X_het   X_hom_alt       Y_depth_mean    Y_n
00-004-0539_00-2948_bp  00-004-0539_00-2948_bp  -9      -9      -9      -9      unknown 41.3    7.3     41.3    7.4     0.53    0.39    4121    6745    4813    1705    0.070   21.57   320     157     1       162     22.57   14
00-004-0560_00-3113_nl  00-004-0560_00-3113_nl  -9      -9      -9      -9      unknown 44.4    7.8     44.4    7.8     0.53    0.39    3889    6831    4648    2016    0.077   22.46   307     154     0       153     21.29   14
00-004-0552_00-3065_nl  00-004-0549_00-2915_nl  -9      -9      -9      -9      unknown 45.5    7.9     45.5    7.9     0.53    0.39    3960    6915    4709    1800    0.064   23.24   304     146     0       158     14.75   16
00-004-0552_00-3065_nl  00-004-0569_10-12454_bp -9      -9      -9      -9      unknown 55.3    8.5     55.3    8.5     0.53    0.39    3879    6886    4624    1995    0.042   55.43   315     99      110     106     0.00    0
00-004-0552_00-3065_nl  00-004-0547_00-2916_nl  -9      -9      -9      -9      unknown 46.2    7.9     46.2    7.9     0.52    0.39    4041    6854    4743    1746    0.062   45.40   324     88      137     99      0.00    0
00-004-0552_00-3065_nl  00-004-0552_00-3065_nl  -9      -9      -9      -9      unknown 40.5    7.3     40.4    7.3     0.52    0.39    4069    6811    4801    1703    0.074   40.00   330     90      130     110     0.00    0
00-004-0556_11-12676_bp 00-004-0556_11-12676_bp -9      -9      -9      -9      unknown 41.3    7.3     41.3    7.3     0.53    0.39    3989    6688    4854    1853    0.077   40.65   312     108     108     96      0.00    0
00-004-0552_00-3065_nl  00-004-0553_00-2980_ebv -9      -9      -9      -9      unknown 54.4    10.6    54.4    10.6    0.52    0.39    3912    6964    4561    1947    0.044   50.31   315     95      123     97      0.00    0

The HTML output then looks like this:

Image

I have tried with and without the `--groups` flag, with the same result.

So the samples are clearly related, but Somalier does not seem able to classify them properly. Is this similar to the issue here: https://github.com/brentp/somalier/issues/126 , where all samples are too related to be considered parents? Is it just that Nanopore data is too noisy? Or is there something I am missing? Any help would be greatly appreciated. Many thanks!

MartinezRuiz-Carlos, Mar 04 '25 11:03

Hi, --infer has some rules about depth and allele balance that must be met. From your plot, you can see a clear set of parent-child relationships (IBS0 is 0). I would send in a pedigree file with your known relationships and verify that the colors in the plot cluster together.

Another thing is that when you run --infer, it creates a new samples file (that can be used as a ped file). So first run somalier without any ped or groups file. Then re-run, passing the created samples file as the --ped argument, and the HTML colors will be updated if all constraints are met.

brentp, Mar 04 '25 16:03
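
For readers following along, the two-pass workflow suggested above could be sketched roughly as below. This is only a sketch under assumptions: `somalier` is on the PATH, the first pass writes `somalier.samples.tsv` in the current directory (the file name that appears earlier in this thread), and `infer_then_recolor` and `inferred.ped` are hypothetical names chosen for illustration.

```shell
# Sketch of the two-pass workflow: infer first, then re-run with the
# inferred samples file as the pedigree.
# Assumptions: `somalier` is on PATH; the first pass writes
# somalier.samples.tsv in the current directory.
# `infer_then_recolor` and `inferred.ped` are hypothetical names.
infer_then_recolor() {
    # All arguments are the extracted *.somalier files.
    # Pass 1: no --ped and no --groups, only --infer.
    somalier relate --infer "$@" || return 1

    # Copy the inferred samples file so that pass 2 does not read a file
    # that it may also rewrite.
    cp somalier.samples.tsv inferred.ped || return 1

    # Pass 2: feed the inferred relationships back in as the pedigree.
    somalier relate --ped inferred.ped "$@"
}
```

It would be called as `infer_then_recolor ./*.somalier`. Copying the samples file before the second pass is a precaution rather than a documented requirement.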

Thank you. We were already running it as you described; I tried again without the groups file, but no luck. Given that we are using noisier ONT data, I imagine we are hitting Somalier's restrictions. Would there be an easy way to tweak the parameters, e.g. to lower the thresholds on base quality or depth?

MartinezRuiz-Carlos, Mar 07 '25 10:03
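
One way to see which samples might be running into depth or allele-balance limits is to pull the relevant columns out of the samples file. The sketch below assumes a tab-separated file with the header shown earlier in this thread (`sample_id`, `gt_depth_mean`, `p_middling_ab`). The function name `qc_summary` and both cut-offs are hypothetical placeholders supplied by the caller; they are not Somalier's internal thresholds, which this thread does not state.

```shell
# Print sample_id, gt_depth_mean, p_middling_ab and a flag for each sample.
# The limits are caller-supplied placeholders, NOT somalier's own thresholds.
qc_summary() {
    # $1: samples TSV, $2: minimum gt_depth_mean, $3: maximum p_middling_ab
    awk -F'\t' -v min_depth="$2" -v max_mid="$3" '
        NR == 1 {
            # Strip the leading "#" and map column names to positions.
            sub(/^#/, "", $1)
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            depth = $(col["gt_depth_mean"])
            mid   = $(col["p_middling_ab"])
            # Flag samples that are too shallow or have too many
            # intermediate allele balances, by the caller-chosen limits.
            flag = (depth + 0 < min_depth + 0 || mid + 0 > max_mid + 0) ? "CHECK" : "ok"
            printf "%s\t%s\t%s\t%s\n", $(col["sample_id"]), depth, mid, flag
        }' "$1"
}
```

For example, `qc_summary somalier.samples.tsv 30 0.05` would mark any sample with a mean genotype depth below 30 or a `p_middling_ab` above 0.05; the numbers here are arbitrary and only illustrate the call.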