
Feature request: output per site info (e.g. genotype, depth)

Open fgvieira opened this issue 6 years ago • 4 comments

Dear all,

would it be possible to get more detailed per-site info for QC? Right now somalier only outputs per-sample and per-sample-pair info.

There is already something similar in depthview, but it is very broad and only available as HTML. Would it be possible to get that info as a TSV as well? Maybe reporting, for each site (rows) and each individual (columns), the coverage for each allele as well as somalier's called genotype.

thanks,

fgvieira avatar Nov 04 '19 15:11 fgvieira

Is this still needed? I am very hesitant to add this, but it could be a debug option.

brentp avatar Jun 30 '20 18:06 brentp

I agree that it would be nice to have as a debug option.

fgvieira avatar Jul 01 '20 06:07 fgvieira

I would also appreciate more detailed per-site info in the output from somalier.

Would it perhaps be possible for the extract command, in addition to the .somalier files, to output a TSV file (or some other text format) with genomic positions, read counts, REF, ALT and genotype calls, that is, something like the following:

chr	position	nref	nalt	nother	REF	ALT	GT
chr2	20616424	184	171	1	C	T	HET
chr4	165697039	0	328	0	G	T	HOM_ALT
chr4	190318079	290	0	0	C	G	HOM_REF
chr6	165045333	0	283	0	G	T	HOM_ALT
...

asp8200 avatar May 21 '21 10:05 asp8200

Hi, you can write this using a simple Python script that accepts the sites file and a somalier file (or many somalier files). Here is a function that will read the sites data into a Python structure for you: https://github.com/brentp/somalier/blob/master/scripts/ancestry-predict.py#L7

The sites data is an array with n_sites rows and 2 columns, where the first column is the ref depth and the second is the alt depth.
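
As a rough illustration of that suggestion, here is a minimal sketch of the TSV-writing step. It assumes the per-site counts have already been loaded (for example with the function linked above) as rows of `(ref_depth, alt_depth)`, and that the site metadata `(chrom, pos, ref, alt)` has been parsed from the sites file in the same order as the counts; that ordering is an assumption to verify. The names `call_genotype` and `write_site_table`, the `min_depth` cutoff and the allele-balance thresholds are all hypothetical choices for this example, not somalier's own API or necessarily its exact calling rules.

```python
import csv
import sys


def call_genotype(nref, nalt, min_depth=7,
                  hom_ref_max=0.02, het_min=0.2, het_max=0.8,
                  hom_alt_min=0.98):
    """Call a genotype from ref/alt depths using allele balance.

    All thresholds here are illustrative assumptions; adjust them to
    match whatever calling rule you want to reproduce.
    """
    depth = nref + nalt
    if depth < min_depth:
        return "UNKNOWN"
    ab = nalt / depth  # allele balance: fraction of reads supporting ALT
    if ab <= hom_ref_max:
        return "HOM_REF"
    if ab >= hom_alt_min:
        return "HOM_ALT"
    if het_min <= ab <= het_max:
        return "HET"
    return "UNKNOWN"


def write_site_table(site_meta, counts, out=sys.stdout):
    """Write one TSV row per site.

    site_meta: sequence of (chrom, pos, ref, alt) tuples, assumed to be
               in the same order as the rows of `counts`.
    counts:    sequence of (ref_depth, alt_depth) pairs, e.g. an
               n_sites x 2 numpy array or a list of tuples.
    """
    if len(site_meta) != len(counts):
        raise ValueError("site metadata and counts differ in length")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chr", "position", "nref", "nalt", "REF", "ALT", "GT"])
    for (chrom, pos, ref, alt), (nref, nalt) in zip(site_meta, counts):
        nref, nalt = int(nref), int(nalt)
        writer.writerow([chrom, pos, nref, nalt, ref, alt,
                         call_genotype(nref, nalt)])
```

Calling `write_site_table(site_meta, counts)` with the two inputs for one sample prints the table to standard output; passing an open file handle as `out` writes it to disk instead. Because the array described above has only ref and alt depths, this sketch omits the `nother` column from the requested layout.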

brentp avatar May 21 '21 10:05 brentp