chanjo
granularity
Hi,
I have not actually used chanjo for coverage reports, but as I recall it provides reports of "completeness" at the transcript or gene level.
Storing coverage data is tricky business since inclusion of too detailed information (per base) would quickly eat up a lot of space. However, some more granularity might still be useful for assessing the sequencing quality in different regions.
I have two questions.
- Do you think it would be feasible to provide completeness info at the exon level? I made some quick tests with a WGS dataset using all exons for all Ensembl transcripts and 4 completeness levels. The resulting data table amounted to 12 MB compressed and 63 MB uncompressed. Admittedly quite a lot, but it could be reduced significantly if, for example, CCDS was used instead. The size would also be reduced by using an SQL database if the data is sufficiently normalized.
- In my mind, however, it would be a good thing if coverage data could be included directly in the scout system. It would then be convenient to store the data in MongoDB, although that prevents the use of JOINs and normalized data.
Do you have any thoughts on this?
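
To make the first question a bit more concrete, here is a minimal Python sketch of what I mean by exon-level completeness: the fraction of bases in an exon with read depth at or above each cutoff. The function name, the four cutoffs, the depth values, and the document layout are all made up for illustration; none of this is chanjo's or scout's actual API.

```python
from typing import Dict, Sequence


def exon_completeness(
    depths: Sequence[int],
    thresholds: Sequence[int] = (10, 20, 50, 100),  # hypothetical cutoffs
) -> Dict[int, float]:
    """Return the fraction of bases with depth >= each threshold."""
    if not depths:
        raise ValueError("exon has no bases")
    n_bases = len(depths)
    return {t: sum(d >= t for d in depths) / n_bases for t in thresholds}


# Hypothetical per-base read depths for a single 8-base exon.
depths = [5, 12, 25, 60, 110, 30, 18, 9]
completeness = exon_completeness(depths)

# One denormalized document per exon, the shape a document store would hold.
# Keys are converted to strings because document field names must be strings.
exon_doc = {
    "exon_id": "exon_1",  # placeholder identifier
    "completeness": {f"ge_{t}x": frac for t, frac in completeness.items()},
}
print(exon_doc)
```

Only the four fractions per exon are kept, not the per-base depths, which is what keeps the table small. The trade-off with the document layout is that gene and transcript identifiers would have to be repeated in each exon document instead of being joined in.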