ExpansionHunter

bam which is separated by chromosome

Open Emi-sed opened this issue 3 years ago • 5 comments

Hi,

When I ran ExpansionHunter on a BAM file split by chromosome, the estimated repeat count was lower than with a BAM file containing all chromosomes. For example, with chr12.sorted.bam the ATN1 repeat count was 40, while with all_chr.merged.sorted.bam it was 55. Does ExpansionHunter need a BAM file that contains all chromosomes? Could you let me know your thoughts?

Sincerely, Emi

Emi-sed, Nov 05 '21 02:11

If the repeat length is close to or above the read length (e.g. ~50 or more CAG repeats with 150 bp reads), reads from the repeat may be misaligned to another chromosome. If you run EH only on chr12, those misaligned reads can't be used when calculating the repeat length. In that case, yes, you would need to run EH on the whole BAM file to get the best repeat length estimate.
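To make the arithmetic concrete, here is a small Python sketch that compares the length of a pure CAG tract with the read length. The 150 bp read length is the example figure from this comment, not a value reported for the chr12 data, and the function names are made up for illustration.

```python
def repeat_span_bp(repeat_count: int, motif: str = "CAG") -> int:
    """Length in base pairs of a pure tract of `repeat_count` copies of `motif`."""
    return repeat_count * len(motif)


def fits_within_read(repeat_count: int, read_length: int, motif: str = "CAG") -> bool:
    """True if the whole tract is no longer than a single read."""
    return repeat_span_bp(repeat_count, motif) <= read_length


# The two ATN1 estimates from this thread, checked against an assumed 150 bp read.
for count in (40, 55):
    print(count, repeat_span_bp(count), fits_within_read(count, 150))
```

Under that assumption, 40 repeats (120 bp) fit inside one read, while 55 repeats (165 bp) do not, which is the regime where reads consisting mostly of repeat sequence can end up aligned elsewhere.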

Egor can correct me if I'm wrong!

andreasssh, Nov 05 '21 22:11

Thank you very much for your reply. Do you mean that ExpansionHunter uses reads aligned to other chromosomes when it calculates the number of repeats? Could you point me to an article or other source that explains this? I would very much like to read it.

Sincerely, Emi

Emi-sed, Nov 06 '21 07:11

That's exactly right, Andreas!

Emi: EH extracts mates of reads aligned close to the repeat, even if those mates are located on other chromosomes. Here is a quick cartoon illustrating this.

[Image: example]

When a BAM file is split by chromosome, EH can no longer recover such reads and hence can produce an incorrect size estimate (as Andreas pointed out).
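One rough way to see how many reads near a locus are affected is to look at the RNEXT column of the SAM records there: a value other than `=` or the read's own reference name means the mate was aligned to a different chromosome. The Python sketch below does this on plain SAM text (for example, the output of `samtools view` on a region). It is only an illustration, not how EH is implemented; the function names and the sample records are invented.

```python
def mate_on_other_chromosome(sam_line: str) -> bool:
    """True if this SAM record is paired and its mate maps to another reference."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    is_paired = bool(flag & 0x1)      # template has multiple segments
    mate_unmapped = bool(flag & 0x8)  # next segment is unmapped
    if not is_paired or mate_unmapped or rnext == "*":
        return False
    # "=" means the mate is on the same reference as this read.
    return rnext != "=" and rnext != rname


def count_cross_chromosome_mates(sam_lines) -> int:
    """Count alignment records whose mate is on a different chromosome."""
    return sum(
        1
        for line in sam_lines
        if line.strip() and not line.startswith("@") and mate_on_other_chromosome(line)
    )


# Invented records for illustration only (not real data from this thread).
example_records = [
    "@SQ\tSN:chr12\tLN:1000",
    "r1\t99\tchr12\t100\t60\t10M\t=\t300\t210\tACGTACGTAC\tFFFFFFFFFF",
    "r2\t65\tchr12\t120\t60\t10M\tchr3\t5000\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r3\t73\tchr12\t130\t60\t10M\t=\t130\t0\tACGTACGTAC\tFFFFFFFFFF",
]
print(count_cross_chromosome_mates(example_records))
```

In a per-chromosome BAM, the records for such mates live in a different file, so a tool reading only chr12.sorted.bam cannot retrieve them.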

Does this answer your question? Please let me know if you have any follow up questions.

Best wishes, Egor

egor-dolzhenko, Nov 06 '21 19:11

Thank you very much!! I was able to figure it out because of your answers. EH is a unique tool. We will use a bam file that contains all chromosomes from now on.

Sincerely, Emi

Emi-sed, Nov 09 '21 02:11

Glad we could help! Please don't hesitate to reach out if you run into any other issues!

egor-dolzhenko, Nov 09 '21 03:11