TRUST4
strange TRUST4 VDJ gene calls
Thanks for your great work. I have two questions about the VDJ gene calls in TRUST4 results. I ran TRUST4 on 10X Genomics 5' scRNA-seq data and found one strange result in barcode_airr.tsv.
sequence = ATATTACTGTGCAAAAGATGGGGATACGGGGGGAGCCAACTGGTTCGACCCCTGGG
v_call = IGHV3-9*01
v_cigar = 19M37S
d_call = IGHD1-1*01
d_cigar = 36S13M7S
j_call = IGHJ5*02
j_cigar = 36S20M
If I understand the result correctly, the D gene spans positions 37 to 49 and the J gene spans positions 37 to 56 (1-based). I don't understand why the D and J genes overlap. Is this normal?
I then queried this sequence in NCBI IgBLAST, using all default settings.
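For reference, the positions above can be derived from the CIGAR strings with a few lines of Python. This is only a minimal sketch: the function name `cigar_query_span` is made up for illustration and is not part of TRUST4, and it assumes SAM-style CIGAR operations in which `S` soft-clips query bases that lie outside the alignment.

```python
import re

# SAM-style CIGAR operations: a length followed by a single operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_query_span(cigar):
    """Return the 1-based inclusive (start, end) of the aligned query region.

    Hypothetical helper for illustration only, not a TRUST4 function.
    """
    pos = 0              # number of query bases consumed so far
    start = end = None
    for length, op in _CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XI":             # query bases inside the alignment
            if start is None:
                start = pos + 1
            end = pos + length
        if op in "MIS=X":            # operations that consume query bases
            pos += length
    return start, end

print(cigar_query_span("19M37S"))    # (1, 19)
print(cigar_query_span("36S13M7S"))  # (37, 49)
print(cigar_query_span("36S20M"))    # (37, 56)
```

Under these assumptions the D and J hits both start at position 37, which is the overlap in question.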
Top V gene match: IGHV3-43*01,IGHV3-43*02,IGHV3-43D*03
Top D gene match: IGHD3-16*01,IGHD3-16*02
Top J gene match: IGHJ5*02
The top V and D gene matches are completely different from TRUST4 gene calls. Could you please explain why? Thanks a lot.
I think the alignments to IGHV3-9*01 and IGHV3-43*01 are identical, so in the final report TRUST4 just selects one according to the global gene usage in the sample. Here is the annotation from TRUST4 for this sequence:
r1 56 0.11 IGHV3-43*01(298):(0-18):(278-296):94.74,IGHV3-9*01(298):(0-18):(278-296):94.74 IGHD1-1*01(17):(36-48):(4-16):84.62 IGHJ5*02(51):(36-55):(1-20):100.00 * CDR1(0-0):0.00=null CDR2(0-0):0.00=null CDR3(7-54):66.67=TGTGCAAAAGATGGGGATACGGGGGGAGCCAACTGGTTCGACCCCTGG ATATTACTGTGCAAAAGATGGGGATACGGGGGGAGCCAACTGGTTCGACCCCTGGG
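To compare this annotation with the AIRR columns, the gene fields can be split into their parts. The sketch below assumes each hit has the form `gene(reference_length):(query_start-query_end):(reference_start-reference_end):similarity` with 0-based inclusive coordinates, which is how the line above reads. The names `parse_gene_hits` and `GeneHit` are illustrative and not part of TRUST4.

```python
import re
from typing import NamedTuple

class GeneHit(NamedTuple):
    gene: str
    ref_length: int
    query_span: tuple   # 0-based inclusive, as printed in the annotation
    ref_span: tuple     # 0-based inclusive, as printed in the annotation
    similarity: float

# Assumed field layout: gene(ref_len):(q_start-q_end):(r_start-r_end):similarity
_HIT = re.compile(
    r"^(?P<gene>[^()]+)\((?P<ref_len>\d+)\)"
    r":\((?P<q_start>\d+)-(?P<q_end>\d+)\)"
    r":\((?P<r_start>\d+)-(?P<r_end>\d+)\)"
    r":(?P<sim>\d+(?:\.\d+)?)$"
)

def parse_gene_hits(field):
    """Parse one comma-separated gene field into a list of GeneHit records."""
    hits = []
    for item in field.split(","):
        m = _HIT.match(item)
        if m is None:
            raise ValueError(f"unrecognized gene hit: {item!r}")
        hits.append(GeneHit(
            gene=m["gene"],
            ref_length=int(m["ref_len"]),
            query_span=(int(m["q_start"]), int(m["q_end"])),
            ref_span=(int(m["r_start"]), int(m["r_end"])),
            similarity=float(m["sim"]),
        ))
    return hits

v_field = ("IGHV3-43*01(298):(0-18):(278-296):94.74,"
           "IGHV3-9*01(298):(0-18):(278-296):94.74")
for hit in parse_gene_hits(v_field):
    print(hit.gene, hit.query_span, hit.similarity)
```

Both V hits cover the same 19 bases of the query with the same similarity, which is why either allele can end up as the reported call. Adding 1 to the 0-based D span (36-48) gives positions 37 to 49, matching the d_cigar value.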
D gene annotation is quite challenging because D genes are so short. In this case the D gene hit is contained within the J gene region, so TRUST4 seems to have made an error. This is a constraint I plan to add in the future.
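Until such a constraint exists, suspicious D calls can be flagged downstream. The following is a rough sketch of one possible check, not TRUST4's implementation: the function names are hypothetical, spans are 1-based inclusive query positions, and the rule (D must lie strictly between the end of V and the start of J) is an assumption that a real implementation might relax.

```python
def spans_overlap(a, b):
    """True if two inclusive (start, end) spans share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def d_hit_is_consistent(v_span, d_span, j_span):
    """True only if the D hit lies after the V hit and before the J hit.

    Hypothetical filter for illustration; the strict ordering is an assumption.
    """
    return v_span[1] < d_span[0] and d_span[1] < j_span[0]

# Spans from the record discussed in this thread (1-based, inclusive).
v_span, d_span, j_span = (1, 19), (37, 49), (37, 56)

print(spans_overlap(d_span, j_span))                # True
print(d_hit_is_consistent(v_span, d_span, j_span))  # False
```

A record that fails this check could have its d_call blanked or be set aside for manual review rather than being trusted as is.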
Thanks a lot for your reply!