
database problem: missing gene

Open • GildasLepennetier opened this issue 5 years ago • 1 comment

Hi, I suspect there is a problem that I cannot really solve myself.

I noticed that one gene we are looking for (IGHV1-9) is simply missing from the results. Running MiXCR on the same data, I do get it, at 30%. Some tests I ran seem to show that the problem lies in how the IGH_V.fa database is created.

The details are the following:

The following sequence matches IGHV1-9*01 when I run it through IMGT/V-QUEST (http://www.imgt.org/IMGT_vquest/vquest) with these settings:

  1. Species: Mus musculus
  2. Receptor type: IG

>MIG.1.R1.UMI:TTGGTTTCTTGG:10
TAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATAGGAGAGATTTTACCTGGAAGAGGTAGAACTAACTACAATGAAAAGTTCAAGGGCAAGGCCACATTCACTGCAGAAACATCCTCCAACACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCCGTCTATTACTGTGCAACTGGTAATACGATGGTAAACATGCCATACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGT

Musmus IGHV1-9*01 F identity = 95.53% ; Productive IGH

Running the following command:

java -jar $MIGEC_JAR CdrBlast --debug --all-segments --all-alleles -S MusMusculus -R IGH TESTSAMPLE.fastq output.txt

on a TESTSAMPLE.fastq that contains only this read:

@MIG.1 R1 UMI:TTGGTTTCTTGG:10
TAAAGCAGAGGCCTGGACATGGCCTTGAGTGGATAGGAGAGATTTTACCTGGAAGAGGTAGAACTAACTACAATGAAAAGTTCAAGGGCAAGGCCACATTCACTGCAGAAACATCCTCCAACACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCCGTCTATTACTGTGCAACTGGTAATACGATGGTAAACATGCCATACTGGGGCCAAGGCACCACTCTCACAGTCTCCTCAGAGAGT
+
HHIIHHHHHHIHHHHIHHHHHIHIHIHHHHHIHHHHIHHHHHHIIHHHIHHIHIIIIIHIHIIIIIIIIIIHIIIIIIHIHIIIIIHIHIIIIIIIIIHHIIIIHIIIIIIIIHIIIIIIIIHIHHIIHIIHIHIIIIGHIHIIHIIHHHHHHHGGHHHIHHIGHIIGHIGIHIHHHGHDHIHHHHHHHHFHGHHEGHGIHIHHHEIHFGGFFDHGHHGGGGHGCHGEF

gives me the match:

IGHV1-62-3*01,IGHV1S126*01,IGHV1S40*01 IGHJ2*01

(Removing the --all-segments and --all-alleles options gives me only the IGHV1S40 match.)
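Before suspecting the database, it may be worth confirming that TESTSAMPLE.fastq is a well-formed four-line FASTQ record (header starting with `@`, sequence, `+` separator, quality string as long as the sequence), since CdrBlast takes it as input. The following is a minimal bash/awk sketch, assuming unwrapped four-line FASTQ; `check_fastq` is a hypothetical helper, not part of MIGEC.

```shell
# check_fastq FILE: hypothetical helper (not part of MIGEC).
# Assumes unwrapped 4-line FASTQ records. Reports any record whose header
# does not start with '@', whose separator does not start with '+', or whose
# quality string length differs from its sequence length.
# Returns 0 if the file looks well-formed, 1 otherwise.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: header does not start with @\n", NR; bad = 1
    }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1
    }
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen; bad = 1
    }
    END {
      if (NR % 4 != 0) { printf "truncated record: %d lines is not a multiple of 4\n", NR; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

Usage: `check_fastq TESTSAMPLE.fastq && echo "TESTSAMPLE.fastq looks well-formed"`.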

When I check the temporary BLAST database (IGH_V.fa), the gene we are looking for (IGHV1-9) is not in it.

Therefore, I think something goes wrong when the IGH_V database is created. I have to ask you about this one, @mikessh.
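To check the temporary BLAST database more precisely than by eye, one can list only the FASTA headers that name the gene as a whole token. This is a minimal sketch, assuming the IGH_V.fa headers contain the segment name (e.g. IGHV1-9*01) and that the gene name has no regex metacharacters; `fasta_has_gene` is a hypothetical helper.

```shell
# fasta_has_gene FASTA GENE: hypothetical helper (not part of MIGEC).
# Prints the FASTA header lines that name GENE as a whole token, so that
# "IGHV1-9" matches "IGHV1-9*01" but not "IGHV1-90" or "IGHV1-9-1".
# Assumes GENE contains only letters, digits and hyphens (no regex metacharacters).
# Exit status (from grep): 0 if at least one header matches, 1 if none.
fasta_has_gene() {
  local fasta=$1 gene=$2
  grep -E "^>(.*[^[:alnum:]-])?${gene}([^[:alnum:]-].*)?$" "$fasta"
}
```

Usage: `fasta_has_gene IGH_V.fa IGHV1-9 || echo "IGHV1-9 not found in IGH_V.fa"`.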

In migec/src/main/resources, I do find IGHV1-9 entries for MusMusculus in all the files (segments.txt, etc.) with the command:

grep IGHV1-9 *
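A plain `grep IGHV1-9 *` also matches longer names that merely start with IGHV1-9 (for example a hypothetical IGHV1-90), and it does not restrict hits to the mouse entries. A slightly stricter sketch, assuming each resource line that describes a segment contains both the species name (MusMusculus) and the segment name; `find_segment` is a hypothetical helper, and the sketch makes no assumption about the column layout of segments.txt.

```shell
# find_segment SPECIES GENE FILE...: hypothetical helper (not part of MIGEC).
# Prints FILE:LINE:TEXT for every line that contains SPECIES and names GENE
# as a whole token (IGHV1-9 matches IGHV1-9*01 but not IGHV1-90).
# Assumes SPECIES and GENE contain no regex metacharacters.
find_segment() {
  local species=$1 gene=$2
  shift 2
  # The line is padded with spaces so the token regex needs no ^/$ anchors.
  awk -v sp="$species" -v re="[^A-Za-z0-9-]${gene}[^A-Za-z0-9-]" '
    index($0, sp) > 0 && (" " $0 " ") ~ re { print FILENAME ":" FNR ":" $0 }
  ' "$@"
}
```

Usage, from migec/src/main/resources: `find_segment MusMusculus IGHV1-9 *`.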

Thank you in advance.

  • using migec-1.2.9.jar
  • Linux Mint (16.04.1-Ubuntu), /usr/local/bin/blastn BLAST 2.8.1

GildasLepennetier, Mar 04 '19 09:03