High number of master undef sequences
Hello,
I am running migec-1.2.9 on canine paired TCR-seq data for UMI extraction. I tested one sample (R1 and R2) with a manual Checkout run, and I plan to run Checkout in batch mode once I am satisfied with the results. UMI extraction does appear to work: the headers of the output FASTQ files contain "UMI:<UMI sequence>:<quality string>". However, I get a high number of undefined reads for the master barcode and 0 undefined reads for the slave barcode, even though both are specified in the barcodes file.
My question is: what generally causes a high number of undefined master reads? In my case, the undef-m_R1 and undef-m_R2 files are larger than the sample_R1 and sample_R2 files. Is it because few reads in the sample actually contain the barcode sequences, or because I am not specifying the barcodes correctly?
For instance, suppose my barcode sequences are "TCGCCTTA+CGTCTAAT". I specify them in the barcodes file as follows:
Sample_1
As additional information, the barcodes are already present in the FASTQ read headers BEFORE running MiGEC on the files:
For example:
- Before MiGEC: @M03495:180:000000000-JL7JG:1:1102:14170:1053 1:N:0:NCGCCTTA+NTCTCTAT
- After MiGEC: @M03495:180:000000000-JL7JG:1:1102:11636:1318 1:N:0:TCGCCTTA+CTCTCTAT R1 UMI:CCAGTCAC:3))10+*0
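In case it helps with diagnosis, here is a minimal Python sketch of how the index pairs in the headers could be tallied. It is not part of MiGEC; the function names are my own, and it assumes Illumina-style headers like the two above, with exactly four lines per FASTQ record.

```python
from collections import Counter


def index_pair(header: str) -> str:
    """Return the index field, e.g. 'TCGCCTTA+CTCTCTAT', from a FASTQ header.

    Assumes the header looks like '@<read id> 1:N:0:<index1>+<index2> ...'.
    """
    # The second whitespace-separated token is '1:N:0:<index1>+<index2>'.
    comment = header.split()[1]
    # The index pair is the fourth colon-separated field of that token.
    return comment.split(":")[3]


def count_index_pairs(fastq_lines) -> Counter:
    """Count index pairs over the header lines of a FASTQ stream.

    Assumes four lines per record, so headers sit at positions 0, 4, 8, ...
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.strip():
            counts[index_pair(line)] += 1
    return counts


# Two toy records built from the headers above; the sequence and quality
# lines are placeholders, not real data.
example = [
    "@M03495:180:000000000-JL7JG:1:1102:14170:1053 1:N:0:NCGCCTTA+NTCTCTAT",
    "ACGT",
    "+",
    "IIII",
    "@M03495:180:000000000-JL7JG:1:1102:11636:1318 1:N:0:TCGCCTTA+CTCTCTAT R1 UMI:CCAGTCAC:3))10+*0",
    "ACGT",
    "+",
    "IIII",
]
counts = count_index_pairs(example)
for pair, n in counts.most_common():
    print(pair, n)
```

For a real file, the same function can be given an open handle, e.g. `count_index_pairs(gzip.open(path, "rt"))` for a gzipped FASTQ, where `path` is whatever the input file is called.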
Your help would be greatly appreciated.
Ashwin