
High number of master undef sequences

Open ashwinkallor opened this issue 2 years ago • 0 comments

Hello,

I am running migec-1.2.9 on canine paired TCR-seq data for UMI extraction. I tested one sample (R1 and R2) with Checkout manual, since I plan to run Checkout Batch once I am satisfied with the results. UMI extraction is working: the headers in the output FASTQ files contain "UMI:<UMI sequence>:<Quality string>". However, I get a high number of undefined sequences for the master barcode only and 0 undefs for the slave barcode, even though both are specified in the barcodes file.

My question is: what are the general causes of a high number of undefined master sequences? In my case, the undef-m_R1 and undef-m_R2 files are larger than the sample_R1 and sample_R2 files. Is this happening because the number of barcode sequences in the sample file is low? Or is it because I am not specifying the barcodes correctly?

For instance, if my barcode sequences are "TCGCCTTA+CGTCTAAT", I specify them in the barcodes file as follows:

Sample_1 NNNNTCGCCTTA NNNNCGTCTAAT
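
One way to check whether a pattern like the one above actually occurs in the read sequences (as opposed to the headers) is sketched below. This is only a rough diagnostic under my own assumptions: it treats `N` as "any base" and requires an exact match for the other letters, which is not necessarily how MiGEC itself matches barcodes. The function names and the `sample_R1.fastq` file name are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Turn a barcode pattern into a regex: N matches any base, other letters match exactly.
    Assumption: this simple rule only approximates real barcode matching."""
    parts = ["[ACGTN]" if c in "Nn" else re.escape(c.upper()) for c in pattern]
    return re.compile("".join(parts))

def fastq_sequences(lines):
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()

def fraction_with_pattern(lines, pattern):
    """Return the fraction of reads whose sequence contains the pattern anywhere."""
    rx = pattern_to_regex(pattern)
    seqs = list(fastq_sequences(lines))
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if rx.search(s.upper()))
    return hits / len(seqs)

# Hypothetical usage on an uncompressed FASTQ file:
# with open("sample_R1.fastq") as fh:
#     print(fraction_with_pattern(fh, "NNNNTCGCCTTA"))
```

A very low fraction would suggest that the barcode is rarely present in the reads themselves.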

As additional information, the barcodes are already present in the FASTQ headers BEFORE running MiGEC:

For example:

  1. Before MiGEC: @M03495:180:000000000-JL7JG:1:1102:14170:1053 1:N:0:NCGCCTTA+NTCTCTAT

  2. After MiGEC: @M03495:180:000000000-JL7JG:1:1102:11636:1318 1:N:0:TCGCCTTA+CTCTCTAT R1 UMI:CCAGTCAC:3))10+*0
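
To make the header layout in the two examples above explicit, here is a minimal parsing sketch. It assumes the layout is exactly as shown: a read ID, then a field ending in `index1+index2`, then optionally a `UMI:<sequence>:<quality>` token. The `parse_header` name and the dictionary keys are hypothetical.

```python
def parse_header(header):
    """Split a FASTQ header into read ID, index sequences, and (if present) UMI fields.
    Assumption: the header follows the layout of the two examples in this post."""
    fields = header.split()
    result = {
        "read_id": fields[0].lstrip("@"),
        "index1": None,
        "index2": None,
        "umi": None,
        "umi_quality": None,
    }
    if len(fields) > 1:
        # Second field looks like "1:N:0:TCGCCTTA+CTCTCTAT"; the index pair is the 4th part.
        parts = fields[1].split(":", 3)
        if len(parts) == 4 and "+" in parts[3]:
            result["index1"], result["index2"] = parts[3].split("+", 1)
    for token in fields[2:]:
        if token.startswith("UMI:"):
            # maxsplit=2 keeps any ':' characters inside the quality string intact.
            umi_parts = token.split(":", 2)
            if len(umi_parts) == 3:
                result["umi"], result["umi_quality"] = umi_parts[1], umi_parts[2]
    return result

after = "@M03495:180:000000000-JL7JG:1:1102:11636:1318 1:N:0:TCGCCTTA+CTCTCTAT R1 UMI:CCAGTCAC:3))10+*0"
info = parse_header(after)
print(info["index1"], info["index2"], info["umi"], info["umi_quality"])
```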

Your help would be greatly appreciated.

Ashwin

ashwinkallor, Jan 23 '22 23:01