General questions and Error: inconsistent naming convention

Open · HLHsieh opened this issue 1 year ago • 1 comment

Hi Rupesh,

I have several questions about the usage:

  1. I am wondering whether there are any limitations on its detection length.
  2. Could I use strspy to profile the repeat sequence CCCCGCGCCCGGCCTTCCCCGGGGTCCCTGCGGCCCCGACTGTGCGCC?
  3. How do I use strspy to quantify the number of contiguous repeat units? I did not see a direct result in the output, or I might have missed this information.
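
To make "number of contiguous repeat units" concrete, here is a small Python sketch that is independent of STRspy and is not its algorithm. It counts exact, uninterrupted copies of a repeat unit in one read; the example unit is the C9ORF72 hexanucleotide GGGGCC, and the function name `longest_repeat_run` is hypothetical.

```python
def longest_repeat_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back exact copies of `unit` in `seq`.

    Illustrative sketch only (not STRspy): no mismatches or indels are
    tolerated, so a single sequencing error splits a run in two.
    """
    seq, unit = seq.upper(), unit.upper()
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    # Try every start offset so a run is found regardless of its phase.
    for start in range(len(seq)):
        run, pos = 0, start
        # Extend the run one full unit at a time while the next copy matches.
        while seq.startswith(unit, pos):
            run += 1
            pos += k
        best = max(best, run)
    return best


read = "ACGTT" + "GGGGCC" * 9 + "TTACG"
print(longest_repeat_run(read, "GGGGCC"))
```

Scanning every offset is quadratic in the worst case but simple and phase-agnostic; real long reads (e.g. nanopore) would need error-tolerant matching instead of exact `startswith` checks.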

In addition, when running strspy, I got this error:

***** WARNING: File /scratch/kinfai_root/kinfai0/hsinlun/tri_test/align/C9ORF72_1_9R_NanoSim_2x.sorted.bam has inconsistent naming convention for record:
chr1	14337	20040	C9ORF72-1_14451_aligned_12683_F_19_5702_13	0	+

***** WARNING: File /scratch/kinfai_root/kinfai0/hsinlun/tri_test/align/C9ORF72_1_9R_NanoSim_2x.sorted.bam has inconsistent naming convention for record:
chr1	14337	20040	C9ORF72-1_14451_aligned_12683_F_19_5702_13	0	+
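
The record in the warning looks like a BED-style line (chrom, start, end, read name, score, strand) placing the read on `chr1`, i.e. on a genomic reference. One rough way to check which reference a BAM was aligned to is to compare the `@SQ SN:` names in its header (for example as printed by `samtools view -H`) with the record names in the STR database FASTA. The sketch below assumes both are available as text; the function names and the toy sequence names are hypothetical.

```python
def reference_names(sam_header: str) -> list[str]:
    """Collect the SN: (sequence name) values from @SQ lines of a SAM header."""
    names = []
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_names(fasta_text: str) -> set[str]:
    """Collect record names (first word after '>') from FASTA text."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def looks_like_str_bam(sam_header: str, db_fasta: str) -> bool:
    """True if every reference in the BAM header is a record of the STR DB FASTA.

    Heuristic (assumption): a BAM aligned to the STR database should only list
    database records as references, while a genomic BAM lists chromosomes.
    """
    bam_refs = reference_names(sam_header)
    db_refs = fasta_names(db_fasta)
    return bool(bam_refs) and all(name in db_refs for name in bam_refs)
```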

I would appreciate any solutions to this.

Best, Hsin

HLHsieh · Jul 06 '24 02:07

Hi

  1. There is no limit on the length of the STRs. We have seen STRs on chrY exceeding 100.
  2. You can, but STRspy relies on a database FASTA (flanking_seq-----STR repeats-----flanking) and a BED file. If you are able to prepare these from your reference locations (repeat coordinates), you can use STRspy for it.
  3. If you provide the DB FASTA, STRspy will output frequency files for all STRs present in your sample. Please have a look at the test DB FASTA and the results.
  4. It would be easier to track down the warning or error if you showed me, step by step, how you ran STRspy. My guess is that the error comes from the SNV-calling tool, i.e. xAtlas. It happens because your BAM is aligned to the genomic reference: xAtlas requires the STR BAM, not the genomic BAM.
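
Preparing the database from reference coordinates, as suggested in item 2, amounts to cutting each repeat out of the reference together with some flanking sequence. A minimal Python sketch, assuming the chromosome sequence is already loaded as a string and the repeat coordinates are 0-based, half-open (BED convention). The 50 bp default flank, the record naming, and the 4-column BED layout are assumptions, and `StrLocus`/`build_db_entry` are hypothetical names, so compare the output with the test DB FASTA and BED shipped with STRspy before relying on it.

```python
from dataclasses import dataclass


@dataclass
class StrLocus:
    name: str   # hypothetical locus label, used as the FASTA record name
    chrom: str  # chromosome the repeat lies on
    start: int  # 0-based, inclusive start of the repeat (BED convention)
    end: int    # 0-based, exclusive end of the repeat (BED convention)


def build_db_entry(chrom_seq: str, locus: StrLocus, flank: int = 50) -> tuple[str, str]:
    """Return (fasta_record, bed_line) for one STR locus.

    ASSUMPTION: the record is left flank + repeat + right flank, as described
    in the reply; flank size and BED columns are illustrative choices only.
    """
    if not (0 <= locus.start < locus.end <= len(chrom_seq)):
        raise ValueError("repeat coordinates fall outside the sequence")
    # Clip the flanks at the ends of the chromosome.
    left = max(0, locus.start - flank)
    right = min(len(chrom_seq), locus.end + flank)
    record_seq = chrom_seq[left:right].upper()
    fasta = f">{locus.name}\n{record_seq}\n"
    bed = f"{locus.chrom}\t{locus.start}\t{locus.end}\t{locus.name}\n"
    return fasta, bed


toy_chrom = "A" * 10 + "GGGGCC" * 3 + "T" * 10
fasta, bed = build_db_entry(toy_chrom, StrLocus("TOY", "chr_toy", 10, 28), flank=5)
print(fasta + bed)
```

Writing one record per locus keeps the flanks tied to each repeat, which matches the flanking_seq-----STR repeats-----flanking layout described above.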

Hope this helps!

Rupesh Kesharwani

unique379r · Jul 16 '24 17:07