breakseq2
Creation of breakpoints file (in either GFF3 or FASTA format)
I would like to use BreakSeq2 but am unsure how to create a breakpoint library for other model organisms, specifically C. elegans.
The documentation does not make clear how the human breakpoint library (either FASTA or GFF) was created.
I have used MindTheGap, which creates a .breakpoints file of detected insertion sites.
The file looks like this:
[moldach@cdr767 BreakSeq2]$ head $BREAKPOINTS
>bkpt3_I_pos_221157_fuzzy_3_HOM left_kmer
TGAAATTGCCATTTCGACTGTGGCAGAGCCC
>bkpt3_I_pos_221157_fuzzy_3_HOM REPEATED right_kmer
ACGAAGAGCGTCGTGGATTCGGTGAGCTTCT
>bkpt4_I_pos_232103_fuzzy_4_HET left_kmer
CGGGCCATTTGGGTCGCGGCCGGTCTGGGGG
>bkpt4_I_pos_232103_fuzzy_4_HET right_kmer
GCTGGGCCCGTACTTCCTGGGAAGTTGAGAA
>bkpt6_I_pos_256855_fuzzy_0_HOM left_kmer
AATTTTCATCTGAAAATTTAGTACTGAAATC
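To make the structure of these records explicit, here is a minimal Python sketch that pairs the left and right k-mers sharing a bkpt ID. The header layout is inferred only from the sample above, not from MindTheGap's documentation. The names HEADER_RE, Breakpoint and parse_breakpoints are hypothetical, and each sequence is assumed to sit on a single line.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# ASSUMPTION: header layout inferred from the sample output, not from MindTheGap docs:
# ">bkpt<ID>_<chrom>_pos_<pos>_fuzzy_<fuzz>_<HOM|HET>[ REPEATED] <left|right>_kmer"
HEADER_RE = re.compile(
    r"^>bkpt(?P<id>\d+)_(?P<chrom>.+)_pos_(?P<pos>\d+)_fuzzy_(?P<fuzzy>\d+)_"
    r"(?P<gt>HOM|HET)(?P<repeated> REPEATED)? (?P<side>left|right)_kmer$"
)


@dataclass
class Breakpoint:
    bkpt_id: int
    chrom: str
    pos: int
    fuzzy: int
    genotype: str
    left_kmer: Optional[str] = None
    right_kmer: Optional[str] = None
    repeated: bool = False  # True if either flank's header carried REPEATED


def parse_breakpoints(lines: List[str]) -> Dict[int, Breakpoint]:
    """Group left/right k-mer records by bkpt ID (single-line sequences assumed)."""
    bkpts: Dict[int, Breakpoint] = {}
    current = None  # (breakpoint, side) waiting for its sequence line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            m = HEADER_RE.match(line)
            if m is None:
                raise ValueError(f"unrecognised header: {line}")
            bid = int(m["id"])
            bp = bkpts.setdefault(bid, Breakpoint(
                bid, m["chrom"], int(m["pos"]), int(m["fuzzy"]), m["gt"]))
            bp.repeated = bp.repeated or m["repeated"] is not None
            current = (bp, m["side"])
        elif current is not None:
            bp, side = current
            if side == "left":
                bp.left_kmer = line
            else:
                bp.right_kmer = line
            current = None  # further lines are ignored until the next header
    return bkpts
```

Run on the ten lines above, bkpt6 ends up with only a left k-mer, simply because `head` cut the file off before its right k-mer.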
The human breakpoint library in GFF format looks quite different:
[moldach@cdr767 BreakSeq2]$ head breakseq2_bplib_20150129.gff
1 1KG_Phase1 Deletion 766594 769112 . . .
1 1KG_Phase1 Deletion 776770 791881 . . .
1 1KG_Phase1 Deletion 869385 870317 . . .
1 1KG_Phase1 Deletion 912049 913594 . . .
1 1KG_Phase1 Deletion 947122 948001 . . .
1 1KG_Phase1 Deletion 1086818 1087023 . . .
1 1KG_Phase1 Deletion 1142720 1143140 . . .
1 1KG_Phase1 Deletion 1443564 1445764 . . .
1 1KG_Phase1 Deletion 1465912 1466230 . . .
1 1KG_Phase1 Deletion 1598414 1598580 . . .
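For comparison, here is a small Python sketch that reads one of these lines using the standard GFF3 column order (seqid, source, type, start, end, score, strand, phase, attributes). The pasted `head` output shows only eight columns, so the sketch treats the ninth (attributes) column as optional rather than guessing what it holds in the real file. GffRecord and parse_gff_line are hypothetical names.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: Optional[str]  # None when no 9th column is present


def parse_gff_line(line: str) -> GffRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        # Pasted terminal output may use spaces instead of tabs;
        # maxsplit=8 keeps any 9th (attributes) column in one piece.
        fields = line.split(None, 8)
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 GFF columns: {line!r}")
    start, end = int(fields[3]), int(fields[4])
    if start > end:
        raise ValueError(f"start {start} exceeds end {end}")
    attrs = fields[8] if len(fields) > 8 else None
    return GffRecord(fields[0], fields[1], fields[2], start, end,
                     fields[5], fields[6], fields[7], attrs)
```

Under the 1-based inclusive convention, the first human record (766594 to 769112 on chromosome 1) describes a 2,519 bp deletion.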
Can the breakpoint information from MindTheGap be converted to a format that works with BreakSeq2? If not, which tool(s) can be used to generate this information?
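As a starting point for such a conversion, here is a Python sketch that writes MindTheGap positions into the same eight tab-separated columns shown in the human library. Everything about the output is an assumption, not verified against BreakSeq2: the "Insertion" type label, the "MindTheGap" source label, using the MindTheGap position as both start and end (GFF3 writes zero-length features such as insertion sites with start equal to end), and leaving out the attributes column. The function name to_gff_lines is hypothetical.

```python
from typing import Iterable, List, Tuple


def to_gff_lines(breakpoints: Iterable[Tuple[str, int]],
                 source: str = "MindTheGap",
                 sv_type: str = "Insertion") -> List[str]:
    """Emit one 8-column, tab-separated GFF-style line per (chrom, pos) pair.

    ASSUMPTIONS (unverified against BreakSeq2): the type and source labels,
    pos reused as both start and end, and no attributes column.
    """
    lines = []
    # Sort by chromosome name, then position, so the output is ordered like the library
    for chrom, pos in sorted(breakpoints):
        fields = [chrom, source, sv_type, str(pos), str(pos), ".", ".", "."]
        lines.append("\t".join(fields))
    return lines
```

Whether BreakSeq2 can build junction sequences from entries like these, which for an insertion may also need the inserted sequence itself, still has to be checked against the BreakSeq2 documentation.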