
Step by Step - Chromosomer Approach

Open elyas101 opened this issue 7 years ago • 6 comments

Hi,

I have been trying to use Chromosomer to assemble scaffolds into chromosomes using a reference, and I would be very interested if you could share some details about the input files.

I am still trying to figure out what these two inputs look like: alignment_file and gap_size. Do you have an example data set that you could share?

How do we create the alignment_file (using MUMmer?), and how do we choose the gap_size? Any help would be really appreciated.

Best, Elias

elyas101 avatar Jun 08 '17 00:06 elyas101

Hi Elias,

Sorry for my late reply. The alignment_file should be in the BLAST tabular format; it describes alignments of the fragments to be assembled to the reference chromosomes. You can obtain it using the blastn tool from the NCBI BLAST+ package with the -outfmt 6 option. You can also use MUMmer if you convert its output to the BLAST tabular format and assign proper weights to the alignments so that Chromosomer can compare them to each other.
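As a rough illustration of the tabular format mentioned above, the following Python sketch parses one line of blastn -outfmt 6 output into its twelve default columns. The column names are the BLAST+ defaults; the BlastHit class, the parse_blast_line helper, and the sample record are hypothetical and are not part of Chromosomer.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One record in the default column order of blastn -outfmt 6."""
    qseqid: str      # query (fragment) identifier
    sseqid: str      # subject (reference chromosome) identifier
    pident: float    # percentage of identical matches
    length: int      # alignment length
    mismatch: int    # number of mismatches
    gapopen: int     # number of gap openings
    qstart: int      # alignment start on the query
    qend: int        # alignment end on the query
    sstart: int      # alignment start on the subject
    send: int        # alignment end on the subject
    evalue: float    # expect value
    bitscore: float  # bit score


def parse_blast_line(line: str) -> BlastHit:
    """Parse a single tab-separated line of -outfmt 6 output."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return BlastHit(
        fields[0],
        fields[1],
        float(fields[2]),
        *map(int, fields[3:10]),
        float(fields[10]),
        float(fields[11]),
    )


# Made-up record, for illustration only.
sample = "scaffold1\tchr1\t98.50\t1200\t15\t3\t1\t1200\t50001\t51200\t0.0\t2100\n"
hit = parse_blast_line(sample)
print(hit.qseqid, hit.sseqid, hit.length, hit.bitscore)
```

A file converted from MUMmer output would need to follow the same twelve-column layout to be read this way.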

The gap_size parameter is a single numerical value that specifies the size of the gap inserted between fragments arranged on reference chromosomes. It is a technical parameter that does not have any evolutionary meaning; here is a short discussion on its value:

https://twitter.com/gtamazian/status/775690247170584576
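To make the role of gap_size concrete, here is a conceptual Python sketch that places a fixed-size gap between consecutive fragments. It is not Chromosomer's implementation: the join_with_gaps function is hypothetical, and representing the gap as a run of N characters is an assumption made for illustration.

```python
def join_with_gaps(fragments, gap_size):
    """Concatenate fragment sequences, separated by gap_size gap characters."""
    if gap_size < 0:
        raise ValueError("gap_size must be non-negative")
    gap = "N" * gap_size  # assumption: gaps are written as runs of 'N'
    return gap.join(fragments)


# Toy sequences, for illustration only.
assembled = join_with_gaps(["ACGT", "GG", "TTA"], gap_size=3)
print(assembled)
```

The same value is used between every pair of neighbouring fragments, which is why it carries no biological information about the true distance between them.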

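I also added a brief guide to the Chromosomer assembly process to the repository wiki (https://github.com/gtamazian/chromosomer/wiki/Brief-guide-to-Chromosomer-assembly-process). The archive containing scripts and datasets related to the guide is attached: chromosomer_demo.tar.gz (https://github.com/gtamazian/chromosomer/files/1082164/chromosomer_demo.tar.gz).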

Best, Gaik

gtamazian avatar Jun 16 '17 23:06 gtamazian

Hi Gaik,

Thank you for getting back to me, I truly appreciate it. This sounds very helpful. I will give it a second try. Thank you again Gaik!

Best, Elias

elyas101 avatar Jun 16 '17 23:06 elyas101

Hi Gaik

I'm using Chromosomer to build an assembly of my query genome. Executing the command 'chromosomer fastalength demo_fragments.fa demo_fragments.length' produced a fasta.fai file in the format shown below. Apart from the query fragment and the length, what do the other three columns indicate?

scaffold16|size2055373 2055586 11100621 100 101
scaffold19|size3466045 3465886 13176787 100 101
scaffold11|size2509887 2509923 16677356 100 101
scaffold15|size3832740 3830757 19212403 100 101
scaffold3|size4243428 4243561 23081491 100 101
scaffold10|size4016056 4017415 27367512 100 101
scaffold17|size2482255 2482430 31425126 100 101
scaffold12|size2336645 2336420 33932405 100 101
scaffold20|size1703643 1703728 36292214 100 101
scaffold24|size1527272 1527297 38013004 100 101

Thanks in advance

Rajanikanth111 avatar Aug 04 '17 09:08 Rajanikanth111

Hello,

The .fai file is a FASTA file index created when accessing a FASTA file with routines of the pyfasta module. You may safely delete it after running Chromosomer.
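For reference, the five columns in the listing above match the standard FASTA index (faidx) layout: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line including the newline. A minimal Python sketch for reading one such record follows; the FaiRecord class and parse_fai_line helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str        # sequence name
    length: int      # sequence length in bases
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per sequence line
    line_width: int  # bytes per sequence line, newline included


def parse_fai_line(line: str) -> FaiRecord:
    """Parse one whitespace-separated record of a .fai index."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    return FaiRecord(fields[0], *map(int, fields[1:]))


# First record from the listing in the question above.
record = parse_fai_line("scaffold16|size2055373\t2055586\t11100621\t100\t101")
print(record.name, record.length, record.line_bases, record.line_width)
```

In the posted data, each sequence line holds 100 bases and occupies 101 bytes, the extra byte being the line terminator.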

gtamazian avatar Aug 04 '17 15:08 gtamazian

Hey Gaik, thanks for that. I have one more question. When I run the blastn command on my query sequences, it reports more than 1,000 matches for a single scaffold. My query has 22,500 scaffolds, and the BLAST+ results contain more than 80,000 identifiers (reduced to 50,000 after applying the max_target_seqs 1 option). Could you please explain why some scaffolds are identified several times? Thanks in advance

Rajanikanth C

Rajanikanth111 avatar Aug 11 '17 05:08 Rajanikanth111

Hello Rajanikanth,

A scaffold might show multiple alignments if it contains a repetitive sequence that has not been masked for the alignment. Note that the max_target_seqs option restricts the number of subject sequences reported for each query; it does not restrict the number of alignments (HSPs) between a query-subject pair, which is controlled by the max_hsps option.
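To see where the repeated identifiers come from, one could tally the alignments per scaffold and per scaffold-reference pair in the BLAST tabular output. The Python sketch below does this; the count_alignments function and the sample lines are hypothetical, and the code only assumes the default -outfmt 6 layout, in which the first two columns are the query and subject identifiers.

```python
from collections import Counter


def count_alignments(lines):
    """Count alignments per query and per (query, subject) pair."""
    per_query = Counter()
    per_pair = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        per_query[query] += 1
        per_pair[(query, subject)] += 1
    return per_query, per_pair


# Made-up records, for illustration only.
sample_lines = [
    "scaffold1\tchr1\t99.0\t500\t5\t0\t1\t500\t1000\t1499\t0.0\t900\n",
    "scaffold1\tchr1\t95.0\t300\t15\t0\t600\t899\t5000\t5299\t1e-100\t450\n",
    "scaffold1\tchr2\t90.0\t200\t20\t0\t1\t200\t700\t899\t1e-50\t250\n",
    "scaffold2\tchr1\t98.0\t400\t8\t0\t1\t400\t9000\t9399\t0.0\t700\n",
]
per_query, per_pair = count_alignments(sample_lines)
print(per_query.most_common(1))
print(per_pair[("scaffold1", "chr1")])
```

A scaffold with a high count for a single pair has several separate alignments to the same reference sequence, while a scaffold spread over many pairs aligns to several reference sequences.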

gtamazian avatar Aug 11 '17 11:08 gtamazian