architect A couple questions

Hi Volodymyr,

I have a couple of questions. First, our SLR reads came split by well/barcode. Should I map the reads for each well back to our contigs one by one and then combine the results into a single file before running bam2containment.py?

Second, is a script included that generates the TSV file for paired-end data?

Thanks for any input you can provide.

Sincerely,

Ian

Oct 28 '16 22:10 geneticsguy

Hi Ian,

You can map the reads either separately or jointly, but make sure that each read has an identifier that indicates which well/barcode it came from. In my experiments, I prepended well%d_ to each read name and used that as the tag. It's not ideal, but I'm happy to add a parsing module to the script that works with the IDs people typically have in their BAMs.
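As a rough illustration of the well%d_ convention described above, here is a minimal sketch of tagging read names before mapping and recovering the well afterwards. The function names are hypothetical and are not part of Architect or bam2containment.py; the FASTQ helper assumes plain four-line records.

```python
import re

# Hypothetical helpers; only the "well%d_" prefix comes from the thread.
WELL_PREFIX = re.compile(r"^well(\d+)_")


def tag_read_name(read_name, well):
    """Prepend the well tag to a read name, e.g. 'read1' -> 'well7_read1'."""
    return "well%d_%s" % (well, read_name)


def parse_well(tagged_name):
    """Return the well number encoded in a tagged read name, or None if absent."""
    match = WELL_PREFIX.match(tagged_name)
    return int(match.group(1)) if match else None


def tag_fastq_lines(lines, well):
    """Yield FASTQ lines with the well tag added to each header line.

    Assumes four lines per record, with the header as the first line.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            yield "@" + tag_read_name(line[1:], well)
        else:
            yield line
```

Tagging the reads this way before mapping means the per-well BAMs can later be merged without losing track of which well each read came from.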

You can use pe-connections.py to generate the TSV file, but for best results you can try to use the graph generated by a third-party scaffolder like SSPACE.

-- Volodymyr

Nov 04 '16 01:11 kuleshov

Hi Volodymyr,

I have been playing around with bam2containment.py to make it compatible with my data. I have a BAM file that was generated by bwa. Reads were mapped to a de novo assembly generated with SOAPdenovo2. I have the well ID as a tag at the front of each read name, followed by an underscore. After some slight modifications to the script, I had no problems obtaining the well ID.

The problem I am having, though, is this:

    python bam2containment.py -b readclouds.bam -c readclouds.containment
    reads processed: 1000000
    Traceback (most recent call last):
      File "bam2containment.py", line 114, in <module>
        print 'reads processed: ', n, samfile.mapped
      File "pysam/calignmentfile.pyx", line 1457, in pysam.calignmentfile.AlignmentFile.mapped.__get__ (pysam/calignmentfile.c:15739)
      File "pysam/calignmentfile.pyx", line 374, in pysam.calignmentfile.AlignmentFile.check_index (pysam/calignmentfile.c:5249)
    ValueError: mapping information not recorded in index or index not available

Does the BAM file need to be sorted and indexed, or might the error be arising for some other reason?

Thanks,

Ian

Ian Ehrenreich Assistant Professor Director of Graduate Studies, Molecular Biology PhD Program Molecular and Computational Biology Section University of Southern California Email: [email protected] Phone: (213) 821-5349

Nov 07 '16 22:11 geneticsguy

Hi Ian,

Yes, the file needs to be indexed, but the script should do it for you. Can you check whether you have a .bai file in the same folder? Also, in your case the index might be stale. Try deleting it and rerunning the script. If that still doesn't work, please try to index the BAM file manually (e.g. with samtools index).
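The check suggested above can be sketched in a few lines of Python. This is only an illustration, not code from bam2containment.py: the function names are hypothetical, and treating an index older than its BAM as stale is a heuristic based on file modification times.

```python
import os


def find_index(bam_path):
    """Return the path of an existing .bai index for bam_path, or None.

    Checks both common naming conventions: reads.bam.bai and reads.bai.
    """
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def index_status(bam_path):
    """Classify the index as 'missing', 'stale' or 'ok'."""
    index_path = find_index(bam_path)
    if index_path is None:
        return "missing"
    # Heuristic: an index older than the BAM was probably built from an
    # earlier version of the file and should be regenerated.
    if os.path.getmtime(index_path) < os.path.getmtime(bam_path):
        return "stale"
    return "ok"
```

If the status is missing or stale, the index is typically rebuilt by running samtools index on a coordinate-sorted BAM, or by calling pysam.index from Python.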

-- Volodymyr

Nov 08 '16 00:11 kuleshov

Hi Volodymyr,

I don't have a .bai file. Should I generate one through bwa? Also, could you tell me what kind of edge file should be supplied to pe-connections.py?

At this point I have Architect running with the containment file and the accompanying FASTA, but no PE file.

Thanks!

Ian

Nov 08 '16 04:11 geneticsguy

The PE file is for when you have paired-end reads (or additional mate pairs) and want to use them to augment the scaffolding procedure.

Nov 10 '16 00:11 kuleshov
