nanopolish
Segmentation fault in index when including -s
Hi @jts,
I am currently running nanopolish index on two direct RNA libraries, one obtained from a MinION and the other from a PromethION. The former seems to have worked perfectly fine when including the -s option to point at its sequencing_summary.txt file, but the same call was causing the latter to crash, throwing a Segmentation Fault error. This seems to be solved if I remove the -s option.
Due to being a larger run, the guppy basecalling of the PromethION library actually split the sequencing_summary.txt file in two, with the first part being stored as sequencing_summary.txt.prev. I was including only sequencing_summary.txt as part of -s. Could this be the problem? If so, how should I address it?
The documentation for index doesn't seem to provide any information on how to use the -s parameter.
I am using Nanopolish version 0.13.2, installed within a conda environment.
Cheers, Fran
Can you check whether the sequencing summary file format is the same for the MinION and PromethION data? Is it possible the split sequencing summary file is truncated (does it have the right number of fields on every line?)
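One way to run the field-count check suggested above is a short script like the sketch below. It assumes the summary file is tab-delimited with a single header line; the function name find_malformed_lines is made up for illustration and is not part of nanopolish or guppy.

```python
def find_malformed_lines(path, delimiter="\t"):
    """Return (line_number, field_count) for every row whose number of
    fields differs from the header's. Assumes a tab-delimited file."""
    bad = []
    with open(path, newline="") as handle:
        # The first line is taken to be the header and sets the expected width.
        header = handle.readline().rstrip("\r\n").split(delimiter)
        expected = len(header)
        for line_number, line in enumerate(handle, start=2):
            fields = line.rstrip("\r\n").split(delimiter)
            if len(fields) != expected:
                bad.append((line_number, len(fields)))
    return bad
```

An empty list means every row has as many fields as the header. A truncated final line, like the one in a file that was not closed properly, would show up as the last entry.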
The *.prev file does seem to be truncated
$ tail -n 3 [...]/sequencing_summary.txt.prev
PAG50866_pass_a8c5e533_15.fast5 2a66c331-32e9-41c2-b386-eb15743b4918 a8c5e5330539ae4e22df7db35b72c833dbb28282 91 517 2 8131.663000 79.819000 23945 TRUE 8134.889000 22977 76.593000 3394 9.965094 0.000000 76.369461 13.398151 76.369461 13.398151
PAG50866_pass_a8c5e533_15.fast5 71680880-2add-49c9-b327-b00aefb024ec a8c5e5330539ae4e22df7db35b72c833dbb28282 91 2792 3 7748.188667 56.772000 17031 TRUE 7751.356333 16081 53.604333 2489 8.590801 0.000000 77.709274 13.934077 77.709274 13.934077
PAG50866_pass_a8c5e533_15.fast5
$ tail -n 3 [...]/sequencing_summary.txt
PAG50866_pass_a8c5e533_31.fast5 0c524be7-fb98-4e80-bca2-0008bf76c572 a8c5e5330539ae4e22df7db35b72c833dbb28282 119 2791 3 17481.855333 35.111333 10533 TRUE 17482.014667 10485 34.952000 1784 9.106236 0.000000 87.623909 14.470003 87.623909 14.470003
PAG50866_pass_a8c5e533_31.fast5 11739cd3-6df1-48d2-af3b-4ba0249ce8ef a8c5e5330539ae4e22df7db35b72c833dbb28282 119 2625 4 17733.792667 71.777333 21533 TRUE 17737.036000 20560 68.534000 3028 10.361650 0.000000 80.656868 13.398151 80.656868 13.398151
PAG50866_pass_a8c5e533_31.fast5 90898a45-9d38-4394-b4cc-e4e4873b546d a8c5e5330539ae4e22df7db35b72c833dbb28282 119 293 1 17766.737000 56.047000 16814 TRUE 17769.317667 16039 53.466333 1786 9.651551 0.000000 81.460762 15.005929 81.460762 15.005929
but I assume that is normal when guppy times out and gets resumed. Would you recommend deleting the last row and retrying the indexing?
If the file is truncated it means guppy terminated abnormally so didn't close the file properly. If you only passed sequencing_summary.txt to nanopolish (and not sequencing_summary.txt.prev) then the truncation shouldn't cause the seg fault (nanopolish would not try to read the .prev file). Can I see the head of sequencing_summary.txt and the summary file from the MinION run?
For the PromethION run:
$ head -n 5 [...]/sequencing_summary.txt
filename read_id run_id batch_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_template median_template mad_template scaling_median_template scaling_mad_template
PAG50866_fail_a8c5e533_4.fast5 d832e3ea-d137-49a6-9d0a-b4f130341039 a8c5e5330539ae4e22df7db35b72c833dbb28282 0 508 3 16116.210667 5.284333 1585 FALSE 16118.039667 1036 3.455333 131 5.806077 0.000000 88.963722 9.110743 88.963722 9.110743
PAG50866_fail_a8c5e533_4.fast5 571e8c02-5a4e-4370-b735-a2b53c1eab7a a8c5e5330539ae4e22df7db35b72c833dbb28282 0 1333 2 15261.546000 19.430667 5829 FALSE 15264.969000 4802 16.007667 603 5.504918 0.000000 81.996681 10.986484 81.996681 10.986484
PAG50866_fail_a8c5e533_4.fast5 f784e9eb-d5a9-4282-a81d-3d1cc3667533 a8c5e5330539ae4e22df7db35b72c833dbb28282 0 649 1 15757.121000 10.761000 3228 FALSE 15757.424000 3137 10.458000 458 5.128955 0.000000 77.173347 14.470003 77.173347 14.470003
PAG50866_fail_a8c5e533_4.fast5 c6b78690-436e-438a-8d72-7ce37c4274db a8c5e5330539ae4e22df7db35b72c833dbb28282 0 697 1 15169.492000 21.781667 6534 FALSE 15169.802667 6441 21.471000 1042 5.794460 0.000000 71.010201 11.254447 71.010201 11.254447
and for the MinION run:
$ head -n 5 [...]/sequencing_summary.txt
filename read_id run_id batch_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_template median_template mad_template scaling_median_template scaling_mad_template
FAO33153_fail_1ddb64de_28.fast5 e15c1fb1-5586-472e-ba58-8d08c3e1093c 1ddb64de2c69221c6b30bbf05c892f79f2199f1d 0 468 1 11924.139774 64.340637 19379 TRUE 11925.856906 18862 62.623506 3530 7.233658 0.000000 85.713989 12.361288 85.713989 12.361288
FAO33153_fail_1ddb64de_28.fast5 9a5f2bba-0da6-49c4-bfc0-6001427ad246 1ddb64de2c69221c6b30bbf05c892f79f2199f1d 0 452 2 12186.254980 2.010956 605 FALSE 12186.254980 605 2.010956 112 6.817979 0.000000 118.179344 15.078054 118.179344 15.078054
FAO33153_fail_1ddb64de_28.fast5 dfbdca83-b41d-4e8b-97cf-2b008877a465 1ddb64de2c69221c6b30bbf05c892f79f2199f1d 0 327 3 12146.663347 8.406707 2532 FALSE 12150.514940 1372 4.555113 99 5.765373 0.000000 68.055000 9.508683 68.055000 9.508683
FAO33153_fail_1ddb64de_28.fast5 ed6738ea-1371-48c8-b7e6-30aaacc40e17 1ddb64de2c69221c6b30bbf05c892f79f2199f1d 0 204 3 12069.395086 3.739044 1126 FALSE 12069.395086 1126 3.739044 125 5.432938 0.000000 82.725540 11.410419 82.725540 11.410419
If nanopolish would not try to read the *.prev file, would that mean that many reads would be ignored? Is it possible to include multiple files within -s?
Unfortunately you can't include multiple summary files, but you can merge them into one file, then provide that. I think you can simply cat the files (after removing the truncated lines from .prev) - nanopolish will ignore the redundant headers.
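For anyone who would rather script the merge than edit the .prev file by hand, here is a rough sketch. It assumes tab-delimited files that all share the same header, keeps only the first header, and drops any row with the wrong number of fields (such as a truncated last line). The function name merge_summaries and the output file name in the usage note are hypothetical.

```python
def merge_summaries(paths, out_path, delimiter="\t"):
    """Concatenate summary files into out_path, keeping one header and
    skipping rows whose field count differs from the header's.
    Returns (rows_kept, rows_dropped)."""
    header_line = None
    expected = None
    kept = dropped = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    stripped = line.rstrip("\r\n")
                    fields = stripped.split(delimiter)
                    if header_line is None:
                        # The very first line seen defines the header.
                        header_line = stripped
                        expected = len(fields)
                        out.write(stripped + "\n")
                        continue
                    if stripped == header_line:
                        continue  # repeated header from a later file
                    if len(fields) != expected:
                        dropped += 1  # truncated or otherwise malformed row
                        continue
                    out.write(stripped + "\n")
                    kept += 1
    return kept, dropped
```

It could be called as merge_summaries(["sequencing_summary.txt.prev", "sequencing_summary.txt"], "sequencing_summary.merged.txt"), and the merged file then given to -s.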
If nanopolish would not try to read the *.prev file, would that mean that many reads would be ignored?
No, it will revert to indexing the fast5s not present in the summary the slow way (opening up each fast5 to see what reads are contained within). The summary file is only used as a hint to accelerate the indexing.
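To get a feel for how many fast5 files would fall back to the slow path, one could compare the filename column of the summary against the files on disk. The sketch below is not how nanopolish does it internally. It assumes a tab-delimited summary with a "filename" column (as in the headers shown earlier), and fast5_missing_from_summary is a made-up helper name.

```python
from pathlib import Path


def fast5_missing_from_summary(summary_path, fast5_dir, delimiter="\t"):
    """Return the sorted names of .fast5 files under fast5_dir that are
    not listed in the summary's 'filename' column."""
    listed = set()
    with open(summary_path) as handle:
        header = handle.readline().rstrip("\r\n").split(delimiter)
        column = header.index("filename")
        for line in handle:
            fields = line.rstrip("\r\n").split(delimiter)
            if len(fields) > column and fields[column]:
                listed.add(fields[column])
    # Search recursively, since pass/fail reads often sit in subdirectories.
    on_disk = {p.name for p in Path(fast5_dir).rglob("*.fast5")}
    return sorted(on_disk - listed)
```

If only sequencing_summary.txt is passed and the .prev file is left out, the files listed only in .prev should appear in this result.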
Thanks, Jared.
So if the file that was actually used appeared correct, do you have any suggestion as to why it would have crashed? Is there any other test you want me to try?