Segmentation fault in index when including -s

Open franrodalg opened this issue 2 years ago • 6 comments

Hi @jts,

I am currently running nanopolish index on two direct RNA libraries, one obtained from a MinION and the other from a PromethION. The former seems to have worked perfectly fine when including the -s option to point at its sequencing_summary.txt file, but the same call was causing the latter to crash, throwing a Segmentation Fault error. This seems to be solved if I remove the -s option.

Due to being a larger run, the guppy basecalling of the PromethION library actually split the sequencing_summary.txt file in two, with the first part being stored as sequencing_summary.txt.prev. I was including only sequencing_summary.txt as part of -s. Could this be the problem? If so, how should I address it?

The documentation for index doesn't seem to provide any information on how to use the -s parameter.

I am using Nanopolish version 0.13.2, installed within a conda environment.

Cheers, Fran

franrodalg May 04 '22 17:05

Can you check whether the sequencing summary file format is the same for the MinION and PromethION data? Is it possible the split sequencing summary file is truncated (does it have the right number of fields on every line?)
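One way to run that field-count check is sketched below. It assumes the summary is tab-separated with a header on the first line, as in the excerpts in this thread; the function name `check_summary_fields` is made up for illustration and is not part of nanopolish or guppy.

```shell
# Report every line whose tab-separated field count differs from the header's.
# Hypothetical helper; usage: check_summary_fields SUMMARY_FILE
check_summary_fields() {
    awk -F'\t' '
        NR == 1 { n = NF; next }   # take the expected field count from the header
        NF != n { printf "line %d: %d fields (expected %d)\n", NR, NF, n }
    ' "$1"
}
```

Running it on both `sequencing_summary.txt` and `sequencing_summary.txt.prev` should print nothing for a file in which every row is complete.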

jts May 04 '22 18:05

The *.prev file does seem to be truncated

$ tail -n 3 [...]/sequencing_summary.txt.prev
PAG50866_pass_a8c5e533_15.fast5	2a66c331-32e9-41c2-b386-eb15743b4918	a8c5e5330539ae4e22df7db35b72c833dbb28282	91	517	2	8131.663000	79.819000	23945	TRUE	8134.889000	22977	76.593000	3394	9.965094	0.000000	76.369461	13.398151	76.369461	13.398151
PAG50866_pass_a8c5e533_15.fast5	71680880-2add-49c9-b327-b00aefb024ec	a8c5e5330539ae4e22df7db35b72c833dbb28282	91	2792	3	7748.188667	56.772000	17031	TRUE	7751.356333	16081	53.604333	2489	8.590801	0.000000	77.709274	13.934077	77.709274	13.934077
PAG50866_pass_a8c5e533_15.fast5

$ tail -n 3 [...]/sequencing_summary.txt
PAG50866_pass_a8c5e533_31.fast5	0c524be7-fb98-4e80-bca2-0008bf76c572	a8c5e5330539ae4e22df7db35b72c833dbb28282	119	2791	3	17481.855333	35.111333	10533	TRUE	17482.014667	10485	34.952000	1784	9.106236	0.000000	87.623909	14.470003	87.623909	14.470003
PAG50866_pass_a8c5e533_31.fast5	11739cd3-6df1-48d2-af3b-4ba0249ce8ef	a8c5e5330539ae4e22df7db35b72c833dbb28282	119	2625	4	17733.792667	71.777333	21533	TRUE	17737.036000	20560	68.534000	3028	10.361650	0.000000	80.656868	13.398151	80.656868	13.398151
PAG50866_pass_a8c5e533_31.fast5	90898a45-9d38-4394-b4cc-e4e4873b546d	a8c5e5330539ae4e22df7db35b72c833dbb28282	119	293	1	17766.737000	56.047000	16814	TRUE	17769.317667	16039	53.466333	1786	9.651551	0.000000	81.460762	15.005929	81.460762	15.005929

but I assume that is normal when guppy times out and gets resumed. Would you recommend deleting the last row and retrying the indexing?

franrodalg May 04 '22 18:05

If the file is truncated, it means guppy terminated abnormally and so didn't close the file properly. If you only passed sequencing_summary.txt to nanopolish (and not sequencing_summary.txt.prev), then the truncation shouldn't cause the segfault (nanopolish would not try to read the .prev file). Can I see the head of sequencing_summary.txt and the summary file from the MinION run?

jts May 04 '22 19:05

For the PromethION run:

$ head -n 5 [...]/sequencing_summary.txt
filename	read_id	run_id	batch_id	channel	mux	start_time	duration	num_events	passes_filtering	template_start	num_events_template	template_duration	sequence_length_template	mean_qscore_template	strand_score_template	median_template	mad_template	scaling_median_template	scaling_mad_template
PAG50866_fail_a8c5e533_4.fast5	d832e3ea-d137-49a6-9d0a-b4f130341039	a8c5e5330539ae4e22df7db35b72c833dbb28282	0	508	3	16116.210667	5.284333	1585	FALSE	16118.039667	1036	3.455333	131	5.806077	0.000000	88.963722	9.110743	88.963722	9.110743
PAG50866_fail_a8c5e533_4.fast5	571e8c02-5a4e-4370-b735-a2b53c1eab7a	a8c5e5330539ae4e22df7db35b72c833dbb28282	0	1333	2	15261.546000	19.430667	5829	FALSE	15264.969000	4802	16.007667	603	5.504918	0.000000	81.996681	10.986484	81.996681	10.986484
PAG50866_fail_a8c5e533_4.fast5	f784e9eb-d5a9-4282-a81d-3d1cc3667533	a8c5e5330539ae4e22df7db35b72c833dbb28282	0	649	1	15757.121000	10.761000	3228	FALSE	15757.424000	3137	10.458000	458	5.128955	0.000000	77.173347	14.470003	77.173347	14.470003
PAG50866_fail_a8c5e533_4.fast5	c6b78690-436e-438a-8d72-7ce37c4274db	a8c5e5330539ae4e22df7db35b72c833dbb28282	0	697	1	15169.492000	21.781667	6534	FALSE	15169.802667	6441	21.471000	1042	5.794460	0.000000	71.010201	11.254447	71.010201	11.254447

and for the MinION run:

$ head -n 5 [...]/sequencing_summary.txt 
filename	read_id	run_id	batch_id	channel	mux	start_time	duration	num_events	passes_filtering	template_start	num_events_template	template_duration	sequence_length_template	mean_qscore_template	strand_score_template	median_template	mad_template	scaling_median_template	scaling_mad_template
FAO33153_fail_1ddb64de_28.fast5	e15c1fb1-5586-472e-ba58-8d08c3e1093c	1ddb64de2c69221c6b30bbf05c892f79f2199f1d	0	468	1	11924.139774	64.340637	19379	TRUE	11925.856906	18862	62.623506	3530	7.233658	0.000000	85.713989	12.361288	85.713989	12.361288
FAO33153_fail_1ddb64de_28.fast5	9a5f2bba-0da6-49c4-bfc0-6001427ad246	1ddb64de2c69221c6b30bbf05c892f79f2199f1d	0	452	2	12186.254980	2.010956	605	FALSE	12186.254980	605	2.010956	112	6.817979	0.000000	118.179344	15.078054	118.179344	15.078054
FAO33153_fail_1ddb64de_28.fast5	dfbdca83-b41d-4e8b-97cf-2b008877a465	1ddb64de2c69221c6b30bbf05c892f79f2199f1d	0	327	3	12146.663347	8.406707	2532	FALSE	12150.514940	1372	4.555113	99	5.765373	0.000000	68.055000	9.508683	68.055000	9.508683
FAO33153_fail_1ddb64de_28.fast5	ed6738ea-1371-48c8-b7e6-30aaacc40e17	1ddb64de2c69221c6b30bbf05c892f79f2199f1d	0	204	3	12069.395086	3.739044	1126	FALSE	12069.395086	1126	3.739044	125	5.432938	0.000000	82.725540	11.410419	82.725540	11.410419

If nanopolish would not try to read the *.prev file, would that mean that many reads would be ignored? Is it possible to include multiple files within -s?

franrodalg May 04 '22 19:05

Unfortunately you can't include multiple summary files, but you can merge them into one file, then provide that. I think you can simply cat the files (after removing the truncated lines from .prev) - nanopolish will ignore the redundant headers.
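The merge could look roughly like the sketch below. It assumes tab-separated files that share the same header, and it treats any line with a different field count from the first header as truncated. The function name `merge_summaries` and the file names in the usage note are placeholders. Redundant headers are left in place, on the expectation stated above that nanopolish ignores them.

```shell
# Concatenate summary parts into one file, dropping lines that are incomplete.
# Hypothetical helper; usage: merge_summaries OUTPUT PART1 [PART2 ...]
merge_summaries() {
    merged_out=$1
    shift
    awk -F'\t' '
        NR == 1 { n = NF }   # expected field count, taken from the first header
        NF == n              # print only lines with the full number of fields
    ' "$@" > "$merged_out"
}
```

For example, `merge_summaries merged_summary.txt sequencing_summary.txt.prev sequencing_summary.txt`, and then pass `merged_summary.txt` to `-s`. A line that was cut off in the middle of its last field would still pass this check, so it is worth eyeballing the tail of the `.prev` file as well.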

> If nanopolish would not try to read the *.prev file, would that mean that many reads would be ignored?

No, it will revert to indexing the fast5s not present in the summary the slow way (opening up each fast5 to see what reads are contained within). The summary file is only used as a hint to accelerate the indexing.
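To get a feel for how many fast5 files would go down that slow path, something like the following sketch lists the files in a directory that the summary never mentions. It assumes the `filename` column is the first column (as in the headers shown in this thread) and compares by base name only; how nanopolish matches files internally is not confirmed here. The function name `fast5_missing_from_summary` is hypothetical.

```shell
# List fast5 files under a directory whose names do not appear in the summary.
# Hypothetical helper; usage: fast5_missing_from_summary SUMMARY FAST5_DIR
fast5_missing_from_summary() {
    listed=$(mktemp)
    # First column of every row after the header, de-duplicated
    tail -n +2 "$1" | cut -f1 | sort -u > "$listed"
    # Base names of the fast5 files on disk, minus the ones the summary lists
    find "$2" -name '*.fast5' -exec basename {} \; | sort -u |
        grep -F -x -v -f "$listed"
    rm -f "$listed"
}
```

An empty result means every fast5 file in the directory is covered by the summary.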

jts May 04 '22 19:05

Thanks, Jared.

So if the file that was actually used appears to be correct, do you have any idea why it would have crashed? Is there any other test you would like me to try?

franrodalg May 05 '22 09:05