Client 1 has timed out.

Open jowodo opened this issue 2 years ago • 5 comments

I think my issue is related to #200. I'm on an HPC system (Oracle 7.9, kernel 3.10.0-1160.62.1.el7.x86_64).

$ tail megalodon_results/guppy_log/guppy_basecall_server_log-2022-04-19_11-28-56.log
model version id           None
adapter scaler model file: None
2022-04-19 11:29:12.337884 [guppy/info] CUDA device 0 (compute 7.5) initialised, memory limit 15634661376B (15529869312B free)
2022-04-19 11:29:12.649407 [guppy/info] lamp_arrangements arrangement folder not found: /scratch/jmf/kirkegaard/software/rerio/basecall_models/read_splitting/lamp_arrangements
2022-04-19 11:29:12.778519 [guppy/info] lamp_arrangements arrangement folder not found: /scratch/jmf/kirkegaard/software/rerio/basecall_models/barcoding/lamp_arrangements
2022-04-19 11:29:13.022543 [guppy/message] Starting server on port: ipc:///tmp/slurm-2607042/34c6-779d-fc59-37b0
2022-04-19 11:29:14.118220 [guppy/info] client connection request. ["res_dna_r941_prom_modbases_5mC_CpG_v001:>timeout_interval=15000>client_name=>barcode_kits=>detect_barcodes=0>move_and_trace_enabled=1>post_out=1"]
2022-04-19 11:29:14.129376 [guppy/info] New client connected Client 1 anonymous_client_1 id: 09f50eaf-f312-47cf-8c3f-885d24f76c00 (connection string = 'res_dna_r941_prom_modbases_5mC_CpG_v001:>timeout_interval=15000>client_name=>barcode_kits=>detect_barcodes=0>move_and_trace_enabled=1>post_out=1').
2022-04-19 11:29:35.021053 [guppy/info] Client 1 anonymous_client_1 id: 09f50eaf-f312-47cf-8c3f-885d24f76c00 has timed out.
2022-04-19 11:29:35.021150 [guppy/info] Client 1 anonymous_client_1 id: 09f50eaf-f312-47cf-8c3f-885d24f76c00 has disconnected.

There is a known bug in guppy when running on old kernel versions that looks like what you're reporting in the log there.

Originally posted by @fbrennen in https://github.com/nanoporetech/megalodon/issues/200#issuecomment-949443283

Can somebody point me to where this information comes from, as I can't seem to find it? Or, even better, point me to a solution?

jowodo avatar Apr 19 '22 09:04 jowodo

Could you post the entire megalodon command you submitted? This may be related to the post-out option required for older-style modified base calling (judging from the model noted in these guppy log lines). I would highly recommend switching to the new Remora-style modified base calling, which is less compute heavy, maintains the highest-accuracy canonical calls and improves methylation accuracy. See the --remora-modified-bases argument in megalodon -h.
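For illustration, a Remora-style invocation might look roughly like the sketch below. The guppy config name and the values passed to --remora-modified-bases are copied from the example in the megalodon README and are assumptions here, not values confirmed for this data set; the FAST5 and REF paths are placeholders, and the guppy server path is the one reported in this thread. Check megalodon -h for the exact argument format before running anything.

```shell
# Sketch only: the Remora model spec and guppy config below follow the
# example in the megalodon README and are assumptions for this data set.
FAST5="${FAST5:-/path/to/fast5_pass/barcode42/}"   # placeholder input directory
REF="${REF:-/path/to/reference.fa}"                # placeholder reference FASTA
THREADS="${THREADS:-4}"                            # placeholder process count

# Prints the command instead of running it, so it can be reviewed first.
megalodon_remora_cmd() {
    echo megalodon "$FAST5" \
        --guppy-config dna_r9.4.1_450bps_fast.cfg \
        --guppy-server-path /apps/guppy-gpu/6.1.1/bin/guppy_basecall_server \
        --remora-modified-bases dna_r9.4.1_e8 fast 0.0.0 5hmc_5mc 0 \
        --outputs basecalls mappings mod_mappings mods \
        --reference "$REF" \
        --devices all \
        --processes "$THREADS"
}

megalodon_remora_cmd
```

Removing the echo turns the dry run into a real call. The Rerio-specific arguments from the old-style command (the model directory passed through --guppy-params and the --mod-motif setting) are left out of this sketch on the assumption that the Remora model replaces them.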

marcus1487 avatar Apr 22 '22 03:04 marcus1487

Thanks for your help

FAST5="/scratch/jmf/mirror/ONT_0052/ONT_0052/20211124_1352_3B_PAH12587_3cc549b0/fast5_pass/barcode42/"
REF="/scratch/jmf/kirkegaard/projects/dome_051/results/JMF-2109-10-0023--ONT_0052.barcode42.flye.raconNP2x_medaka2x_raconILM.fa"
megalodon \
    $FAST5 \
    --guppy-params "-d /scratch/jmf/kirkegaard/software/rerio/basecall_models/ --verbose_logs" \
    --guppy-config res_dna_r941_prom_modbases_5mC_CpG_v001.cfg \
    --guppy-server-path /apps/guppy-gpu/6.1.1/bin/guppy_basecall_server \
    --outputs basecalls mappings mod_mappings mods \
    --reference $REF \
    --mod-motif m CG 0 \
    --devices all \
    --processes $THREADS

BTW: it was installed with pip install megalodon==2.5.0 ont-pyguppy-client-lib==6.1.1

jowodo avatar Apr 28 '22 11:04 jowodo

Hi @marcus1487

I tried running megalodon with the Remora model and it ran without this issue. So could this indicate a Rerio-model-related issue? Any chance that you could release a Remora model for R9.4.1 that works with data prior to LSK112? I can only seem to find e8 and e8.1, which I assume are both for LSK112* (https://github.com/nanoporetech/remora/tree/master/src/remora/trained_models).

*Is there an official "cheat sheet" with all the naming conventions used across the chemistry versions and software packages?

Best regards Rasmus

Kirk3gaard avatar May 02 '22 11:05 Kirk3gaard

I think this is related to the Rerio/Remora model difference, but the main cause is that the old-style flip-flop modified base models require the basecalling posterior matrix to be returned from the guppy server (and transferred off the GPU). Moving this large memory block can result in these timeouts. The newer Remora-style modified base models only require the much smaller "move table" (the link between signal and basecalls) to be transferred. This is one of the massive simplifications made possible by the Remora-style models.
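To give a feel for the size difference, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption (read length, model stride, posterior values per block, byte widths); none is measured or taken from guppy, and modified-base models may return more values per block than assumed here.

```shell
# Illustrative numbers only; none of these are measured or taken from guppy.
samples=50000        # hypothetical raw signal samples in one read
stride=5             # assumed model stride (signal samples per output block)
states=40            # assumed posterior values per block (flip-flop transitions)
bytes_per_value=4    # assumed 32-bit floats

blocks=$((samples / stride))
posterior_bytes=$((blocks * states * bytes_per_value))
move_bytes=$((blocks * 1))   # assumed one byte per block for the move table
ratio=$((posterior_bytes / move_bytes))

echo "posterior: ${posterior_bytes} B, move table: ${move_bytes} B, ratio: ${ratio}x"
```

Under these assumptions a single read carries about 1.6 MB of posterior data versus about 10 kB of move table, a factor of 160 per read that adds up across every read sent back to the client.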

I'm not aware of any official documentation on the mapping from kits/flowcells to basecalling model names. There is currently no dedicated plan to train Remora models on legacy flowcell conditions. Training Remora models for these older chemistries should not be too complicated, though. If you'd like guidance on training such a model, I would be happy to elaborate on the instructions provided in the Remora README.

marcus1487 avatar May 02 '22 18:05 marcus1487

Apologies for my miscommunication, but the e8 models are applicable to LSK109/LSK110/LSK111 kits. So the available Remora models should support your needs.

marcus1487 avatar May 04 '22 14:05 marcus1487