
CHEUI_preprocess_m5C.py gives an error

Open aman21392 opened this issue 9 months ago • 15 comments

I successfully ran CHEUI_preprocess_m6A.py, but when I ran the CHEUI_preprocess_m5C.py script, the error below occurred. Can you please resolve this issue? Thanks in advance.

Here is the command I used:

nohup python3 /home/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py -i /Drive7/CHEUI/nanopolish_out.txt -m /home/apps/CHEUI/kmer_models/model_kmer.csv -o out_C_signals -n 35 &

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 482, in parse_nanopolish
    counter)
  File "/home/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 160, in _parse_kmers
    samples = [float(i) for i in checked_line[samples_idx].split(',')]
  File "/home/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 160, in <listcomp>
    samples = [float(i) for i in checked_line[samples_idx].split(',')]
ValueError: could not convert string to float: '78.[post-run summary] total reads: 384066'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 560, in <module>
    p.map(parse_nanopolish, pathlist)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()

aman21392 avatar Apr 29 '24 05:04 aman21392

Hi,

Sorry about the issue. Can you please share a few lines of the nanopolish input file you used? Also, we recommend using the C++ version for faster preprocessing.

Thanks, Akanksha

Akanksha2511 avatar Apr 29 '24 23:04 Akanksha2511

Here is the input file for your reference:

contig position reference_kmer read_name strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx samples
gene1 452 TTTAA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 2 81.02 1.173 0.00398 TTTAA 84.44 2.46 -1.18 20956 20968 81.951,82.6334,82.0875,79.085,79.9039,80.3133,81.2686,82.0875,80.1768,79.2215,81.4051,82.0875
gene1 452 TTTAA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 3 83.74 1.112 0.00498 TTTAA 84.44 2.46 -0.24 20941 20956 83.8617,81.8145,84.1346,81.8145,82.9063,85.0899,83.8617,82.3604,83.4522,83.1793,84.1346,85.0899,84.544,85.4994,84.4076
gene1 452 TTTAA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 4 85.62 1.945 0.02590 TTTAA 84.44 2.46 0.41 20863 20941 87.2736,84.817,86.5912,87.1371,86.1817,86.1817,84.1346,85.6358,85.6358,85.6358,85.7723,85.6358,85.9088,85.6358,87.0006,88.5018,86.0453,85.2264,86.4547,86.5912,84.9535,84.9535,85.4994,84.4076,86.3182,94.9162,82.7699,85.9088,83.3158,83.5887,87.0006,88.2289,86.1817,85.9088,76.0826,84.6805,85.7723,85.2264,86.3182,84.1346,86.7276,84.9535,83.9981,85.2264,83.8617,85.4994,85.4994,83.7252,83.9981,83.5887,85.4994,86.5912,84.4076,84.1346,86.8641,84.9535,83.8617,83.9981,87.9559,85.9088,85.9088,86.5912,84.6805,86.8641,86.5912,85.7723,85.4994,86.8641,85.7723,85.9088,85.0899,87.5465,86.4547,88.6383,86.0453,83.1793,86.4547,85.2264
gene1 452 TTTAA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 5 84.54 0.925 0.00232 TTTAA 84.44 2.46 0.04 20856 20863 85.3629,83.9981,82.9063,85.6358,83.8617,84.6805,85.3629
gene1 453 TTAAA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 6 94.67 3.645 0.01029 TTAAA 92.05 3.04 0.73 20825 20856 95.735,95.735,93.4149,93.5514,94.0973,95.735,96.5539,97.3727,95.735,96.8268,98.1916,95.8715,95.3256,95.3256,98.1916,76.492,94.5067,93.9609,91.9137,95.4621,94.3703,95.4621,96.6904,95.1891,92.869,95.1891,97.7822,92.5961,94.7797,95.0527,94.7797
gene1 453 TTAAA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 7 96.60 1.775 0.00365 TTAAA 92.05 3.04 1.27 20814 20825 95.0527,95.0527,94.7797,95.735,98.874,98.3281,95.4621,94.0973,98.601,98.601,98.0551
gene1 454 TAAAT 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 8 97.39 1.404 0.00432 TAAAT 108.51 2.68 -3.53 20801 20814 97.5092,96.1445,97.7822,95.4621,95.1891,97.2363,99.4199,99.0104,97.0998,98.0551,96.9633,96.2809,99.9658
gene1 455 AAATG 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 9 110.97 3.286 0.02722 AAATG 110.67 3.11 0.08 20719 20801 104.06,97.2363,114.705,108.427,111.839,113.886,112.112,115.66,110.747,113.75,113.477,112.931,115.933,113.75,111.293,111.976,111.976,112.522,110.201,101.058,113.067,105.971,112.931,111.43,116.206,102.695,116.889,110.611,108.837,108.837,115.66,113.75,111.703,108.837,113.204,114.705,110.065,111.703,109.519,109.519,103.378,113.34,112.385,107.881,114.159,111.839,113.477,109.11,111.02,110.747,112.794,114.296,111.43,108.973,110.065,111.293,109.792,110.884,112.249,111.43,113.75,112.794,112.249,108.291,114.023,110.474,110.884,113.204,106.653,114.432,110.884,112.658,109.792,109.792,109.656,109.519,109.656,109.792,110.338,109.519,108.291,108.837
gene1 456 AATGC 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 10 80.66 7.228 0.00764 AATGC 81.80 3.87 -0.25 20696 20719 72.8071,78.9485,75.6731,80.7227,72.5342,69.1223,78.8121,86.7276,69.6682,79.7674,75.9461,94.2338,77.0379,81.1321,79.6309,75.8096,88.7748,82.3604,84.2711,99.8293,85.4994,79.4944,86.4547
gene1 456 AATGC 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 11 82.82 1.951 0.00432 AATGC 81.80 3.87 0.22 20683 20696 82.7699,85.9088,81.5416,85.0899,82.9063,85.4994,83.7252,82.6334,81.2686,79.2215,83.8617,79.9039,82.3604
gene1 456 AATGC 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 12 79.87 3.572 0.00299 AATGC 81.80 3.87 -0.42 20674 20683 78.5391,86.0453,73.7625,79.9039,77.3108,82.7699,76.3555,81.5416,82.6334
gene1 457 ATGCA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 13 84.56 2.839 0.00598 ATGCA 84.45 3.13 0.03 20656 20674 85.9088,83.3158,80.9957,89.3207,89.5936,84.6805,87.5465,83.4522,77.3108,85.0899,84.1346,86.3182,81.951,86.1817,83.7252,82.4969,84.817,85.2264
gene1 457 ATGCA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 14 82.56 3.086 0.01029 ATGCA 84.45 3.13 -0.51 20625 20656 80.4498,84.2711,84.6805,85.6358,79.7674,79.4944,76.492,77.4473,83.3158,80.7227,85.4994,86.1817,83.0428,84.6805,81.5416,82.3604,75.9461,86.1817,82.0875,82.2239,84.2711,79.7674,86.5912,90.276,84.817,81.6781,82.7699,84.4076,81.6781,80.1768,80.8592
gene1 457 ATGCA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 15 87.31 2.366 0.00332 ATGCA 84.45 3.13 0.78 20615 20625 88.3653,89.1842,91.7772,88.2289,83.8617,85.2264,87.0006,83.7252,87.2736,88.5018
gene1 457 ATGCA 3a40d16c-e4fe-4e28-bc3c-7d31eabfdeb7 t 16 79.71 3.939 0.00232 ATGCA 84.45 3.13 -1.29 20608 20615 72.124
--More--(0%)

Thanks in advance for resolving the issue.

aman21392 avatar Apr 30 '24 00:04 aman21392

Thanks, the format of the nanopolish file looks OK. Can you please try two things:

  1. Run the preprocessing code on the test file (https://github.com/comprna/CHEUI/blob/master/test/nanopolish_output_test.txt) to check that the installation is correct.
  2. Try the C++ version of the preprocessing.

Otherwise, we might have to find a way to share the nanopolish file to debug.

Thanks, Akanksha

Akanksha2511 avatar Apr 30 '24 05:04 Akanksha2511

1. Run the preprocessing code on the test file: the test run completed successfully with no errors. I used the command:

nohup ./CHEUI -i /home/aclab/apps/CHEUI/test/nanopolish_output_test.txt -o /home/aclab/apps/CHEUI/scripts/preprocessing_CPP/test/out_C_signals+IDs.p/ -m /home/aclab/apps/CHEUI/kmer_models/model_kmer.csv -n 50 --m5C &

2. Try the C++ version of the preprocessing on my data:

nohup ./CHEUI -i /Drive7/nanopolish_out.txt -o /Drive7/out_C_signals+IDs.p/ -m /home/aclab/apps/CHEUI/kmer_models/model_kmer.csv -n 50 --m5C &

But it also gives the following error:

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stof

Please sort out this issue if possible; I want to run this pipeline. Thanks in advance.

aman21392 avatar Apr 30 '24 15:04 aman21392

Thank you so much for running the two tests. Do you get the error message after some temp files are created, or is it straight away? We can look into it at our end if you are happy to share the nanopolish file, although I know it can be huge. Email: [email protected]

Thanks, Akanksha

Akanksha2511 avatar May 01 '24 06:05 Akanksha2511

"Do you get the error message after some temp files are created or is it straight away?"

The temp files (51 in number) were created, and the main output file (nanopolish_out_signals+IDS.p) also contains some data. Here is the output that was printed before the error occurred:

2500000 processed lines 3500000 processed lines 4000000 processed lines 5500000 processed lines 500000 processed lines 500000 processed lines 1000000 processed lines 5000000 processed lines 2000000 processed lines 3500000 processed lines 3500000 processed lines 5000000 processed lines 6000000 processed lines 6500000 processed lines 2000000 processed lines 2500000 processed lines 3500000 processed lines 1000000 processed lines 1500000 processed lines 2000000 processed lines 5500000 processed lines 6000000 processed lines 500000 processed lines 5000000 processed lines 5500000 processed lines 6000000 processed lines 7000000 processed lines 2000000 processed lines 2500000 processed lines 5500000 processed lines 6500000 processed lines 2000000 processed lines 2500000 processed lines 3500000 processed lines 5500000 processed lines 7000000 processed lines 1000000 processed lines 2000000 processed lines 3500000 processed lines 4500000 processed lines 6000000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 4000000 processed lines 1500000 processed lines 2000000 processed lines 4000000 processed lines 4500000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 3000000 processed lines 3500000 processed lines 5500000 processed lines 6000000 processed lines 1000000 processed lines 1500000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 6500000 processed lines 500000 processed lines 1000000 processed lines 3000000 processed lines 3500000 processed lines 7000000 processed lines 1500000 processed lines 2000000 processed lines 6000000 processed lines 4500000 processed lines 5000000 processed lines 5500000 
processed lines 6000000 processed lines 7000000 processed lines 500000 processed lines 1500000 processed lines 2000000 processed lines 4000000 processed lines 7000000 processed lines 1500000 processed lines 3000000 processed lines 6000000 processed lines 1000000 processed lines 2000000 processed lines 4500000 processed lines 5500000 processed lines 6000000 processed lines 500000 processed lines 2500000 processed lines 5000000 processed lines 500000 processed lines 1000000 processed lines 2500000 processed lines 3500000 processed lines 5000000 processed lines 5000000 processed lines 6000000 processed lines 4000000 processed lines 5000000 processed lines 7000000 processed lines 1500000 processed lines 2500000 processed lines 3500000 processed lines 4000000 processed lines 6000000 processed lines 6500000 processed lines 7000000 processed lines 1500000 processed lines 2000000 processed lines 4500000 processed lines 6500000 processed lines 1500000 processed lines 2000000 processed lines 2500000 processed lines 3500000 processed lines 4000000 processed lines 5000000 processed lines 5500000 processed lines 6500000 processed lines 2500000 processed lines 3500000 processed lines 4000000 processed lines 6500000 processed lines 1000000 processed lines 1500000 processed lines 2500000 processed lines 4000000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 1500000 processed lines 2500000 processed lines 7000000 processed lines 500000 processed lines 2000000 processed lines 2500000 processed lines 5000000 processed lines 5500000 processed lines 5500000 processed lines 6500000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 4500000 processed lines 5000000 processed lines 6000000 processed lines 1500000 processed lines 4000000 processed lines 6500000 processed lines 2500000 processed lines 6000000 processed lines 7000000 processed lines 500000 processed lines 2000000 processed lines 3000000 processed 
lines 4000000 processed lines 7000000 processed lines 1500000 processed lines 3000000 processed lines 3500000 processed lines 5000000 processed lines 1000000 processed lines 1500000 processed lines 2000000 processed lines 3500000 processed lines 4500000 processed lines 5000000 processed lines 6000000 processed lines 1500000 processed lines 1500000 processed lines 2500000 processed lines 3000000 processed lines 4000000 processed lines 4500000 processed lines 5000000 processed lines 5500000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 6000000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 4000000 processed lines 4500000 processed lines 5000000 processed lines 5500000 processed lines 6000000 processed lines 7000000 processed lines 3000000 processed lines 3500000 processed lines 5000000 processed lines 5500000 processed lines 6500000 processed lines 500000 processed lines 1500000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 4500000 processed lines 5500000 processed lines 6000000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 1500000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 4000000 processed lines 5000000 processed lines 6000000 processed lines 500000 processed lines 1500000 processed lines 3500000 processed lines 500000 processed lines 1000000 processed lines 1500000 processed lines 1500000 processed lines 3500000 processed lines 4000000 processed lines 5500000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 1500000 processed lines 2000000 processed lines 3000000 processed lines 5000000 processed lines 5500000 processed lines 6000000 processed lines 500000 processed lines 2000000 
processed lines 3000000 processed lines 3500000 processed lines 4000000 processed lines 6000000 processed lines 6500000 processed lines 1000000 processed lines 1500000 processed lines 2500000 processed lines 3000000 processed lines 5500000 processed lines 500000 processed lines 1500000 processed lines 4500000 processed lines 6000000 processed lines 6500000 processed lines 7000000 processed lines 1000000 processed lines 1500000 processed lines 4500000 processed lines 6000000 processed lines 500000 processed lines 1000000 processed lines 1500000 processed lines 3000000 processed lines 3500000 processed lines 5500000 processed lines 6500000 processed lines 1500000 processed lines 7000000 processed lines 500000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 6500000 processed lines 500000 processed lines 1000000 processed lines 4000000 processed lines 6000000 processed lines 2500000 processed lines 3500000 processed lines 4000000 processed lines 5000000 processed lines 5500000 processed lines 6500000 processed lines 1000000 processed lines 1500000 processed lines 2000000 processed lines 4000000 processed lines 5500000 processed lines 5500000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 1500000 processed lines 3000000 processed lines 3500000 processed lines 4000000 processed lines 5000000 processed lines 5500000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 4000000 processed lines 4500000 processed lines 5000000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 4000000 processed lines 4500000 processed lines 6000000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 5500000 processed lines 6500000 processed lines 3500000 processed lines 4000000 processed 
lines 5500000 processed lines 6000000 processed lines 6500000 processed lines 1000000 processed lines 2000000 processed lines 2500000 processed lines 6000000 processed lines 6500000 processed lines 2500000 processed lines 3500000 processed lines 5000000 processed lines 5500000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 5000000 processed lines 5500000 processed lines 6000000 processed lines 6000000 processed lines 6500000 processed lines 7000000 processed lines 1500000 processed lines 2000000 processed lines 3000000 processed lines 3500000 processed lines 5500000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 4500000 processed lines 5000000 processed lines 6000000 processed lines 7000000 processed lines 1000000 processed lines 2000000 processed lines 3500000 processed lines 4000000 processed lines 500000 processed lines 1000000 processed lines 1500000 processed lines 2000000 processed lines 2500000 processed lines 6000000 processed lines 6500000 processed lines 2000000 processed lines 2500000 processed lines 3000000 processed lines 3500000 processed lines 4000000 processed lines 4500000 processed lines 5000000 processed lines 6000000 processed lines 500000 processed lines 3000000 processed lines 4500000 processed lines 5500000 processed lines 6500000 processed lines 7000000 processed lines 500000 processed lines 1000000 processed lines 4000000 processed lines 4500000 processed lines 7000000 processed lines

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 482, in parse_nanopolish
    counter)
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 160, in _parse_kmers
    samples = [float(i) for i in checked_line[samples_idx].split(',')]
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 160, in <listcomp>
    samples = [float(i) for i in checked_line[samples_idx].split(',')]
ValueError: could not convert string to float: '78.[post-run summary] total reads: 384066'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 560, in <module>
    p.map(parse_nanopolish, pathlist)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: could not convert string to float: '78.[post-run summary] total reads: 384066'

aman21392 avatar May 02 '24 06:05 aman21392

Thanks. Is it possible to run the script without multiprocessing (n=1)?

Also, can you please share the number of lines in your nanopolish file?

Thanks, Akanksha

Akanksha2511 avatar May 02 '24 07:05 Akanksha2511

The number of lines in my nanopolish file is 254747835.

aman21392 avatar May 02 '24 09:05 aman21392

As you suggested, I ran the script with n=1, but it also did not complete:

500000 processed lines 4000000 processed lines 4500000 processed lines 5000000 processed lines 5500000 processed lines 6000000 processed lines 7000000 processed lines 7500000 processed lines 8000000 processed lines 9500000 processed lines 13000000 processed lines 14000000 processed lines 15000000 processed lines 15500000 processed lines 16000000 processed lines 16500000 processed lines 17000000 processed lines 18000000 processed lines 19500000 processed lines 21500000 processed lines 22000000 processed lines 23000000 processed lines 24000000 processed lines 25500000 processed lines 27000000 processed lines 28500000 processed lines 29000000 processed lines 29500000 processed lines 30000000 processed lines 31000000 processed lines 31500000 processed lines 32000000 processed lines 34000000 processed lines 34500000 processed lines 35000000 processed lines 37500000 processed lines 38000000 processed lines 39000000 processed lines 40000000 processed lines 41500000 processed lines 42500000 processed lines 43500000 processed lines 45000000 processed lines 47000000 processed lines 48000000 processed lines 48500000 processed lines 49000000 processed lines 49500000 processed lines 51000000 processed lines 54000000 processed lines 54500000 processed lines 55000000 processed lines 56000000 processed lines 56500000 processed lines 57500000 processed lines 60500000 processed lines 64000000 processed lines 64500000 processed lines 65500000 processed lines 66500000 processed lines 67000000 processed lines 68000000 processed lines 68500000 processed lines 70000000 processed lines 70500000 processed lines 71000000 processed lines 74000000 processed lines 76000000 processed lines 76500000 processed lines 77000000 processed lines 77500000 processed lines 78500000 processed lines 80000000 processed lines 80500000 processed lines 81500000 processed lines 83500000 processed lines 84500000 processed lines 85000000 
processed lines 85500000 processed lines 86000000 processed lines 86500000 processed lines 87500000 processed lines 89500000 processed lines 90000000 processed lines 90500000 processed lines 94000000 processed lines 96000000 processed lines 97500000 processed lines 98000000 processed lines 98500000 processed lines 100000000 processed lines 102000000 processed lines 103000000 processed lines 104000000 processed lines 106500000 processed lines 107500000 processed lines 109000000 processed lines 110000000 processed lines 110500000 processed lines 112000000 processed lines 112500000 processed lines 113500000 processed lines 114500000 processed lines 115500000 processed lines 116500000 processed lines 117000000 processed lines 118500000 processed lines 119000000 processed lines 121000000 processed lines 122500000 processed lines 123000000 processed lines 124500000 processed lines 125500000 processed lines 126500000 processed lines 127000000 processed lines 127500000 processed lines 128500000 processed lines 130000000 processed lines 130500000 processed lines 133000000 processed lines 134000000 processed lines 136000000 processed lines 138500000 processed lines 139000000 processed lines 139500000 processed lines 140500000 processed lines 141500000 processed lines 142000000 processed lines 143500000 processed lines 144500000 processed lines 145000000 processed lines 145500000 processed lines 148500000 processed lines 149000000 processed lines 150000000 processed lines 151000000 processed lines 151500000 processed lines 154000000 processed lines 154500000 processed lines 155000000 processed lines 155500000 processed lines 157000000 processed lines 157500000 processed lines 160000000 processed lines 160500000 processed lines 162000000 processed lines 167000000 processed lines 167500000 processed lines 169000000 processed lines 172000000 processed lines 172500000 processed lines 173000000 processed lines 174500000 processed lines 175000000 processed lines 175500000 processed 
lines 178000000 processed lines 179000000 processed lines 180000000 processed lines 182500000 processed lines 184000000 processed lines 185000000 processed lines 188000000 processed lines 190500000 processed lines 190500000 processed lines 191500000 processed lines 192000000 processed lines 192500000 processed lines 195000000 processed lines 196000000 processed lines 197000000 processed lines 197500000 processed lines 198000000 processed lines 199500000 processed lines 200500000 processed lines 201500000 processed lines 202500000 processed lines 202500000 processed lines 205500000 processed lines 208000000 processed lines 211500000 processed lines 214000000 processed lines 214500000 processed lines 216000000 processed lines 216500000 processed lines 219000000 processed lines 220000000 processed lines 222500000 processed lines 223500000 processed lines 224500000 processed lines 225000000 processed lines 226000000 processed lines 226500000 processed lines 227500000 processed lines 229000000 processed lines 231500000 processed lines 233000000 processed lines 233500000 processed lines 234000000 processed lines 235000000 processed lines 235500000 processed lines 239000000 processed lines 239500000 processed lines 240500000 processed lines 241000000 processed lines 242000000 processed lines 244000000 processed lines 245000000 processed lines 246000000 processed lines 246500000 processed lines 247000000 processed lines 248000000 processed lines 248000000 processed lines 248500000 processed lines 249500000 processed lines 250000000 processed lines 251000000 processed lines 252000000 processed lines 253000000 processed lines 253500000 processed lines

Traceback (most recent call last):
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 555, in <module>
    parse_nanopolish(nanopolish_path)
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 482, in parse_nanopolish
    counter)
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 160, in _parse_kmers
    samples = [float(i) for i in checked_line[samples_idx].split(',')]
  File "/home/aclab/apps/CHEUI/scripts/CHEUI_preprocess_m5C.py", line 160, in <listcomp>
    samples = [float(i) for i in checked_line[samples_idx].split(',')]
ValueError: could not convert string to float: '78.[post-run summary] total reads: 384066'

aman21392 avatar May 03 '24 06:05 aman21392

I ran the command with nohup, but I don't think that is the issue. Is my nanopolish file too large for the command to process?

aman21392 avatar May 03 '24 06:05 aman21392

Thanks. Yes, running in the background should be fine. I think it gets stuck somewhere between line 253500000 and the end of your file, which is line 254747835. Can you please copy the lines from 253000000 to the end of your file into another file and share it with me? It seems that either the nanopolish file is corrupt or the preprocessing script is missing a check for a special case that might be present in the nanopolish file.
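One way to copy the tail of a very large file without loading it into memory is sketched below. This is only an illustration: the helper name `copy_tail` and the output file name in the usage note are hypothetical and are not part of CHEUI or nanopolish.

```python
from itertools import islice


def copy_tail(src_path, dst_path, first_line):
    """Copy lines first_line..EOF (1-based, inclusive) from src_path to dst_path.

    The file is streamed line by line in binary mode, so the copy is
    byte-exact and memory use stays small even for a huge input.
    Returns the number of lines written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice skips the first (first_line - 1) lines, then yields the rest.
        for line in islice(src, first_line - 1, None):
            dst.write(line)
            written += 1
    return written
```

For the file in this thread, the call would look like `copy_tail("/Drive7/CHEUI/nanopolish_out.txt", "nanopolish_tail.txt", 253000000)`, where `nanopolish_tail.txt` is an arbitrary output name.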

Akanksha2511 avatar May 03 '24 07:05 Akanksha2511

Also, it might be useful if you could share the output file that you got from the above run.

Thanks, Akanksha

Akanksha2511 avatar May 03 '24 09:05 Akanksha2511

Could you please take a moment to review the nanopolish output file I shared on May 4th? I'd appreciate it if you could identify any issues or problems. Thank you in advance!

aman21392 avatar May 08 '24 06:05 aman21392

Hi, sorry about the delay in replying. If you look at the last two lines of your file, they look malformed:

1747834	gene9	616.0	GCTGA	c9ee63a7-28b7-40a9-a743-c0eccfba7f73	t	127.0	81.03	2.141	0.00996	GCTGA	89.96	2.85	-2.77	27749.0	27779.0	81.3491,85.069,81.2113,80.2469,79.1447,78.0425,78.4558,82.4513,80.1091,79.4202,80.3847,80.2469,82.0379,82.4513,75.9759,82.0379,79.9713,83.829,82.0379,87.5489,78.[post-run summary] total reads: 384066, unparseable: 0, qc fail: 6183, could not calibrate: 415, no alignment: 266, bad fast5: 0
1747835	0425,79.558,82.1757,80.9358,80.798,80.3847,81.9002,82.1757,81.0735,81.7624	
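That's where the preprocessing code gets stuck: the nanopolish "[post-run summary]" message appears to have been written into the middle of the samples column, splitting the value 78.0425 across two lines, so the text that follows "78." cannot be converted to a float. If you remove these two lines, the issue should be resolved.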
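Removing such lines by hand is awkward in a file with hundreds of millions of rows, so a small filter can help. The sketch below is not part of CHEUI; the function names are hypothetical, and it assumes the input is tab-separated with the 16 columns shown in the header earlier in this thread, with `samples` as the last column.

```python
def samples_are_valid(line, n_columns=16):
    """Return True if the line has n_columns tab-separated fields and
    every comma-separated entry of the last (samples) field parses as a float."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_columns:
        return False
    try:
        for value in fields[-1].split(","):
            float(value)
    except ValueError:
        return False
    return True


def drop_malformed_lines(src_path, dst_path, has_header=True):
    """Copy src_path to dst_path, skipping malformed data lines.

    Returns the 1-based line numbers that were skipped so they can be inspected.
    """
    skipped = []
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line_no, line in enumerate(src, start=1):
            # The header row is kept as-is; its last field is the word "samples".
            is_header = has_header and line_no == 1
            if is_header or samples_are_valid(line):
                dst.write(line)
            else:
                skipped.append(line_no)
    return skipped
```

With the two lines quoted above, both would be skipped: the first because `78.[post-run summary] total reads: 384066` is not a number, the second because it does not have 16 fields.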

Also, if you already have the output from the preprocessing code, I think it should be fine to run the next steps.

Thanks, 
Akanksha

Akanksha2511 avatar May 09 '24 04:05 Akanksha2511

When I run the CHEUI model 1 code with nohup and then open the nohup output file, it shows the message below. I want to know whether my file was processed completely. This message occurs for both the m6A and m5C commands:

Ran out of input
All signals have been processed 44982579
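For context, "Ran out of input" is the text that Python's `pickle` module attaches to the `EOFError` it raises when `pickle.load` is called on a stream with nothing left to read. The sketch below shows how reading a file of back-to-back pickled records normally ends with that message. Whether the CHEUI prediction script reads its `.p` input in exactly this way is an assumption, and `load_all` is a hypothetical helper, not a CHEUI function.

```python
import pickle


def load_all(path):
    """Return every object stored back-to-back in a pickle file."""
    objects = []
    with open(path, "rb") as handle:
        while True:
            try:
                objects.append(pickle.load(handle))
            except EOFError as end_of_stream:
                # With CPython's default pickle implementation the message
                # is "Ran out of input": the stream is simply exhausted.
                print(end_of_stream)
                break
    return objects
```

Read this way, the message marks the normal end of the input rather than a failure, provided the file itself was written completely.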

I took the file produced by the read-level detection code and processed it for site-level detection. Below is the output file for m6A site-level detection. I want to understand the output of the CHEUI model 2 code: if the second column is the m6A site, then the first row says that 1495 is an m6A site, but position 1495 in my sequence does not contain an m6A site. Likewise, position 1541 does not contain an m6A site either.

transcript.fa	1495	TAGCAGGAA	161	0.30434782608695654	0.30759284
transcript.fa	1502	AACTACTAG	167	0.10256410256410256	0.12798941
transcript.fa	1505	TACTAGTAC	158	0.36904761904761907	0.43889725
transcript.fa	1508	TAGTACCCT	149	0.16393442622950818	0.39143988
transcript.fa	1522	AACAAATAG	108	0.1368421052631579	0.2662186
transcript.fa	1523	ACAAATAGG	108	0.13402061855670103	0.49044427
transcript.fa	1525	AAATAGGAT	112	0.1625	0.4378153
transcript.fa	1539	ACACATAAT	161	0.547945205479452	0.5977786
transcript.fa	1541	ACATAATCC	156	0.3389830508474576	0.86891454
transcript.fa	1542	CATAATCCA	153	0.23622047244094488	0.74543554
transcript.fa	1546	ATCCACCTA	157	0.40963855421686746	0.49316177
transcript.fa	1555	TCCCAGTAG	134	0.17708333333333334	0.18778525
transcript.fa	1558	CAGTAGGAG	131	0.2948717948717949	0.54169106

Thanks in advance

aman21392 avatar May 20 '24 05:05 aman21392

Could you please resolve this issue? Thanks in advance.

aman21392 avatar May 27 '24 13:05 aman21392

Hi, if it says "All signals have been processed", that means it's complete. The position column gives the position of the first nucleotide of the 9-mer sequence, so to get the position of the center (fifth) nucleotide you need to add 5. I hope it helps.
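As a rough illustration of that adjustment, the sketch below converts one site-level row into the position and base of the 9-mer's center nucleotide. The function name is hypothetical, and the default offset of 5 is simply the value stated above, which would correspond to a 0-based start position converted to a 1-based center coordinate (an assumption, not confirmed in this thread).

```python
def center_of_site(position, kmer, offset=5):
    """Return (center_position, center_base) for a site-level row.

    position: value from the second column (start of the k-mer).
    kmer:     sequence from the third column; must have odd length.
    offset:   amount added to reach the center nucleotide (5 as stated above).
    """
    if len(kmer) % 2 == 0:
        raise ValueError("k-mer length must be odd to have a center base")
    center_base = kmer[len(kmer) // 2]
    return position + offset, center_base


# Rows taken from the site-level output quoted earlier in the thread.
for start, site in [(1495, "TAGCAGGAA"), (1541, "ACATAATCC")]:
    print(start, site, center_of_site(start, site))
```

In both rows the center base is an A, which is the nucleotide the m6A call refers to, rather than the base at the listed start position.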

Thanks, Akanksha

Akanksha2511 avatar May 28 '24 00:05 Akanksha2511