taiyaki
AssertionError: Input not finite
Hi,
we recently trained a model successfully for a plant species sequenced on the MinION using an R9.4 flowcell. We have also sequenced the same plant species on the MinION using an R10.3 flowcell and successfully trained a model with those data.
We have now sequenced the same plant again on a PromethION R10.4 flowcell, but are running into an error when attempting to train a model:
* Taiyaki version 5.1.0
* Platform is Linux-4.15.0-38-generic-x86_64-with-debian-buster-sid
* PyTorch version 1.2.0
* CUDA version 10.0.130 on device GeForce GTX 1080 Ti
* Command line:
* "/opt/kgapps/taiyaki/bin/train_flipflop.py resume2/model_checkpoint_00018.checkpoint mapped_reads_2.hdf5 --min_sub_batch_size 48 --outdir resume3 --lr_max 0.00160 --niteration 40000 --lr_cosine_iters 30000 --overwrite --device 0
* Started on 2020-09-25 08:40:06.741154
* Loading data from mapped_reads_2.hdf5
* Per read file MD5 62e8f6baab6b7ca1d1c046bdaed7e933
* Reads not filtered by id
* Using alphabet definition: canonical alphabet ACGT and no modified bases
* Loaded 14191 reads.
* Reading network from resume2/model_checkpoint_00018.checkpoint
* Network has 10683280 parameters.
* Loaded standard (canonical bases-only) model.
* Dumping initial model
* Sampled 100000 chunks: median(mean_dwell)=9.20, mad(mean_dwell)=0.89
* Learning rate goes like cosine from lr_max to lr_min over 30000.0 iterations.
* At start, train for 200 batches at warm-up learning rate 0.0001
* Standard loss reporting from 141 validation reads held out of training.
* Standard loss report: chunk length = 5500 & sub-batch size = 48 for 10 sub-batches.
* Gradient L2 norm cap will be upper 0.05 quantile of the last 100 norms.
* Training
.................................................. 1 0.10118 0.10477 116.30s (164.95 ksample/s 18.39 kbase/s) lr=1.00e-04 22.8% chunks filtered
.................................................. 2 0.10152 0.10432 116.79s (164.35 ksample/s 18.31 kbase/s) lr=1.00e-04 23.1% chunks filtered
.................................................. 3 0.10123 0.10401 110.55s (173.69 ksample/s 19.35 kbase/s) lr=1.00e-04 22.4% chunks filtered
.................................................. 4 0.10249 0.10369 117.33s (163.65 ksample/s 18.22 kbase/s) lr=1.00e-04 22.5% chunks filtered
............................Traceback (most recent call last):
File "/opt/kgapps/taiyaki/bin/train_flipflop.py", line 4, in <module>
__import__('pkg_resources').run_script('taiyaki==5.1.0', 'train_flipflop.py')
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1446, in run_script
exec(code, namespace, namespace)
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 624, in <module>
main()
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 541, in main
mod_factor_t, calc_grads = True )
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 247, in calculate_loss
outputs, seqs, seqlens, sharpen)
File "taiyaki/ctc/ctc.pyx", line 88, in taiyaki.ctc.ctc.FlipFlopCRF.forward
File "taiyaki/ctc/ctc.pyx", line 62, in taiyaki.ctc.ctc.crf_flipflop_grad
AssertionError: Input not finite
If we resume from the checkpoint, we run into the same error sometime later:
* Taiyaki version 5.1.0
* Platform is Linux-4.15.0-38-generic-x86_64-with-debian-buster-sid
* PyTorch version 1.2.0
* CUDA version 10.0.130 on device GeForce GTX 1080 Ti
* Command line:
* "/opt/kgapps/taiyaki/bin/train_flipflop.py resume2/model_checkpoint_00018.checkpoint mapped_reads_2.hdf5 --min_sub_batch_size 48 --outdir resume3 --lr_max 0.00160 --niteration 40000 --lr_cosine_iters 30000 --overwrite --device 0
* Started on 2020-09-29 08:43:32.712279
* Loading data from mapped_reads_2.hdf5
* Per read file MD5 62e8f6baab6b7ca1d1c046bdaed7e933
* Reads not filtered by id
* Using alphabet definition: canonical alphabet ACGT and no modified bases
* Loaded 14191 reads.
* Reading network from resume2/model_checkpoint_00018.checkpoint
* Network has 10683280 parameters.
* Loaded standard (canonical bases-only) model.
* Dumping initial model
* Sampled 100000 chunks: median(mean_dwell)=9.20, mad(mean_dwell)=0.89
* Learning rate goes like cosine from lr_max to lr_min over 30000.0 iterations.
* At start, train for 200 batches at warm-up learning rate 0.0001
* Standard loss reporting from 141 validation reads held out of training.
* Standard loss report: chunk length = 5500 & sub-batch size = 48 for 10 sub-batches.
* Gradient L2 norm cap will be upper 0.05 quantile of the last 100 norms.
* Training
.................................................. 1 0.10303 0.09424 114.06s (168.34 ksample/s 18.76 kbase/s) lr=1.00e-04 23.7% chunks filtered
.................................................. 2 0.10134 0.09381 115.19s (166.80 ksample/s 18.57 kbase/s) lr=1.00e-04 23.0% chunks filtered
.................................................. 3 0.10066 0.09348 121.96s (157.53 ksample/s 17.53 kbase/s) lr=1.00e-04 23.4% chunks filtered
.................................................. 4 0.10001 0.09326 115.78s (165.98 ksample/s 18.45 kbase/s) lr=1.00e-04 23.1% chunks filtered
.................................................. 5 0.10257 0.09576 112.31s (170.96 ksample/s 19.06 kbase/s) lr=1.60e-03 22.9% chunks filtered
.................................................. 6 0.10423 0.09548 112.72s (170.15 ksample/s 18.97 kbase/s) lr=1.60e-03 22.6% chunks filtered
.................................................. 7 0.10399 0.09554 116.29s (165.05 ksample/s 18.44 kbase/s) lr=1.60e-03 22.5% chunks filtered
.................................................. 8 0.10446 0.09625 115.58s (166.09 ksample/s 18.52 kbase/s) lr=1.60e-03 22.6% chunks filtered
.................................................. 9 0.10420 0.09789 115.14s (166.58 ksample/s 18.57 kbase/s) lr=1.60e-03 22.7% chunks filtered
.................................................. 10 0.10316 0.09563 111.75s (171.81 ksample/s 19.07 kbase/s) lr=1.60e-03 22.6% chunks filtered
.................................................. 11 0.10285 0.09823 113.14s (169.61 ksample/s 18.91 kbase/s) lr=1.60e-03 22.6% chunks filtered
.................................................. 12 0.10350 0.09618 113.01s (169.96 ksample/s 18.88 kbase/s) lr=1.60e-03 22.5% chunks filtered
.................................................. 13 0.10436 0.09668 118.44s (162.04 ksample/s 18.04 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 14 0.10136 0.09609 120.23s (159.74 ksample/s 17.71 kbase/s) lr=1.60e-03 22.5% chunks filtered
.................................................. 15 0.10160 0.09668 114.07s (168.49 ksample/s 18.78 kbase/s) lr=1.60e-03 22.3% chunks filtered
.................................................. 16 0.10144 0.09606 121.89s (157.41 ksample/s 17.53 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 17 0.10180 0.09654 113.69s (168.87 ksample/s 18.75 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 18 0.10381 0.09578 114.91s (166.92 ksample/s 18.63 kbase/s) lr=1.60e-03 22.3% chunks filtered
.................................................. 19 0.10369 0.09621 119.26s (161.01 ksample/s 17.88 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................C 20 0.10542 0.09585 116.87s (164.28 ksample/s 18.32 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 21 0.10229 0.09613 112.86s (170.12 ksample/s 18.89 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 22 0.10319 0.09585 119.18s (161.01 ksample/s 17.88 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 23 0.10428 0.09592 112.53s (170.74 ksample/s 19.03 kbase/s) lr=1.60e-03 22.3% chunks filtered
.................................................. 24 0.10430 0.09659 118.36s (162.13 ksample/s 18.06 kbase/s) lr=1.60e-03 22.3% chunks filtered
.................................................. 25 0.10297 0.09839 114.13s (168.21 ksample/s 18.72 kbase/s) lr=1.60e-03 22.3% chunks filtered
.................................................. 26 0.10191 0.09591 119.56s (160.64 ksample/s 17.88 kbase/s) lr=1.60e-03 22.4% chunks filtered
.................................................. 27 0.10201 0.09587 119.00s (161.43 ksample/s 18.00 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 28 0.10112 0.09673 115.11s (166.81 ksample/s 18.54 kbase/s) lr=1.59e-03 22.3% chunks filtered
.................................................. 29 0.10262 0.09598 116.90s (164.09 ksample/s 18.27 kbase/s) lr=1.59e-03 22.3% chunks filtered
.................................................. 30 0.10112 0.09629 119.62s (160.46 ksample/s 17.91 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 31 0.10133 0.09634 113.04s (169.94 ksample/s 18.97 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 32 0.10289 0.09638 114.80s (167.16 ksample/s 18.62 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 33 0.10126 0.09713 118.53s (162.10 ksample/s 18.09 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 34 0.10132 0.09618 114.32s (167.75 ksample/s 18.63 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 35 0.10256 0.09735 114.83s (167.14 ksample/s 18.61 kbase/s) lr=1.59e-03 22.4% chunks filtered
.................................................. 36 0.10273 0.09627 118.25s (162.45 ksample/s 18.04 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................. 37 0.10070 0.09690 119.18s (160.95 ksample/s 17.92 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................. 38 0.10240 0.09587 122.12s (157.21 ksample/s 17.53 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................. 39 0.10228 0.09652 121.98s (157.32 ksample/s 17.50 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................C 40 0.10154 0.09753 119.72s (160.40 ksample/s 17.86 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................. 41 0.10145 0.09766 116.83s (164.52 ksample/s 18.34 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................. 42 0.10305 0.09752 116.33s (165.06 ksample/s 18.37 kbase/s) lr=1.59e-03 22.5% chunks filtered
.................................................. 43 0.10309 0.09718 113.95s (168.32 ksample/s 18.77 kbase/s) lr=1.58e-03 22.5% chunks filtered
.................................................. 44 0.10519 0.09719 117.84s (163.13 ksample/s 18.14 kbase/s) lr=1.58e-03 22.5% chunks filtered
.................................................. 45 0.10280 0.09720 114.28s (167.86 ksample/s 18.66 kbase/s) lr=1.58e-03 22.5% chunks filtered
.................................................. 46 0.10338 0.09686 117.43s (163.33 ksample/s 18.20 kbase/s) lr=1.58e-03 22.5% chunks filtered
.................................................. 47 0.10158 0.09750 117.49s (163.25 ksample/s 18.12 kbase/s) lr=1.58e-03 22.5% chunks filtered
.................................................. 48 0.10328 0.09947 117.06s (163.95 ksample/s 18.26 kbase/s) lr=1.58e-03 22.5% chunks filtered
.................................................. 49 0.10487 0.09910 115.14s (166.81 ksample/s 18.65 kbase/s) lr=1.58e-03 22.6% chunks filtered
.................................................. 50 0.10118 0.09751 120.68s (159.18 ksample/s 17.63 kbase/s) lr=1.58e-03 22.6% chunks filtered
.................................................. 51 0.10375 0.09986 115.22s (166.70 ksample/s 18.58 kbase/s) lr=1.58e-03 22.6% chunks filtered
.................................................. 52 0.10603 0.09855 119.27s (161.08 ksample/s 17.95 kbase/s) lr=1.58e-03 22.6% chunks filtered
.................................................. 53 0.10155 0.09740 116.16s (165.29 ksample/s 18.38 kbase/s) lr=1.58e-03 22.6% chunks filtered
.................................................. 54 0.10260 0.09768 112.45s (170.72 ksample/s 18.98 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 55 0.10155 0.09779 116.01s (165.47 ksample/s 18.41 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 56 0.10258 0.09721 117.70s (163.23 ksample/s 18.10 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 57 0.10468 0.09874 118.00s (162.63 ksample/s 18.08 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 58 0.10292 0.09673 120.48s (159.47 ksample/s 17.73 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 59 0.10120 0.09683 116.46s (164.98 ksample/s 18.34 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................C 60 0.10291 0.09715 111.07s (172.95 ksample/s 19.24 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 61 0.10265 0.09764 117.70s (163.35 ksample/s 18.17 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 62 0.10124 0.09724 118.02s (162.72 ksample/s 18.12 kbase/s) lr=1.57e-03 22.6% chunks filtered
.................................................. 63 0.10444 0.09788 116.20s (165.41 ksample/s 18.37 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 64 0.10290 0.09740 115.10s (166.85 ksample/s 18.55 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 65 0.10396 0.09741 116.88s (164.25 ksample/s 18.32 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 66 0.10418 0.09737 120.62s (159.13 ksample/s 17.71 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 67 0.10352 0.09803 113.83s (168.62 ksample/s 18.76 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 68 0.10091 0.09806 114.76s (167.36 ksample/s 18.64 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 69 0.10277 0.09752 109.89s (174.39 ksample/s 19.44 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 70 0.10152 0.09761 121.69s (157.95 ksample/s 17.54 kbase/s) lr=1.56e-03 22.6% chunks filtered
.................................................. 71 0.10293 0.09910 112.92s (169.99 ksample/s 18.91 kbase/s) lr=1.55e-03 22.7% chunks filtered
.................................................. 72 0.10342 0.09704 115.96s (165.39 ksample/s 18.41 kbase/s) lr=1.55e-03 22.7% chunks filtered
.................................................. 73 0.10179 0.09838 112.01s (171.42 ksample/s 19.05 kbase/s) lr=1.55e-03 22.6% chunks filtered
.................................................. 74 0.10373 0.09874 111.87s (171.61 ksample/s 19.09 kbase/s) lr=1.55e-03 22.6% chunks filtered
.................................................. 75 0.10389 0.09729 110.51s (173.63 ksample/s 19.31 kbase/s) lr=1.55e-03 22.6% chunks filtered
.................................................. 76 0.10115 0.09802 116.14s (165.44 ksample/s 18.41 kbase/s) lr=1.55e-03 22.6% chunks filtered
.................................................. 77 0.10189 0.09826 121.15s (158.53 ksample/s 17.68 kbase/s) lr=1.55e-03 22.6% chunks filtered
.................................................. 78 0.10195 0.09802 115.58s (165.89 ksample/s 18.42 kbase/s) lr=1.54e-03 22.6% chunks filtered
.................................................. 79 0.10193 0.09749 120.47s (159.53 ksample/s 17.80 kbase/s) lr=1.54e-03 22.6% chunks filtered
.................................................C 80 0.10151 0.09734 121.18s (158.53 ksample/s 17.65 kbase/s) lr=1.54e-03 22.6% chunks filtered
.................................................. 81 0.10123 0.09833 114.94s (166.94 ksample/s 18.59 kbase/s) lr=1.54e-03 22.6% chunks filtered
.................................................. 82 0.10002 0.09774 122.28s (157.09 ksample/s 17.51 kbase/s) lr=1.54e-03 22.6% chunks filtered
.................................................. 83 0.10148 0.09776 120.81s (158.98 ksample/s 17.69 kbase/s) lr=1.54e-03 22.6% chunks filtered
.................................................. 84 0.10369 0.09914 118.36s (162.27 ksample/s 18.08 kbase/s) lr=1.54e-03 22.7% chunks filtered
.................................................. 85 0.10256 0.09831 118.93s (161.22 ksample/s 17.93 kbase/s) lr=1.53e-03 22.7% chunks filtered
.................................................. 86 0.10094 0.09856 117.98s (162.84 ksample/s 18.06 kbase/s) lr=1.53e-03 22.7% chunks filtered
.................................................. 87 0.10262 0.09856 115.67s (165.98 ksample/s 18.48 kbase/s) lr=1.53e-03 22.7% chunks filtered
.................................................. 88 0.10268 0.09754 114.59s (167.54 ksample/s 18.67 kbase/s) lr=1.53e-03 22.6% chunks filtered
.................................................. 89 0.10122 0.09729 115.43s (166.36 ksample/s 18.49 kbase/s) lr=1.53e-03 22.6% chunks filtered
.................................................. 90 0.10119 0.09826 117.85s (162.79 ksample/s 18.10 kbase/s) lr=1.53e-03 22.6% chunks filtered
.................................................. 91 0.10304 0.09846 113.75s (168.66 ksample/s 18.74 kbase/s) lr=1.52e-03 22.6% chunks filtered
.................................................. 92 0.10238 0.09824 112.37s (171.00 ksample/s 19.04 kbase/s) lr=1.52e-03 22.6% chunks filtered
.................................................. 93 0.10246 0.09978 113.78s (168.81 ksample/s 18.80 kbase/s) lr=1.52e-03 22.6% chunks filtered
.................................................. 94 0.10248 0.09794 116.51s (164.81 ksample/s 18.42 kbase/s) lr=1.52e-03 22.6% chunks filtered
.................................................. 95 0.10194 0.09909 116.15s (165.32 ksample/s 18.43 kbase/s) lr=1.52e-03 22.6% chunks filtered
.................................................. 96 0.10665 0.10063 112.00s (171.39 ksample/s 19.09 kbase/s) lr=1.51e-03 22.6% chunks filtered
.................................................. 97 0.10250 0.09883 119.84s (160.29 ksample/s 17.83 kbase/s) lr=1.51e-03 22.6% chunks filtered
.................................................. 98 0.10183 0.09980 121.37s (158.07 ksample/s 17.59 kbase/s) lr=1.51e-03 22.6% chunks filtered
.................................................. 99 0.10106 0.09903 123.88s (155.04 ksample/s 17.33 kbase/s) lr=1.51e-03 22.6% chunks filtered
.................................................C 100 0.10287 0.09871 116.61s (164.78 ksample/s 18.32 kbase/s) lr=1.51e-03 22.6% chunks filtered
.................................................. 101 0.09959 0.09830 115.93s (165.56 ksample/s 18.41 kbase/s) lr=1.51e-03 22.6% chunks filtered
.................................................. 102 0.10232 0.10040 116.30s (165.17 ksample/s 18.37 kbase/s) lr=1.50e-03 22.6% chunks filtered
.................................................. 103 0.10204 0.09922 113.61s (168.76 ksample/s 18.80 kbase/s) lr=1.50e-03 22.6% chunks filtered
.................................................. 104 0.10118 0.10029 115.30s (166.43 ksample/s 18.55 kbase/s) lr=1.50e-03 22.6% chunks filtered
.................................................. 105 0.10316 0.09928 119.12s (161.12 ksample/s 17.96 kbase/s) lr=1.50e-03 22.7% chunks filtered
.................................................. 106 0.10159 0.09964 114.67s (167.49 ksample/s 18.65 kbase/s) lr=1.50e-03 22.6% chunks filtered
.................................................. 107 0.10175 0.09931 115.64s (165.94 ksample/s 18.48 kbase/s) lr=1.49e-03 22.6% chunks filtered
.................................................. 108 0.10043 0.09866 125.75s (152.72 ksample/s 17.02 kbase/s) lr=1.49e-03 22.7% chunks filtered
.................................................. 109 0.10020 0.09867 118.72s (161.77 ksample/s 18.02 kbase/s) lr=1.49e-03 22.7% chunks filtered
.................................................. 110 0.10340 0.09991 116.44s (164.87 ksample/s 18.40 kbase/s) lr=1.49e-03 22.7% chunks filtered
.................................................. 111 0.10187 0.09888 120.00s (160.10 ksample/s 17.83 kbase/s) lr=1.49e-03 22.7% chunks filtered
.................................................. 112 0.10131 0.09886 116.21s (165.18 ksample/s 18.35 kbase/s) lr=1.48e-03 22.7% chunks filtered
.................................................. 113 0.10127 0.09878 119.95s (159.88 ksample/s 17.79 kbase/s) lr=1.48e-03 22.7% chunks filtered
.................................................. 114 0.10092 0.09867 116.38s (164.89 ksample/s 18.39 kbase/s) lr=1.48e-03 22.7% chunks filtered
.................................................. 115 0.10235 0.09870 117.45s (163.42 ksample/s 18.19 kbase/s) lr=1.48e-03 22.7% chunks filtered
.................................................. 116 0.10141 0.09837 118.39s (162.21 ksample/s 18.07 kbase/s) lr=1.47e-03 22.7% chunks filtered
.................................................. 117 0.10130 0.09781 121.45s (158.15 ksample/s 17.60 kbase/s) lr=1.47e-03 22.7% chunks filtered
.................................................. 118 0.09917 0.09845 121.79s (157.63 ksample/s 17.54 kbase/s) lr=1.47e-03 22.7% chunks filtered
.................................................. 119 0.10108 0.09849 118.65s (161.84 ksample/s 17.97 kbase/s) lr=1.47e-03 22.7% chunks filtered
.................................................C 120 0.09964 0.09825 121.11s (158.32 ksample/s 17.58 kbase/s) lr=1.47e-03 22.7% chunks filtered
.................................................. 121 0.10007 0.09785 115.92s (165.46 ksample/s 18.38 kbase/s) lr=1.46e-03 22.7% chunks filtered
.................................................. 122 0.09985 0.09845 114.33s (168.07 ksample/s 18.72 kbase/s) lr=1.46e-03 22.7% chunks filtered
.................................................. 123 0.09904 0.09813 120.89s (158.90 ksample/s 17.63 kbase/s) lr=1.46e-03 22.7% chunks filtered
.................................................. 124 0.10266 0.09908 117.09s (163.89 ksample/s 18.23 kbase/s) lr=1.46e-03 22.7% chunks filtered
.................................................. 125 0.10281 0.09849 118.34s (162.25 ksample/s 18.06 kbase/s) lr=1.45e-03 22.7% chunks filtered
.................................................. 126 0.10141 0.09834 116.42s (164.98 ksample/s 18.35 kbase/s) lr=1.45e-03 22.7% chunks filtered
.................................................. 127 0.09954 0.09840 117.82s (162.89 ksample/s 18.10 kbase/s) lr=1.45e-03 22.7% chunks filtered
.................................................. 128 0.09923 0.09771 115.27s (166.61 ksample/s 18.57 kbase/s) lr=1.45e-03 22.7% chunks filtered
.................................................. 129 0.10268 0.09790 110.59s (173.61 ksample/s 19.28 kbase/s) lr=1.45e-03 22.7% chunks filtered
.................................................. 130 0.09932 0.09760 115.02s (166.87 ksample/s 18.58 kbase/s) lr=1.44e-03 22.7% chunks filtered
.................................................. 131 0.10253 0.09847 113.70s (168.84 ksample/s 18.83 kbase/s) lr=1.44e-03 22.7% chunks filtered
.................................................. 132 0.09926 0.09810 114.94s (166.99 ksample/s 18.55 kbase/s) lr=1.44e-03 22.7% chunks filtered
.................................................. 133 0.10006 0.09842 120.04s (160.03 ksample/s 17.81 kbase/s) lr=1.44e-03 22.7% chunks filtered
.................................................. 134 0.10196 0.09928 115.63s (166.08 ksample/s 18.52 kbase/s) lr=1.43e-03 22.7% chunks filtered
.................................................. 135 0.09793 0.09739 119.16s (161.16 ksample/s 17.89 kbase/s) lr=1.43e-03 22.7% chunks filtered
.................................................. 136 0.10120 0.09817 113.37s (169.33 ksample/s 18.89 kbase/s) lr=1.43e-03 22.7% chunks filtered
.................................................. 137 0.10041 0.09765 110.80s (173.22 ksample/s 19.23 kbase/s) lr=1.43e-03 22.7% chunks filtered
......Traceback (most recent call last):
File "/opt/kgapps/taiyaki/bin/train_flipflop.py", line 4, in <module>
__import__('pkg_resources').run_script('taiyaki==5.1.0', 'train_flipflop.py')
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1446, in run_script
exec(code, namespace, namespace)
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 624, in <module>
main()
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 541, in main
mod_factor_t, calc_grads = True )
File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 247, in calculate_loss
outputs, seqs, seqlens, sharpen)
File "taiyaki/ctc/ctc.pyx", line 88, in taiyaki.ctc.ctc.FlipFlopCRF.forward
File "taiyaki/ctc/ctc.pyx", line 62, in taiyaki.ctc.ctc.crf_flipflop_grad
AssertionError: Input not finite
Do you have any idea what is going on here, and what we are doing wrong?
This is an area of active research internally. Currently the best workaround is to decrease --lr_max and increase --niteration (and maybe --lr_cosine_iters).
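To get a feel for what lowering the learning-rate ceiling does, here is a rough sketch of a warm-up followed by a half-cosine decay, matching the description in the log above ("cosine from lr_max to lr_min over 30000.0 iterations", 200 warm-up batches at 0.0001). This is not Taiyaki's code: the function name `cosine_lr`, the exact formula, and the `lr_min` value of 1e-4 are assumptions made for illustration only.

```python
import math


def cosine_lr(iteration, lr_max, lr_min=1e-4, cosine_iters=30000,
              warmup_iters=200, warmup_lr=1e-4):
    """Hypothetical helper (not part of Taiyaki): learning rate at an iteration.

    Assumes a flat warm-up phase followed by a half-cosine decay from
    lr_max down to lr_min, after which the rate stays at lr_min.
    """
    if iteration < warmup_iters:
        return warmup_lr
    # Iterations elapsed since the end of warm-up, capped at the decay length.
    t = min(iteration - warmup_iters, cosine_iters)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / cosine_iters))


if __name__ == "__main__":
    # Compare the ceiling used in the failing run with a halved ceiling
    # (the halved value is an arbitrary example, not a recommendation).
    for lr_max in (0.0016, 0.0008):
        rates = [cosine_lr(i, lr_max) for i in (0, 200, 5000, 15200, 30200)]
        print(lr_max, ["%.2e" % r for r in rates])
```

Under these assumptions a smaller ceiling lowers the rate at every point of the decay, which is why more iterations (and possibly a longer cosine period) may be needed to reach a comparable loss.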
I just generated the same exception using the example training set:
https://s3-eu-west-1.amazonaws.com/ont-research/taiyaki_walkthrough.tar.gz
while following the Taiyaki walkthrough instructions:
https://github.com/nanoporetech/taiyaki/blob/master/docs/walkthrough.rst
though it took 556 iterations before it failed.
FWIW.