AI-Feynman
Problem with using pretrained NN
Hi Silviu:
Thanks for this great package. I have been using it for my research and have run into a problem with pretrained NNs. The error is as follows:
RuntimeError: Error(s) in loading state_dict for SimpleNet: size mismatch for linear1.weight: copying a param with shape torch.Size([128, 4]) from checkpoint, the shape in current model is torch.Size([128, 3]).
Note that my dataset has 4 independent variables. From the log, it seems AI-Feynman reduced the number of variables from 4 to 3 in an earlier step, which may be causing this issue. I just want to check what the best way to fix this is.
Thanks
Su
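The error message reduces to a shape comparison between the saved checkpoint and the freshly built model. Below is a minimal sketch of that comparison, written as a pure function so it runs without PyTorch. The helper name `find_shape_mismatches` is hypothetical and not part of AI-Feynman; the shapes are copied from the error above.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[str]:
    """Describe every parameter whose shape differs between checkpoint and model."""
    problems = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is None:
            problems.append(f"{name}: in checkpoint but not in model")
        elif model_shape != ckpt_shape:
            problems.append(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
    return problems


# Shapes taken from the error message: the checkpoint was saved for
# 4 inputs, while the current model was built for 3.
checkpoint_shapes = {"linear1.weight": (128, 4)}
model_shapes = {"linear1.weight": (128, 3)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for line in mismatches:
    print(line)
```

With PyTorch available, both dictionaries could presumably be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, once from the result of `torch.load(pretrained_path)` and once from `model_feynman.state_dict()`, and compared before `load_state_dict` is called.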
Dear Su,
Thank you for your interest in our code. Could you please show me the output from shortly before this error message? It might help me understand what is going on.
Hi,
I have also been having the same issue with my own data: RuntimeError: Error(s) in loading state_dict for SimpleNet: size mismatch for linear1.weight: copying a param with shape torch.Size([128, 9]) from checkpoint, the shape in current model is torch.Size([128, 8]). I have 10 independent variables in my case.
This is my entire error message:
'' NN already trained
NN loss: tensor(0.4381, grad_fn=<DivBackward0>)
Checking for symmetry
Data_Values.txt_train-translated_divide
NN already trained
NN loss: tensor(nan, grad_fn=<DivBackward0>)
Checking for symmetry
Data_Values.txt_train-translated_divide-translated_plus
Found pretrained NN
RuntimeError Traceback (most recent call last)
/content/AI-Feynman/Code/S_run_aifeynman.py in run_AI_all(pathdir, filename, BF_try_time, BF_ops_file_type, polyfit_deg, NN_epochs, PA)
    110 new_pathdir, new_filename = do_translational_symmetry_divide(pathdir,filename,symmetry_divide_result[1],symmetry_divide_result[2])
    111 PA1_ = ParetoSet()
--> 112 PA1 = run_AI_all(new_pathdir,new_filename,BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA1_)
    113 PA = add_sym_on_pareto(pathdir,filename,PA1,symmetry_divide_result[1],symmetry_divide_result[2],PA,"/")
    114 return PA
/content/AI-Feynman/Code/S_run_aifeynman.py in run_AI_all(pathdir, filename, BF_try_time, BF_ops_file_type, polyfit_deg, NN_epochs, PA)
     89 new_pathdir, new_filename = do_translational_symmetry_plus(pathdir,filename,symmetry_plus_result[1],symmetry_plus_result[2])
     90 PA1_ = ParetoSet()
---> 91 PA1 = run_AI_all(new_pathdir,new_filename,BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA1_)
     92 PA = add_sym_on_pareto(pathdir,filename,PA1,symmetry_plus_result[1],symmetry_plus_result[2],PA,"+")
     93 return PA
/content/AI-Feynman/Code/S_run_aifeynman.py in run_AI_all(pathdir, filename, BF_try_time, BF_ops_file_type, polyfit_deg, NN_epochs, PA)
     61 elif path.exists("results/NN_trained_models/models/" + filename + "_pretrained.h5"):
     62 print("Found pretrained NN \n")
---> 63 NN_train(pathdir,filename,NN_epochs/2,lrs=1e-3,N_red_lr=3,pretrained_path="results/NN_trained_models/models/" + filename + "_pretrained.h5")
     64 print("NN loss after training: ", NN_eval(pathdir,filename), "\n")
     65 else:
/content/AI-Feynman/Code/S_NN_train.py in NN_train(pathdir, filename, epochs, lrs, N_red_lr, pretrained_path)
    114
    115 if pretrained_path!="":
--> 116 model_feynman.load_state_dict(torch.load(pretrained_path))
    117
    118 check_es_loss = 10000
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    845 if len(error_msgs) > 0:
    846 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 847 self.__class__.__name__, "\n\t".join(error_msgs)))
    848 return _IncompatibleKeys(missing_keys, unexpected_keys)
    849
RuntimeError: Error(s) in loading state_dict for SimpleNet: size mismatch for linear1.weight: copying a param with shape torch.Size([128, 9]) from checkpoint, the shape in current model is torch.Size([128, 8]). ''
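The traceback shows that the pretrained checkpoint is looked up purely by name, at "results/NN_trained_models/models/" + filename + "_pretrained.h5". One possible cause, which this thread does not confirm, is that a checkpoint saved under that name was trained on a different number of inputs. The sketch below lists such files and moves them aside so they can be inspected rather than deleted. `find_pretrained_checkpoints` and `move_aside` are hypothetical helper names, not AI-Feynman functions.

```python
from pathlib import Path
from typing import List

# Directory copied from the traceback; adjust if the working directory differs.
MODELS_DIR = Path("results/NN_trained_models/models")


def find_pretrained_checkpoints(
    dataset_filename: str, models_dir: Path = MODELS_DIR
) -> List[Path]:
    """Return saved checkpoints whose name starts with the dataset filename."""
    if not models_dir.is_dir():
        return []
    return sorted(
        p
        for p in models_dir.iterdir()
        if p.name.startswith(dataset_filename)
        and p.name.endswith("_pretrained.h5")
    )


def move_aside(checkpoints: List[Path], backup_dir: Path) -> List[Path]:
    """Move checkpoints into backup_dir instead of deleting them."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for ckpt in checkpoints:
        target = backup_dir / ckpt.name
        ckpt.rename(target)
        moved.append(target)
    return moved


# List (but do not move) any checkpoints for the dataset named in the log.
for ckpt in find_pretrained_checkpoints("Data_Values.txt_train"):
    print(ckpt)
```

Moving the files keeps the old weights available in case the mismatch turns out to have a different cause.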
Hi Silviu:
My problem is similar to Matthew's:
....
Checking for brute force +
Trying to solve mysteries with brute force...
Trying to solve results/mystery_world_sqrt/eoq_data.txt_train-translated_plus...
/bin/cp -p results/mystery_world_sqrt/eoq_data.txt_train-translated_plus mystery.dat
Number of variables..... 3
Functions used.......... +-/><~\RPSCLE
Arity 0 : Pabc
Arity 1 : ><~\RSCLE
Arity 2 : +-/
Loading mystery data....
1000 rows read from file mystery.dat
Number of examples...... 1000
Mystery data has largest magnitude 96.744442555828869 at j= 258
Searching for best fit...
666.000000000000 93.602849814816 P 1 763.2601 27.0269 60.2462 1878.5495
666.000000000000 93.744442555829 c 4 733.8841 28.9702 62.1894 1878.7589
666.000000000000 36.329212687856 aR 22 536.4852 31.0740 64.2933 1866.9244
666.000000000000 33.498889352461 bR 23 495.7903 31.0285 64.2478 1863.3896
666.000000000000 33.506795540779 b<R 255 495.7848 34.4993 67.7186 1866.8599
666.000000000000 30.498889352461 cbR+ 504 465.9682 35.3932 68.6124 1865.0132
666.000000000000 72.136995216099 caL* 692 445.3115 35.7402 68.9595 1863.4018
666.000000000000 71.862293635523 cbL* 696 438.0313 35.7342 68.9534 1862.6581
666.000000000000 7.301723455837 bb+R 1271 397.2188 36.5494 69.7687 1859.0650
666.000000000000 -10.338765766932 aPR 1278 382.1004 36.5212 69.7405 1857.3026
666.000000000000 -15.355383331861 bPR 1279 325.5215 36.3115 69.5308 1849.9926
636.385378826349 -31.965972578496 bP>R 13731 286.8809 39.6685 72.8878 1847.6521
633.635321990147 33.420396146597 acLR 13854 324.9232 39.6752 72.8944 1853.3458
546.948122255043 30.453801024219 bcLR 13855 263.2369 39.4630 72.6823 1843.7409
511.293663817382 -58.281956708123 Pba+R 87277 190.8319 42.0210 75.2402 1831.7213
509.110778573857 62.003271579413 bRcRL 355475 270.4776 44.0409 77.2601 1849.6602
498.535075165671 27.453801024219 cbcLR+ 969012 240.0173 45.4573 78.6766 1845.6561
441.063158810445 7.053589317205 abcL*+R 1079602 198.5730 45.4365 78.6558 1837.1642
217.569639622967 15.392387614913 cba**RR 1364564 121.4374 44.7550 77.9743 1815.0671
217.565900310493 15.397960246034 cba<**RR 16990232 121.4347 48.3932 81.6124 1818.7043
217.565881427321 15.397472595089 cab<RR 16990244 121.4346 48.3932 81.6124 1818.7043
217.563899354636 15.398533973346 cbaR<R 21129688 121.4324 48.7077 81.9270 1819.0180
Checking for brute force *
Trying to solve mysteries with brute force...
Trying to solve results/mystery_world_sqrt/eoq_data.txt_train-translated_plus...
/bin/cp -p results/mystery_world_sqrt/eoq_data.txt_train-translated_plus mystery.dat
Number of variables..... 3
Functions used.......... +-/><~\RPSCLE
Arity 0 : Pabc
Arity 1 : ><~\RSCLE
Arity 2 : +-/
Loading mystery data....
1000 rows read from file mystery.dat
Number of examples...... 1000
Mystery data has largest magnitude 127750.00000000000 at j= 1
Searching for best fit...
666.000000000000 100.039859017533 P 1 566.0325 26.7770 59.9963 1864.9109
666.000000000000 0.002609252760 a 2 609.7040 27.6532 60.8724 1869.3016
666.000000000000 17.460249716745 c 4 382.9163 28.2411 61.4604 1849.0802
666.000000000000 16.541289205337 c> 8 385.6423 29.2260 62.4452 1850.4039
666.000000000000 14.865696201390 cP+ 44 393.1367 31.6787 64.8980 1853.7414
527.806625474009 0.002561405826 ba+ 47 302.8501 31.2081 64.4274 1841.9329
527.804082835278 0.002561426702 ba<+ 451 302.8485 34.4705 67.6898 1845.1950
527.744822619888 0.002561030125 cba++ 5100 302.8294 37.9696 71.1889 1848.6914
527.286122070359 2.449489742783 baRR 20071 174.6039 39.9449 73.1642 1825.5463
527.107880112961 0.002560416004 babR++ 64763 302.3497 41.6345 74.8538 1852.2857
496.799142612467 0.002554625922 abPL+ 65502 272.1302 41.5654 74.7847 1847.4979
420.325290327500 0.000603729143 ba+cR* 254043 182.1380 43.2797 76.4990 1831.1354
403.349065384127 2.148129696752 cbaRR+ 1151316 163.9800 45.4004 78.6197 1828.5243
329.359877029521 0.063834549508 acbR*R 1264230 121.0946 45.2430 78.4623 1814.8279
0.000000000000 1.189207115003 cba**RR 1364564 0.0000 20.3800 28.2089 220.9229
All done: results in results.dat
Checking polyfit
Complexity RMSE Expression
[0.0, 33.72515523232843, -1.50374224914926e-759802]
[15.509775004326936, 29.04603252368458, 'asin(0.000000000012*(x1exp(exp(exp(sin(log(x1)))))))']
[18.509775004326936, 29.044458005885136, 'asin(0.000000000091(exp((cos((x1+1)))(-1)))(-1))']
[21.094737505048094, 1.7551041143752824, '0.000000000000+sqrt((x2*(x1*(x0+x0))))']
Checking for brute force +
Trying to solve mysteries with brute force...
Trying to solve results/mystery_world_squared/eoq_data.txt_train-translated_plus...
/bin/cp -p results/mystery_world_squared/eoq_data.txt_train-translated_plus mystery.dat
Number of variables..... 3
Functions used.......... +-/><~\RPSCLE
Arity 0 : Pabc
Arity 1 : ><~\RSCLE
Arity 2 : +-/
Loading mystery data....
1000 rows read from file mystery.dat
Number of examples...... 1000
Mystery data has largest magnitude 127750.00000000000 at j= 1
Searching for best fit...
0.000976562500 0.000001907349 cbaa+** 864596 0.0001 26.3313 59.5506 1193.2193
Checking for brute force *
Trying to solve mysteries with brute force...
Trying to solve results/mystery_world_squared/eoq_data.txt_train-translated_plus...
/bin/cp -p results/mystery_world_squared/eoq_data.txt_train-translated_plus mystery.dat
Number of variables..... 3
Functions used.......... +-/><~\RPSCLE
Arity 0 : Pabc
Arity 1 : ><~\RSCLE
Arity 2 : +-/
Loading mystery data....
1000 rows read from file mystery.dat
Number of examples...... 1000
Mystery data has largest magnitude 4596430400000.0000 at j= 660
Searching for best fit...
0.000976562500 2.000000000000 cba** 5420 0.0001 19.0137 52.2330 1185.9016
Checking polyfit
Complexity RMSE Expression
[0.0, 33.72515523232843, -1.50374224914926e-759802]
[15.509775004326936, 29.04603252368458, 'asin(0.000000000012*(x1exp(exp(exp(sin(log(x1)))))))']
[18.509775004326936, 29.044458005885136, 'asin(0.000000000091(exp((cos((x1+1)))(-1)))(-1))']
[21.094737505048094, 1.7551041143752824, '0.000000000000+sqrt((x2*(x1*(x0+x0))))']
Checking for brute force +
Trying to solve mysteries with brute force...
Trying to solve results/mystery_world_tan/eoq_data.txt_train-translated_plus...
/bin/cp -p results/mystery_world_tan/eoq_data.txt_train-translated_plus mystery.dat
Number of variables..... 3
Functions used.......... +-/><~\RPSCLE
Arity 0 : Pabc
Arity 1 : ><~\RSCLE
Arity 2 : +-/
Loading mystery data....
1000 rows read from file mystery.dat
Number of examples...... 1000
Mystery data has largest magnitude 1.5030748346316705E-003 at j= 942
Searching for best fit...
666.000000000000 -3.143095815847 P 1 166.1562 28.9500 62.1693 1808.9909
666.000000000000 37.998496925165 c~ 16 166.7884 32.9472 66.1665 1813.1642
666.000000000000 -489.388875204521 bR 23 252.7203 33.4397 66.6590 1832.6463
666.000000000000 -489.387853517918 b<R 255 252.7214 36.9105 70.1298 1836.1173
666.000000000000 666.000000000000 cc~* 620 1113.5803 37.9645 71.1838 1905.0587
666.000000000000 666.000000000000 cc~<* 7584 1130.9781 41.5738 74.7931 1909.3786
666.000000000000 -666.000000000000 bP>E/ 11139 2138.6914 41.9623 75.1816 1938.9995
666.000000000000 -427.957730731187 cccS** 67732 1197.7314 44.4707 77.6900 1915.1537
666.000000000000 -1.021180233438 ca+RS\ 296860 51.8851 44.7189 77.9381 1774.0710
Checking for brute force *
Trying to solve mysteries with brute force...
Trying to solve results/mystery_world_tan/eoq_data.txt_train-translated_plus...
/bin/cp -p results/mystery_world_tan/eoq_data.txt_train-translated_plus mystery.dat
Number of variables..... 3
Functions used.......... +-/><~\RPSCLE
Arity 0 : Pabc
Arity 1 : ><~\RSCLE
Arity 2 : +-/
Loading mystery data....
1000 rows read from file mystery.dat
Number of examples...... 1000
Mystery data has largest magnitude 127750.00000000000 at j= 1
Searching for best fit...
666.000000000000 -0.022601610502 P 1 166.1535 28.9500 62.1692 1808.9902
666.000000000000 -0.000000589498 a 2 166.1523 29.9499 63.1692 1809.9899
666.000000000000 -0.000031557802 b 3 166.0455 30.5337 63.7530 1810.5455
666.000000000000 -0.000031571834 b< 11 166.0455 32.4082 65.6275 1812.4200
666.000000000000 -0.000000000262 ba* 63 165.9667 34.9254 68.1447 1814.9162
666.000000000000 -0.000000014026 bb* 67 425.0213 34.9417 68.1610 1857.9057
666.000000000000 -0.000000014032 bb<* 599 425.1815 38.1020 71.3213 1861.0833
666.000000000000 -0.000000000000 bba** 5419 388.2227 41.2396 74.4589 1860.1119
666.000000000000 -0.000000000779 cbb** 5436 790.8292 41.1509 74.3702 1892.5766
666.000000000000 666.000000000000 bS>S< 42783 748.3024 43.9908 77.2101 1893.0313
666.000000000000 666.000000000000 bb>*C< 212655 1174.2245 46.0924 79.3117 1915.9000
Checking polyfit
Complexity RMSE Expression
[0.0, 30.05749571799299, 'atan(-0.000000000000*(x1*(x1x0)))']
[15.509775004326936, 29.04603252368458, 'asin(0.000000000012(x1exp(exp(exp(sin(log(x1)))))))']
[18.509775004326936, 29.044458005885136, 'asin(0.000000000091(exp((cos((x1+1)))(-1)))(-1))']
[21.094737505048094, 1.7551041143752824, '0.000000000000+sqrt((x2*(x1*(x0+x0))))']
Checking for symmetry
eoq_data.txt_train-translated_plus
Found pretrained NN
Traceback (most recent call last):
File "inventory/inventory_learn.py", line 70, in
Process finished with exit code 1
Hey Silviu, facing the same exact problem when running on my own data.
File "birdseyefeyn.py", line 6, in <module>
run_aifeynman("example_data/",'debuglight3_hitincremented_only_small.txt',30,"14ops.txt", polyfit_deg=3, NN_epochs=400)
File "/Volumes/Transcend/ai_feynman/AI-Feynman/feynman/S_run_aifeynman.py", line 169, in run_aifeynman
PA = run_AI_all(pathdir,filename+"_train",BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA=PA)
File "/Volumes/Transcend/ai_feynman/AI-Feynman/feynman/S_run_aifeynman.py", line 94, in run_AI_all
PA1 = run_AI_all(new_pathdir,new_filename,BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA1_)
File "/Volumes/Transcend/ai_feynman/AI-Feynman/feynman/S_run_aifeynman.py", line 66, in run_AI_all
NN_train(pathdir,filename,NN_epochs/2,lrs=1e-3,N_red_lr=3,pretrained_path="results/NN_trained_models/models/" + filename + "_pretrained.h5")
File "/Volumes/Transcend/ai_feynman/AI-Feynman/feynman/S_NN_train.py", line 133, in NN_train
model_feynman.load_state_dict(torch.load(pretrained_path))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SimpleNet:
size mismatch for linear1.weight: copying a param with shape torch.Size([128, 4]) from checkpoint, the shape in current model is torch.Size([128, 3]).
My dataset has exactly the same number of variables as example1, which is what confuses me.
I also tried editing linear1.weight to subtract 1 to get 3 (instead of 4) and discovered that it's actually the same shape which makes this error even more confusing.
example1.txt works fine; it's only when I use my own dataset with the same number of variables that I get that error.
Just to be even clearer, here's a sample of what my dataset looks like:
6382179.0000000000000000 1.0000000000000000 1000.0000000000000000 1.0000000000000000 26.0000000000000000
6382179.0000000000000000 1.0000000000000000 1000.0000000000000000 2.0000000000000000 30.0000000000000000
6382179.0000000000000000 1.0000000000000000 1000.0000000000000000 3.0000000000000000 46.0000000000000000
6382179.0000000000000000 1.0000000000000000 1000.0000000000000000 4.0000000000000000 77.0000000000000000
Even though these are supposed to be integers, I tried hard to match everything as much as possible to example1.
The last thing I'm thinking of trying is to normalize all these numbers to maybe something under 10 which is not something I want to do but it might be necessary to actually get an answer.
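The normalization idea can be sketched as follows: divide each column by the power of ten that brings its largest magnitude below 10. This is only an illustration on the four sample rows shown above. The function names are hypothetical, and whether rescaling has any effect on the state_dict error is not established in this thread.

```python
import math
from typing import List


def power_of_ten_scales(rows: List[List[float]]) -> List[float]:
    """Per column, the power of ten that brings the largest magnitude below 10."""
    scales = []
    for column in zip(*rows):
        largest = max(abs(value) for value in column)
        if largest == 0:
            scales.append(1.0)
            continue
        scale = 10.0 ** math.floor(math.log10(largest))
        # Guard against floating-point rounding in log10 near exact powers of ten.
        if largest / scale >= 10:
            scale *= 10
        scales.append(scale)
    return scales


def rescale(rows: List[List[float]], scales: List[float]) -> List[List[float]]:
    """Divide every value by the scale of its column."""
    return [[value / scale for value, scale in zip(row, scales)] for row in rows]


# The four sample rows from the post (last column is the target).
rows = [
    [6382179.0, 1.0, 1000.0, 1.0, 26.0],
    [6382179.0, 1.0, 1000.0, 2.0, 30.0],
    [6382179.0, 1.0, 1000.0, 3.0, 46.0],
    [6382179.0, 1.0, 1000.0, 4.0, 77.0],
]

scales = power_of_ten_scales(rows)
scaled_rows = rescale(rows, scales)
print(scales)
print(scaled_rows[0])
```

The scales need to be kept, because any formula found on the rescaled data has to be converted back by the same factors before it describes the original quantities.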
I really tried hard to find this bug and I still can't find it.
- I checked model_feynman before the state_dict is loaded, and linear1.weight is set to 3, and when I checked the model being loaded it's also set to 3, and yet I still get this same error.
RuntimeError: Error(s) in loading state_dict for SimpleNet:
size mismatch for linear1.weight: copying a param with shape torch.Size([128, 4]) from checkpoint, the shape in current model is torch.Size([128, 3]).
- I tried subtracting and adding 1 to the weight, still the same error.
model_feynman.linear1.in_features= (model_feynman.linear1.in_features - 1)
- I tried loading in the model using torch.load but of course that didn't work because it loads an OrderedDict.
- I tried doing:
model_feynman = SimpleNet(n_variables)
again before loading the model, and still the same error.
And somehow example1.txt still works.
- I tried renaming my file to example1 ...still not working. I thought maybe some default settings were set for that filename.
If anybody solves this problem, please help!
In the first variable, I'm trying to represent a string abc numerically by converting from string to binary to int. Not sure if that's the best way to do it for a problem like this but anyway I'm experimenting.
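One conversion that reproduces the first-column value in the sample rows is to read the ASCII bytes of the string as a single big-endian integer. The sketch below assumes ASCII and big-endian byte order; the function names are illustrative, and the post does not say which exact conversion was used.

```python
def string_to_int(text: str) -> int:
    """Interpret the ASCII bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("ascii"), byteorder="big")


def int_to_string(value: int) -> str:
    """Inverse of string_to_int for non-empty ASCII strings."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder="big").decode("ascii")


code = string_to_int("abc")
print(code)                 # 6382179, the value in the first column
print(int_to_string(code))  # abc
```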
I get the same error with the current branch:
Error(s) in loading state_dict for SimpleNet:
size mismatch for linear1.weight: copying a param with shape torch.Size([128, 3]) from checkpoint, the shape in current model is torch.Size([128, 2]).
The following commit does not produce the error when run on Google Colab with the TPU:
!git clone https://github.com/SJ001/AI-Feynman.git
!cd /content/AI-Feynman && git reset --hard 28edde1a36a166a081de84999ab4fd40071957db