AI-Feynman
Runtime error: size mismatch for linear1.weight: copying a param with shape
I have a dataset that runs fine for about an hour of steady processing, with apparent improvement, before giving this error (see the attached output log). It seems to be related to a dimension changing somewhere, even though the dimensions in the dataset itself are consistent. Console output: error.txt
I have reduced the variable count for subsequent test runs and still encounter exactly the same error (size mismatch for linear.weight ...), with the network's variable size eventually hitting a mismatch. In these cases the runs failed after almost 12 hours of computation.
Hello,
I'm having the same issue. The developers haven't responded in two weeks, so perhaps it is time we take this issue into our own hands. While we wait for this bug to be fixed, we could try to fix it ourselves. I can start an email thread where we can discuss AI-Feynman and speed up a fix. You can email me at [email protected], or post your email here so I can start the discussion thread for the four of us hitting this same issue. Thanks.
-Rodrigo
@RodrigoSandon I sent you an email.
Hey guys, I'm getting this same error here:
Checking polyfit
Complexity RMSE Expression
[0.0, 28.98901941844433, 'asin(0)']
Checking for symmetry
Dataset with clusters n.txt_train-translated_plus
Found pretrained NN
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-6-7c9e9ea3efd3> in <module>()
1 from S_run_aifeynman import run_aifeynman
2 # Run example 1 as the regression dataset
----> 3 run_aifeynman("/content/AI-Feynman/example_data/",'Dataset with clusters n.txt',30,"14ops.txt", polyfit_deg=3, NN_epochs=400)
/content/AI-Feynman/Code/S_run_aifeynman.py in run_aifeynman(pathdir, filename, BF_try_time, BF_ops_file_type, polyfit_deg, NN_epochs, vars_name, test_percentage)
163 PA = ParetoSet()
164 # Run the code on the train data
--> 165 PA = run_AI_all(pathdir,filename+"_train",BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA=PA)
166 PA_list = PA.get_pareto_points()
167
/content/AI-Feynman/Code/S_run_aifeynman.py in run_AI_all(pathdir, filename, BF_try_time, BF_ops_file_type, polyfit_deg, NN_epochs, PA)
89 new_pathdir, new_filename = do_translational_symmetry_plus(pathdir,filename,symmetry_plus_result[1],symmetry_plus_result[2])
90 PA1_ = ParetoSet()
---> 91 PA1 = run_AI_all(new_pathdir,new_filename,BF_try_time,BF_ops_file_type, polyfit_deg, NN_epochs, PA1_)
92 PA = add_sym_on_pareto(pathdir,filename,PA1,symmetry_plus_result[1],symmetry_plus_result[2],PA,"+")
93 return PA
/content/AI-Feynman/Code/S_run_aifeynman.py in run_AI_all(pathdir, filename, BF_try_time, BF_ops_file_type, polyfit_deg, NN_epochs, PA)
61 elif path.exists("results/NN_trained_models/models/" + filename + "_pretrained.h5"):
62 print("Found pretrained NN \n")
---> 63 NN_train(pathdir,filename,NN_epochs/2,lrs=1e-3,N_red_lr=3,pretrained_path="results/NN_trained_models/models/" + filename + "_pretrained.h5")
64 print("NN loss after training: ", NN_eval(pathdir,filename), "\n")
65 else:
/content/AI-Feynman/Code/S_NN_train.py in NN_train(pathdir, filename, epochs, lrs, N_red_lr, pretrained_path)
114
115 if pretrained_path!="":
--> 116 model_feynman.load_state_dict(torch.load(pretrained_path))
117
118 check_es_loss = 10000
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
1050 if len(error_msgs) > 0:
1051 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1052 self.__class__.__name__, "\n\t".join(error_msgs)))
1053 return _IncompatibleKeys(missing_keys, unexpected_keys)
1054
RuntimeError: Error(s) in loading state_dict for SimpleNet:
size mismatch for linear1.weight: copying a param with shape torch.Size([128, 10]) from checkpoint, the shape in current model is torch.Size([128, 9]).
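For context, this RuntimeError is PyTorch's standard behaviour whenever a checkpoint's tensor shapes disagree with the model you load it into. A minimal reproduction of the mismatch in the log (a standalone sketch using a bare nn.Linear, not AI-Feynman's SimpleNet; the 10-vs-9 input sizes are taken from the error message):

```python
import torch
import torch.nn as nn

# Checkpoint was saved from a model with 10 input features...
saved = nn.Linear(10, 128)
ckpt = saved.state_dict()

# ...but the current model was rebuilt with 9 inputs
# (one variable removed, as happens after the symmetry step).
current = nn.Linear(9, 128)

msg = ""
try:
    current.load_state_dict(ckpt)
    failed = False
except RuntimeError as e:
    failed = True
    msg = str(e)  # mentions "size mismatch" for the weight tensor

print(failed, "size mismatch" in msg)
```

Any shape disagreement between the saved and current state_dict produces this exception, which is why reducing the variable count alone does not make it go away.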
Were you able to solve this?
I'm not loading from a pretrained model; this just happens after running this notebook on my own data (after substituting 'example1.txt' with my own data):
https://github.com/dcshapiro/AI-Feynman/blob/master/AI_Feynman_2_0.ipynb
I just encountered this error too. It looks like it has remained unaddressed for a while. Have you guys made any progress? Thanks.
Hey guys, for anyone who's interested in having this problem debugged for $: would you pay to have this solved? If so, I could try my hand at it. It doesn't have to be much, e.g. $1 to $5 through PayPal. If you're interested, contact me at [email protected] or just let me know here.
There appears to be an issue where the new training-data file generated in S_symmetry.py has one fewer variable than the saved model that gets loaded in S_NN_train.py when the 'pretrained_path' code is executed.

Synopsis:
i. At if pretrained_path!="": in S_NN_train.py, the exception is raised while loading the state_dict and running model.eval().
ii. remove_input_neuron(model,n_variables,j,ct_median,"results/NN_trained_models/models/"+filename + "-translated_plus_pretrained.h5") in S_symmetry.py saves the state_dict and model with the original number of factors.
iii. S_NN_train.py loads n_variables with the reduced number of factors.
iv. elif path.exists("results/NN_trained_models/models/" + filename + "_pretrained.h5"): in S_run_aifeynman.py loads the pretrained model with the original number of factors.
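To see what the factor counts above mean for the weight shapes: removing input neuron j should delete one column from linear1.weight, giving an (out_features, n-1) tensor; if the state_dict is instead saved with the original n columns, it can no longer be loaded into a model built for n-1 variables. A simplified stand-in for the column removal (not the project's actual remove_input_neuron implementation):

```python
import torch
import torch.nn as nn

def drop_input_neuron(layer: nn.Linear, j: int) -> nn.Linear:
    """Return a new Linear layer with input column j removed.

    Simplified stand-in for AI-Feynman's remove_input_neuron():
    the weight matrix loses one column, so the layer now expects
    one fewer input variable.
    """
    keep = [i for i in range(layer.in_features) if i != j]
    new = nn.Linear(layer.in_features - 1, layer.out_features)
    with torch.no_grad():
        new.weight.copy_(layer.weight[:, keep])  # drop column j
        new.bias.copy_(layer.bias)               # bias is unchanged
    return new

layer = nn.Linear(10, 128)
reduced = drop_input_neuron(layer, 3)
print(tuple(reduced.weight.shape))  # (128, 9)
```

If the .h5 checkpoint keeps the original (128, 10) weight while the translated data file only has 9 variables, the load in S_NN_train.py fails exactly as shown in the traceback.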
If you comment out this line in S_run_aifeynman.py, you can circumvent the exception.
elif path.exists("results/NN_trained_models/models/" + filename + "_pretrained.h5"):
When S_symmetry.py invokes remove_input_neuron(), the model has 'n' factors; however, the new 'data_translated' file saved in S_symmetry.py has 'n-1' factors. The discrepancy between the number of factors in the .h5 file and the data_translated file causes the exception in S_NN_train.py when the 'if pretrained_path' code is executed.
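Instead of commenting out the pretrained branch entirely, a more defensive workaround is to check that the checkpoint's shapes actually match the freshly built model before loading it, and otherwise fall back to training from scratch. A sketch of such a guard (the helper name and the commented usage are hypothetical, not part of the repository):

```python
import os
import tempfile
import torch

def checkpoint_matches(model: torch.nn.Module, ckpt_path: str) -> bool:
    """Return True only if every tensor in the checkpoint has the same
    name and shape as in the current model's state_dict."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    sd = model.state_dict()
    if set(ckpt.keys()) != set(sd.keys()):
        return False
    return all(ckpt[k].shape == sd[k].shape for k in sd)

# Hypothetical usage in S_run_aifeynman.py, guarding the pretrained branch:
# if path.exists(pretrained) and checkpoint_matches(model_feynman, pretrained):
#     ...load the pretrained weights...
# else:
#     ...train from scratch...

# Demo with the shapes from the log: checkpoint saved for 10 inputs,
# current model rebuilt for 9 after one variable is removed.
path = os.path.join(tempfile.gettempdir(), "pretrained_demo.pt")
torch.save(torch.nn.Linear(10, 128).state_dict(), path)
same = checkpoint_matches(torch.nn.Linear(10, 128), path)
smaller = checkpoint_matches(torch.nn.Linear(9, 128), path)
print(same, smaller)
```

This keeps the pretrained-model fast path for runs where the variable count has not changed, while silently retraining in the n-1 case that currently crashes.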
Original training file (4 factors):
New training file (3 factors):