gps
Crash in the second iteration
Hi Finn, thank you for your excellent work; it is a really exciting innovation. All the demos work well except the last one. When running "python python/gps/gps_main.py pr2_badmm_example", it reports errors like this:
I0430 02:03:56.217406 978 solver.cpp:408] Test net output #5: InnerProduct3 = 0
I0430 02:03:56.217413 978 solver.cpp:408] Test net output #6: InnerProduct3 = 0
Exception in thread Thread-13:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "python/gps/gps_main.py", line 366, in <lambda>
target=lambda: gps.run(itr_load=resume_training_itr)
File "python/gps/gps_main.py", line 69, in run
self._log_data(itr, traj_sample_lists, pol_sample_lists)
File "python/gps/gps_main.py", line 240, in _log_data
copy.copy(self.algorithm)
File "python/gps/utility/data_logger.py", line 25, in pickle
pickle.dump(data, open(filename, 'wb'))
File "/usr/lib/python2.7/copy_reg.py", line 84, in _reduce_ex
dict = getstate()
File "python/gps/algorithm/policy_opt/policy_opt_caffe.py", line 233, in __getstate__
self.solver.snapshot()
AttributeError: 'AdamSolver' object has no attribute 'snapshot'
Also, when I run "python python/gps/gps_main.py pr2_example", it sometimes reports the following error:
LinAlgError: 2-th leading minor not positive definite ...
raise LinAlgError("%d-th leading minor not positive definite" % info)
LinAlgError: 2-th leading minor not positive definite
Do you have any idea what causes these two problems? Looking forward to your answers. Thank you very much.
Regarding the first error, make sure you have the latest version of caffe (i.e. this line of code should exist)
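The traceback shows policy_opt_caffe.py calling self.solver.snapshot() inside __getstate__ while the algorithm is pickled for logging, so an older pycaffe build without that binding only fails after a whole iteration has run. A minimal sketch of an early check, assuming you call it yourself right after the solver is created (require_solver_snapshot is a hypothetical helper, not part of GPS or pycaffe):

```python
def require_solver_snapshot(solver):
    """Fail early if the pycaffe solver object lacks a snapshot() method.

    Hypothetical helper: GPS pickles the algorithm in _log_data, and
    PolicyOptCaffe.__getstate__ calls solver.snapshot(), so a pycaffe build
    without this binding would otherwise crash only at logging time.
    """
    if not callable(getattr(solver, 'snapshot', None)):
        raise RuntimeError(
            "%s has no snapshot() method; rebuild pycaffe from a newer caffe"
            % type(solver).__name__)
```

Failing at startup with a clear message is cheaper than losing an iteration's worth of samples before the pickling step crashes.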
Regarding the second error, can you be more specific? How often and when does it appear? I may have time to look into it this weekend.
Thank you so much. I have tried the newest caffe, but there are some errors; I will figure it out. As for the second problem, I find that it occurs the first time I run "python python/gps/gps_main.py pr2_badmm_example", and the problem disappears on subsequent runs. It looks like this:
I0501 12:58:00.010692 21648 net.cpp:228] DummyData1 does not need backward computation.
I0501 12:58:00.010694 21648 net.cpp:270] This network produces output InnerProduct3
I0501 12:58:00.010700 21648 net.cpp:283] Network initialization done.
I0501 12:58:00.010727 21648 solver.cpp:59] Solver scaffolding done.
Exception in thread Thread-13:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "python/gps/gps_main.py", line 366, in <lambda>
target=lambda: gps.run(itr_load=resume_training_itr)
File "python/gps/gps_main.py", line 67, in run
self._take_iteration(itr, traj_sample_lists)
File "python/gps/gps_main.py", line 195, in _take_iteration
self.algorithm.iteration(sample_lists)
File "python/gps/algorithm/algorithm_badmm.py", line 48, in iteration
self._update_dynamics() # Update dynamics model using all sample.
File "python/gps/algorithm/algorithm.py", line 84, in _update_dynamics
self.cur[cond].traj_info.dynamics.update_prior(cur_data)
File "python/gps/algorithm/dynamics/dynamics_lr_prior.py", line 21, in update_prior
self.prior.update(X, U)
File "python/gps/algorithm/dynamics/dynamics_prior_gmm.py", line 98, in update
self.gmm.update(xux, K)
File "python/gps/utility/gmm.py", line 174, in update
logobs = self.estep(data)
File "python/gps/utility/gmm.py", line 75, in estep
check_finite=False)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
check_finite=check_finite)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 30, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
LinAlgError: 27-th leading minor not positive definite
Hi Chelsea, I get the same error when running the pr2_example_badmm experiment:
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 30, in _cholesky
raise LinAlgError("%d-th leading minor not positive definite" % info)
LinAlgError: 27-th leading minor not positive definite
I think there is a bug somewhere in the pr2 controller that causes this error on the very first experiment that is run after launching the pr2 plugin. For example, it could be caused by the sample data at the first time step being uninitialized.
After the first run, I don't think that the error will come up. Let me know if this isn't the case for you.
I currently don't have time to investigate the issue personally, but I will post any updates that I hear on this thread.
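For context on the LinAlgError: the traceback shows the estep in gmm.py factorizing a cluster covariance with scipy.linalg.cholesky, which fails when that matrix is not numerically positive definite, for example if bad first-run samples leave it singular. One common workaround, which the GPS code is not confirmed to use, is to add a small diagonal "jitter" and retry. A hypothetical sketch using numpy.linalg.cholesky (safe_cholesky, jitter and max_tries are illustrative names):

```python
import numpy as np


def safe_cholesky(sigma, jitter=1e-6, max_tries=10):
    """Return (L, reg) with L lower-triangular and L L^T = sigma + reg * I.

    Hypothetical helper, not GPS code: tries reg = 0, then jitter,
    10 * jitter, 100 * jitter, ... until the factorization succeeds.
    """
    sigma = np.asarray(sigma, dtype=float)
    if not np.all(np.isfinite(sigma)):
        # Jitter cannot repair NaN/inf entries; the input samples are bad.
        raise ValueError("covariance contains infs or NaNs")
    sigma = 0.5 * (sigma + sigma.T)  # remove round-off asymmetry
    eye = np.eye(sigma.shape[0])
    reg = 0.0
    for i in range(max_tries):
        reg = 0.0 if i == 0 else jitter * 10.0 ** (i - 1)
        try:
            return np.linalg.cholesky(sigma + reg * eye), reg
        except np.linalg.LinAlgError:
            pass  # not positive definite yet; increase the jitter
    raise np.linalg.LinAlgError(
        "covariance is not positive definite even with jitter %g" % reg)
```

Jitter only hides the symptom: if the first sample really is uninitialized, the fix belongs in the controller, and returning reg makes it easy to log how much regularization was needed.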
I get a related error when I run pr2_badmm_example. It gives:
Exception in thread Thread-8:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "python/gps/gps_main.py", line 398, in <lambda>
target=lambda: gps.run(itr_load=resume_training_itr)
File "python/gps/gps_main.py", line 66, in run
self._take_sample(itr, cond, i)
File "python/gps/gps_main.py", line 184, in _take_sample
verbose=(i < self._hyperparams['verbose_trials'])
File "python/gps/agent/ros/agent_ros.py", line 156, in sample
self.reset(condition)
File "python/gps/agent/ros/agent_ros.py", line 135, in reset
condition_data[TRIAL_ARM]['data'])
File "python/gps/agent/ros/agent_ros.py", line 124, in reset_arm
self._reset_service.publish_and_wait(reset_command, timeout=timeout)
File "python/gps/agent/ros/ros_utils.py", line 146, in publish_and_wait
raise TimeoutException(time_waited)
TimeoutException: ('Timed out after %f seconds', 20.000000000000327)
Yes, this issue is caused by the controller and is not algorithm-specific.
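The TimeoutException above is raised by publish_and_wait in ros_utils.py after waiting 20 seconds for the controller to answer the arm reset command, which fits the diagnosis that the problem is on the controller side. The general publish-then-wait pattern can be sketched as follows; ResponseWaiter and its methods are hypothetical and are not the actual GPS implementation:

```python
import threading
import time


class TimeoutException(Exception):
    """Raised when no reply arrives in time (args mirror the log above)."""


class ResponseWaiter(object):
    """Hypothetical publish-then-wait helper.

    `publish` is any callable that sends the command; `on_response` should
    be registered as the subscriber callback for the reply topic.
    """

    def __init__(self, publish):
        self._publish = publish
        self._event = threading.Event()
        self._response = None

    def on_response(self, msg):
        # Store the reply first, then wake up the waiting thread.
        self._response = msg
        self._event.set()

    def publish_and_wait(self, msg, timeout=5.0):
        self._event.clear()  # clear before publishing so no reply is missed
        start = time.time()
        self._publish(msg)
        if not self._event.wait(timeout):
            raise TimeoutException('Timed out after %f seconds',
                                   time.time() - start)
        return self._response
```

In this pattern a timeout only means that no reply arrived, so it cannot tell a crashed controller apart from one that was never loaded correctly.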
I am also having similar errors on the first experiment.
File "python/gps/utility/gmm.py", line 63, in estep
L = scipy.linalg.cholesky(sigma, lower=True)
File "/home/ermanoarruda/.virtualenvs/robotics/local/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
check_finite=check_finite)
File "/home/ermanoarruda/.virtualenvs/robotics/local/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py", line 20, in _cholesky
a1 = asarray_chkfinite(a)
File "/home/ermanoarruda/.virtualenvs/robotics/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1033, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
However, it works fine when I run the pr2_example_badmm experiment a second time (and onwards). The problem does indeed seem to be related to the initialisation of the first sample.
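The ValueError above comes from the finite-value check that runs before the Cholesky factorization even starts, which is consistent with the first sample containing uninitialized (NaN or inf) values. As a diagnostic, a hypothetical filter like the one below could be applied to the sample arrays before the dynamics prior is updated. It assumes X and U are shaped (N, T, dX) and (N, T, dU); that layout is an assumption, not something stated in this thread:

```python
import numpy as np


def drop_nonfinite_samples(X, U):
    """Drop samples whose states or actions contain NaN or inf.

    Hypothetical helper. X: (N, T, dX) states, U: (N, T, dU) actions.
    Returns the filtered X, the filtered U, and the indices that were kept.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    # A sample is kept only if every state and action entry is finite.
    ok = np.isfinite(X).all(axis=(1, 2)) & np.isfinite(U).all(axis=(1, 2))
    keep = np.flatnonzero(ok)
    return X[keep], U[keep], keep
```

Logging len(keep) against N would show whether only the very first sample after launching the plugin is affected.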
At some point I was also getting the timeout @lakehanne referred to, but that was because I had not built gps_agent_pkg with the additional caffe flags required to run pr2_example_badmm.