sparkflow
Unpickling error in "get_server_weights()" in HogwildSparkModel.py line 33
Hello, thank you very much for this beautiful project. I am trying to run the MNIST example on our cluster, but so far without any success :( The first training iteration runs without any problems, but the weight-update step crashes with this error:
    weights = get_server_weights(master_url)
  File "/grid/3/hadoop/yarn/local/usercache/user/appcache/application_blabla/container_blabla/virtualenv_application_blabla/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py", line 33, in get_server_weights
    weights = pickle.loads(r.content)
_pickle.UnpicklingError: invalid load key, '<'.
I thought the weights might not be pickled correctly before being sent to the master, so I checked the source code, but as far as I can tell everything looks correct. So I am wondering why this fails :( I would really appreciate any further insights.
Thank you in advance!! :)
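A note on reading this error: the `'<'` in `invalid load key, '<'` is the first byte of the data handed to `pickle.loads`. A pickle stream does not normally start with `<`, but an HTML or XML page does, so this message often means the HTTP response body was a web page (for example an error page) rather than pickled weights. The sketch below only illustrates that reading. The helper name `describe_payload` and the sample HTML body are made up for the example and are not part of sparkflow.

```python
import pickle

# Hypothetical stand-in for a response body that is a web page, not a pickle.
html_body = b"<html><body>Proxy error</body></html>"

try:
    pickle.loads(html_body)
    message = ""
except pickle.UnpicklingError as exc:
    # The first byte '<' is not a valid pickle opcode, so loading fails at once.
    message = str(exc)


def describe_payload(content: bytes) -> str:
    """Roughly classify a response body before trying to unpickle it.

    Hypothetical helper for debugging only. Never unpickle data from an
    untrusted source.
    """
    if content.lstrip().startswith(b"<"):
        return "markup"  # looks like HTML/XML, e.g. an error page
    try:
        pickle.loads(content)
    except Exception:
        return "unknown"
    return "pickle"
```

Printing the first few hundred bytes of `r.content` (or the response status code) at the failing line would show what the server or an intermediary actually returned.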
Interesting. I am in the process of moving, so I might be a little slow on getting to this. But I will take a look when I get a chance.
Thank you for the answer :) I have been experimenting a lot, and we fixed the issue. I think the framework itself is cool; the problem was with our proxy. We need to set a proxy on the cluster, but it was also being applied to the connection that transfers the weights between the driver and the executors. After setting a number of environment variables, I think it runs now. I will keep you updated!
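For anyone hitting the same thing, here is a minimal sketch of one way to keep driver/executor traffic off the proxy. It assumes the weight transfer goes through an HTTP client that honors the `NO_PROXY` / `no_proxy` environment variables (the `requests` library does). The thread does not say which variables were actually set, so the helper `add_no_proxy_hosts` and the host name `driver-host` are hypothetical.

```python
import os


def add_no_proxy_hosts(hosts, env=None):
    """Append hosts to NO_PROXY/no_proxy so requests to them skip the proxy.

    Hypothetical helper; `env` defaults to the current process environment.
    Returns the resulting comma-separated value.
    """
    env = os.environ if env is None else env
    # Keep any hosts that are already excluded from proxying.
    current = [h.strip() for h in env.get("NO_PROXY", "").split(",") if h.strip()]
    for host in hosts:
        if host not in current:
            current.append(host)
    value = ",".join(current)
    # Set both spellings, since tools differ in which one they read.
    env["NO_PROXY"] = value
    env["no_proxy"] = value
    return value


# Example with a plain dict instead of the real environment.
example_env = {"NO_PROXY": "localhost"}
result = add_no_proxy_hosts(["driver-host"], example_env)
```

The variable has to be visible in the executor processes as well as on the driver. Spark's `spark.executorEnv.[EnvironmentVariableName]` setting is one way to pass it along, though whether that fits a given cluster setup is something to check locally.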