sparkflow

Unpickling error in "get_server_weights()" in HogwildSparkModel.py line 33

Open jiseungshin opened this issue 4 years ago • 2 comments

Hello, thank you very much for this beautiful project. I am trying to run the MNIST example on our cluster, but so far without any success :( The first iteration of training runs without any problems, but the weight update step crashes with this error:

    weights = get_server_weights(master_url)
  File "/grid/3/hadoop/yarn/local/usercache/user/appcache/application_blabla/container_blabla/virtualenv_application_blabla/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py", line 33, in get_server_weights
    weights = pickle.loads(r.content)
_pickle.UnpicklingError: invalid load key, '<'.

I thought that maybe the weights were not correctly pickled before being sent to the master, so I checked the source code, but from my point of view everything seems to be correct. So I am wondering why this fails :( I would really appreciate any further insights.

Thank you in advance!! :)

jiseungshin avatar Aug 13 '19 10:08 jiseungshin
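
The load key '<' in the error above means the response body started with the character '<', which is how an HTML or XML page would start, whereas pickle data written with protocol 2 or later starts with the byte 0x80. A minimal sketch of a check that could be run on the response bytes before unpickling follows. The helper names `describe_payload` and `loads_or_explain` are hypothetical and are not part of sparkflow.

```python
import pickle


def describe_payload(payload: bytes) -> str:
    """Guess what a response body holds by looking at its first byte."""
    # Markup such as an HTML error page starts with '<' (after optional whitespace).
    if payload.lstrip()[:1] == b"<":
        return "markup"
    # Pickle protocol 2 and later starts with the PROTO opcode, byte 0x80.
    if payload[:1] == b"\x80":
        return "pickle"
    return "unknown"


def loads_or_explain(payload: bytes):
    """Unpickle the payload, or raise a readable error if it looks like markup."""
    if describe_payload(payload) == "markup":
        # Show the start of the body so the source of the page can be identified.
        preview = payload[:80].decode("utf-8", errors="replace")
        raise ValueError("expected pickled weights, got markup: " + preview)
    return pickle.loads(payload)
```

Printing the preview of the unexpected body usually shows which component answered the request instead of the weights server.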

Interesting. I am in the process of moving, so I might be a little slow on getting to this. But I will take a look when I get a chance.

dmmiller612 avatar Aug 16 '19 13:08 dmmiller612

Thank you for the answer :) I have been experimenting a lot, and we fixed the issue. I think the framework itself is cool; the problem was with our proxy. We need to set a proxy, but it interfered with the connection used to transfer the weights between the driver and the executors. After setting a number of environment variables, I think it runs now. I will keep you updated!

jiseungshin avatar Aug 16 '19 15:08 jiseungshin
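
The thread does not say which environment variables were set. One common way to keep cluster-internal traffic away from a proxy is the `NO_PROXY` variable, which HTTP clients such as `requests` honor. The sketch below assumes that approach; the helper `add_no_proxy` and the host name `driver-host` are hypothetical and are not taken from the issue.

```python
import os


def add_no_proxy(hosts, env=None):
    """Append hosts to NO_PROXY in env (os.environ by default), skipping duplicates."""
    env = os.environ if env is None else env
    # Keep whatever is already excluded from the proxy.
    existing = [h.strip() for h in env.get("NO_PROXY", "").split(",") if h.strip()]
    for host in hosts:
        if host not in existing:
            existing.append(host)
    value = ",".join(existing)
    # Set both spellings, since tools differ in which one they read.
    env["NO_PROXY"] = value
    env["no_proxy"] = value
    return value


if __name__ == "__main__":
    # Assumption: "driver-host" stands in for the machine serving the weights.
    print(add_no_proxy(["driver-host", "localhost"], env={}))
```

On a cluster the variable has to reach the executor processes as well as the driver; Spark's `spark.executorEnv.[EnvironmentVariableName]` setting is one way to pass it along.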