FILM
Stuck at Resetting ThorEnv
Hi! I'm dealing with the same kind of issue as @biubiuisacat and @JinyeonKim. I'm stuck at Resetting ThorEnv. I double-checked the dependencies (pytorch==1.6.0, torchvision==0.7.0, cudatoolkit=10.2), so I don't think that's why the code is not working...
I also ran the code on a desktop with a 2080 Ti, so the hardware probably isn't the cause either.
So I looked into the ai2thor code and found that execution stops when ~/FILM/alfred_utils/env/thor_env_code.py calls super().step() (line 278). The function looks like this (ai2thor/controller.py, line 615):
def step(self, action, raise_for_failure=False):
    if self.headless:
        action["renderImage"] = False

    # prevent changes to the action from leaking
    action = copy.deepcopy(action)

    # XXX should be able to get rid of this with some sort of deprecation warning
    if 'AI2THOR_VISIBILITY_DISTANCE' in os.environ:
        action['visibilityDistance'] = float(os.environ['AI2THOR_VISIBILITY_DISTANCE'])

    should_fail = False
    self.last_action = action

    if ('objectId' in action and (action['action'] == 'OpenObject' or action['action'] == 'CloseObject')):
        force_visible = action.get('forceVisible', False)
        if not force_visible and self.last_event.instance_detections2D and action['objectId'] not in self.last_event.instance_detections2D:
            should_fail = True

        obj_metadata = self.last_event.get_object(action['objectId'])
        if obj_metadata is None or obj_metadata['isOpen'] == (action['action'] == 'OpenObject'):
            should_fail = True

    rotation = action.get('rotation')
    if rotation is not None and type(rotation) != dict:
        action['rotation'] = {}
        action['rotation']['y'] = rotation

    if should_fail:
        new_event = copy.deepcopy(self.last_event)
        new_event.metadata['lastActionSuccess'] = False
        self.last_event = new_event
        return new_event

    assert self.request_queue.empty(), 'request_queue is not empty'  # continues if request_queue is empty.

    self.response_queue.put_nowait(action)  # put action. nonblocking queue
    # code stops at this point.
    self.last_event = queue_get(self.request_queue)

    if not self.last_event.metadata['lastActionSuccess'] and self.last_event.metadata['errorCode'] == 'InvalidAction':
        raise ValueError(self.last_event.metadata['errorMessage'])

    if raise_for_failure:
        assert self.last_event.metadata['lastActionSuccess']

    return self.last_event
Then I found that the code stops when queue_get(self.request_queue) is called (I marked the spot with a comment). The function contains a while loop that only exits once it gets an item from request_queue, but the queue stays empty, so every get times out and the code is stuck in the loop forever.
from queue import Empty, Queue

def queue_get(que: Queue):
    res = None
    while True:
        try:
            res = que.get(block=True, timeout=0.5)
            print("que.get result: ", res)
            break
        except Empty:
            pass
    return res
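For debugging, one option is a variant of queue_get that gives up after an overall deadline instead of looping forever, so a missing simulator response shows up as an exception rather than a silent hang. This is only a sketch: queue_get_with_deadline and its max_wait and poll parameters are hypothetical names and are not part of ai2thor.

```python
import time
from queue import Empty, Queue


def queue_get_with_deadline(que: Queue, max_wait: float = 60.0, poll: float = 0.5):
    """Poll `que` like queue_get, but raise TimeoutError after `max_wait` seconds.

    Hypothetical debugging helper; not part of ai2thor.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # Same blocking get with a short timeout as the original loop.
            return que.get(block=True, timeout=poll)
        except Empty:
            # Nothing arrived during this poll; stop once the deadline has passed.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no item arrived on the queue within {max_wait} seconds"
                )
```

Temporarily swapping this in for queue_get would not fix the hang, but it would turn it into a traceback that shows which action never received a response.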
Could I get some advice on why this happens and how to solve it? I've been stuck here for weeks...😭😭
Thanks!
@dada-h-h Exactly the same here. Could you try a minimal example from https://allenai.github.io/ai2thor-v2.1.0-documentation/examples ?
You can also try setting
controller = ai2thor.controller.Controller(headless=True)
to see if it makes any difference.
Hello, I think if you can't run the reset here, it's likely that you can't run the one in ALFRED either:
https://github.com/askforalfred/alfred/blob/master/env/thor_env.py#L47
If it's a headless machine, it's likely an X server problem (the simulator not finding an X server). You should check whether ALFRED's scripts/check_thor.py works (https://github.com/askforalfred/alfred/blob/master/scripts/check_thor.py).
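Before running check_thor.py, a quick sanity check of the DISPLAY setting can narrow things down. The sketch below assumes Linux with the usual X11 socket location (/tmp/.X11-unix/X<n>); x_socket_path and describe_display are hypothetical helper names, and a passing check does not guarantee that the simulator can actually render.

```python
import os
import re
from typing import Optional


def x_socket_path(display: str) -> Optional[str]:
    """Return the conventional Unix socket path for a local DISPLAY such as ':0' or ':1.0'.

    Returns None for values that do not look like a local display (e.g. 'host:0').
    """
    match = re.fullmatch(r"(?:unix)?:(\d+)(?:\.\d+)?", display)
    if match is None:
        return None
    return f"/tmp/.X11-unix/X{match.group(1)}"


def describe_display() -> str:
    """Summarize whether DISPLAY points at an X socket that exists on this machine."""
    display = os.environ.get("DISPLAY")
    if not display:
        return "DISPLAY is not set, so the simulator has no X server to connect to"
    path = x_socket_path(display)
    if path is None:
        return f"DISPLAY={display} is not a local display; no socket to check"
    if os.path.exists(path):
        return f"DISPLAY={display}: found {path}"
    return f"DISPLAY={display}: {path} is missing, so no X server seems to be listening"


if __name__ == "__main__":
    print(describe_display())
```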
Hi @soyeonm, is the code expected to work on a macOS machine? I noticed that the README also includes some macOS instructions, but I ran into a similar hang here. I can't even run a minimal ai2thor example with version 2.1.0.
I should probably raise the issue in the ALFRED repo, by the way.
Hello, thanks for your question. Yes, it ran on my mac; I will check again later today.
@soyeonm @Roadsong Thank you very much for your answers!
It seems it was a dependency issue. I created a new conda environment (Python 3.8.5) and installed all the packages at the versions used in the Docker container, and then it worked!
The specific versions are:
numpy==1.20.2
pandas==1.2.4
opencv-python==4.5.1.48
networkx==2.5.1
h5py==3.2.1
tqdm==4.64.0
vocab==0.0.5
revtok==0.0.3
Pillow==9.0.2
torch==1.6.0
torchvision==0.7.0
tensorboardX==1.8
ai2thor==2.1.0
matplotlib==3.5.1
tensorboard==2.9.1
seaborn==0.9.0
imageio==2.6.0
scikit-fmm==2019.1.30
scikit-image==0.15.0
scikit-learn==0.22.2.post1
ifcfg==0.21
I'm still not sure exactly which packages were causing the issue, though...
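One way to narrow down the culprit would be to compare the pinned versions against what is installed in the environment that hangs. The following is a rough sketch using the standard-library importlib.metadata (Python 3.8+); parse_pins, installed_version and find_mismatches are hypothetical helper names, and only plain name==version lines are handled.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines (pip freeze style); other lines are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[Tuple[str, str, Optional[str]]]:
    """List (name, pinned, installed) for every package whose version differs."""
    mismatches = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches
```

Running find_mismatches(parse_pins(...)) on the contents of the pinned list inside the broken environment would print the packages whose versions differ, which are the first candidates to check.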
Also, to install the packages I used this file, which I generated with pip freeze inside the Docker container: film_docker_requirements.txt
This is what I did. I first installed PyTorch:
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
Then I ran conda install on each requirement (this takes some time):
while read -r requirement; do conda install --yes "$requirement"; done < film_docker_requirements.txt
Then I used pip or conda-forge to install the packages that were still missing.
I also checked that check_thor.py still worked every time I installed a new package.
Hello. In my case, with PyTorch 2.1, I solved this issue by reinstalling Werkzeug and Flask: pip install Werkzeug==2.03 Flask==2.1.1