
Resource not found (ValueError), but only on consecutive runs.

Open hubernikus opened this issue 1 year ago • 12 comments

When running mujoco.MjModel.from_xml_path multiple times in a row in an active Python session, the loop at some point raises a "Resource not found" error (see console output below).

This is unexpected, as calling the function worked multiple times, with all variables being local. Only at a certain iteration did the loading of the model fail.

The error appears earlier as the model becomes more complex, e.g., when it contains more copies of the same robot or robots with more, and more complex, meshes.

While it can be resolved by just restarting the console and reloading everything, in some cases this is not desired, e.g., during unit tests.

System Information: Microsoft Windows 10 Enterprise Python 3.10 / Mujoco 3.1.5 / dm_control: 1.0.19

Console Output:

Did it again 0
Did it again 1
... 
Did it again 24
Traceback (most recent call last):
  File "c:\Code\mujoco_example\example_multi_collision.py", line 29, in <module>
    example_mujoco()
  File "c:\Code\mujoco_example\example_multi_collision.py", line 22, in example_mujoco
    model = mujoco.MjModel.from_xml_path(posix_path)
ValueError: Error: resource not found via provider or OS filesystem: '.mujoco_cache/cube-2722594e0d5761a6def692d31d8bac29b96eef8b.obj'
PS C:\Code\mujoco_example>

Minimal example: [mujoco_example.zip](https://github.com/google-deepmind/mujoco/files/15400644/mujoco_example.zip)

hubernikus avatar May 22 '24 08:05 hubernikus

Thanks for reporting this. I have a very good idea where the issue is coming from, and it should hopefully be fixed soon.

kbayes avatar May 22 '24 14:05 kbayes

Thanks so much for the fast reply!

Just an update, I realized that the issue also pops up if I add more than 255 meshes at once, i.e., in the file attached, 256 (or more) robots are added at once.

hubernikus avatar May 22 '24 17:05 hubernikus

Are you able to reproduce the error with the provided mujoco_example.zip? I ran the given code with the same versions of dm_control and mujoco multiple times without triggering an error, including with your suggestion of attaching 256 robots.

kbayes avatar May 23 '24 13:05 kbayes

I suspect this might be a Windows-specific issue...

yuvaltassa avatar May 23 '24 13:05 yuvaltassa

Yes, the error occurs with the attached files (without modification).

It could be Windows-specific, yes.

I'll try to find time to investigate it further.

hubernikus avatar May 24 '24 08:05 hubernikus

I have the same problem. It doesn't occur every time. I wrote a for loop that reads an .xml file in every iteration. It works fine at the beginning, and the error occurs after some iterations. I don't know how to solve it.

z-yf17 avatar May 24 '24 08:05 z-yf17

@z-yf17 are you using Windows?

yuvaltassa avatar May 24 '24 08:05 yuvaltassa

> @z-yf17 are you using Windows?

Yes.

There is a workable but cumbersome method, which is to change all the paths, including the include paths, to absolute paths. This is troublesome, but it won't result in errors.
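The absolute-path workaround described above could be automated roughly as follows. This is only a sketch under stated assumptions: the helper name `absolutize_file_attrs` is hypothetical, it assumes every relative `file` attribute is meant to be resolved against the folder of the XML file itself (it ignores compiler settings such as `meshdir`), and it does not descend into included files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def absolutize_file_attrs(xml_path: str, out_path: str) -> int:
    """Rewrite every relative `file` attribute to an absolute path.

    Hypothetical helper, not part of MuJoCo. Returns the number of
    attributes that were rewritten.
    """
    src = Path(xml_path).resolve()
    tree = ET.parse(str(src))
    changed = 0
    for elem in tree.iter():
        value = elem.get("file")
        if value is None or Path(value).is_absolute():
            continue
        # Assumption: relative paths are relative to the XML file's folder.
        elem.set("file", (src.parent / value).resolve().as_posix())
        changed += 1
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return changed
```

The rewritten file can then be passed to mujoco.MjModel.from_xml_path using its own absolute path.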

z-yf17 avatar May 24 '24 08:05 z-yf17

@z-yf17 Thanks for the extra info. This is useful.

kbayes avatar May 24 '24 08:05 kbayes

True, passing the absolute path delays the issue. I wrote an XML post-processing step to try that.

For me, however, it only delays the issue: eventually the main XML file itself can no longer be found, and the function call now fails after 520 runs.

The error and a minimal example are below.

Attempting it= 1
Succeded it= 1
Attempting it= 2
..
Succeded it= 520
Attempting it=521
Traceback (most recent call last):
  File "c:\Code\mujoco_example\example_multi_collision.py", line 49, in <module>
    example_mujoco()
  File "c:\Code\mujoco_example\example_multi_collision.py", line 43, in example_mujoco
    model = mujoco.MjModel.from_xml_path(infile.absolute().as_posix())
ValueError: mjParseXML: resource not found via provider or OS filesystem: 'C:/Code/mujoco_example/.mujoco_cache/Scene.xml'

mujoco_example.zip

hubernikus avatar May 24 '24 10:05 hubernikus

I can also reproduce the OP's example on a Debian-based system after 50 iterations:

[...]
Did it again 50
Traceback (most recent call last):
  File "/home/luda/mujoco_example/example_multi_collision.py", line 27, in <module>
  File "/home/luda/mujoco_example/example_multi_collision.py", line 21, in example_mujoco
ValueError: Error: resource not found via provider or OS filesystem: '.mujoco_cache/cube-2722594e0d5761a6def692d31d8bac29b96eef8b.obj'

skarbeli avatar May 24 '24 11:05 skarbeli

I tried to run the script on an Ubuntu 22.04 machine, and I surprisingly obtained inconsistent results depending on where it is run. (Same machine, same virtual environment.)

I get an error after 5 runs if I call it in the bash terminal (screenshot: bash_terminal).

The script crashes with a different error after 260 runs if I use VS Code with the run button (screenshot: vscode_automatic_execution).

The script runs without error (for large numbers of iterations) when executed as a script from the VS Code bash terminal with python main.py (screenshot: vscode_console_call).

hubernikus avatar May 25 '24 12:05 hubernikus

I faced the same issue on Ubuntu, with a maximum of 32 iterations. As @hubernikus noticed, the issue does not appear in the VS Code terminal. After some time I figured out that the cause is the limit on the number of open file descriptors, which is higher by default in VS Code. You can change it in any terminal session with ulimit -n 10000 (pick a number high enough for your needs). There is a way to make it permanent, but it requires more steps and depends on the OS. Hope it helps.
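For test suites where changing the shell's ulimit is inconvenient, the same soft limit can probably be raised from inside the Python process. The sketch below uses the standard-library `resource` module, which is Unix-only, so it does not help on Windows. The function name `raise_fd_soft_limit` and the default of 10000 (taken from the ulimit command above) are illustrative assumptions. Like ulimit, this only postpones the failure if descriptors are being leaked.

```python
import resource  # Unix-only standard-library module


def raise_fd_soft_limit(target: int = 10000) -> int:
    """Raise the soft limit on open file descriptors, capped by the hard limit.

    Hypothetical helper. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    # An unprivileged process may not exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft
```

Calling `raise_fd_soft_limit()` once at the start of a test session, before any models are loaded, would be the natural place for it.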

rihat99 avatar Jul 04 '24 12:07 rihat99

Before, I was never able to reproduce the issue on my system. However, I was able to find the missing fclose. Thanks for the pointer! :)
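A leaked file handle of this kind can be observed from Python by counting the process's open descriptors before and after repeated calls. The sketch below is a diagnostic under assumptions: it relies on the Linux-specific /proc/self/fd directory, the helper names `open_fd_count` and `fd_growth` are hypothetical, and `load` stands for any zero-argument callable, for example a lambda wrapping mujoco.MjModel.from_xml_path.

```python
import os
from typing import Callable


def open_fd_count() -> int:
    """Number of file descriptors currently open in this process (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


def fd_growth(load: Callable[[], object], repeats: int = 10) -> int:
    """Call `load` repeatedly and return how many descriptors stayed open.

    A result that grows with `repeats` suggests that each call leaks a
    descriptor, e.g. a file that is opened but never closed.
    """
    before = open_fd_count()
    for _ in range(repeats):
        load()
    return open_fd_count() - before
```

A result of zero does not prove the absence of a leak, but steady growth proportional to `repeats` is a strong hint that a close is missing somewhere in the loading path.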

kbayes avatar Jul 04 '24 14:07 kbayes