[BUG] Random medperf local server failures: os.getcwd(), FileNotFoundError: No such file or directory
Issue description
I'm running the medperf tutorials in WSL and see strange behaviour: the client and server start to fail at random. Since I initially thought this was an internal medperf issue, I'm documenting the details here. While working through the tutorials (https://docs.medperf.org/getting_started/benchmark_owner_demo/ and others), I use a local medperf server. While running some commands (seemingly at random, but usually heavy ones that perform a lot of I/O), I got the following error:
Client side:
Traceback (most recent call last):
  File "/home/vukw/anaconda3/envs/env39_medperf/bin/mlcube", line 5, in <module>
    from mlcube.__main__ import cli
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/mlcube/__main__.py", line 66, in <module>
    default=os.getcwd(),
FileNotFoundError: [Errno 2] No such file or directory
Interestingly, it affects not only the client side but also the server side (which is running in a separate bash terminal):
Traceback (most recent call last):
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/db/backends/base/base.py", line 219, in ensure_connection
    self.connect()
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/db/backends/base/base.py", line 200, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/utils/asyncio.py", line 33, in inner
    return func(*args, **kwargs)
  File "/home/vukw/anaconda3/envs/env39_medperf/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 209, in get_new_connection
    conn = Database.connect(**conn_params)
sqlite3.OperationalError: unable to open database file
Re-running the server doesn't help either:
$ sh setup-dev-server.sh
realpath: cert.crt: No such file or directory
realpath: cert.key: No such file or directory
1
1
0
CERT FILE must not be empty
Moreover, it's not just medperf that breaks; pip does too:
$ pip list
The folder you are executing pip from can no longer be found.
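All of these symptoms are consistent with one mechanism: the process's working directory has lost its path, so os.getcwd() fails, and so does anything that resolves a relative path against it (realpath cert.crt in setup-dev-server.sh, pip's "can no longer be found" message, and plausibly the server-side sqlite3 error as well). The sketch below reproduces the same FileNotFoundError on Linux by deleting the working directory out from under the process. It is only an illustration, assuming that a locked or remounted drive has a similar effect to a deleted directory; the function name simulate_stale_cwd is hypothetical.

```python
import os
import tempfile


def simulate_stale_cwd() -> list:
    """Hypothetical demo: make the cwd vanish, then record which calls fail.

    Assumption: deleting the directory is a stand-in for the WSL mount
    being locked/unlocked; the real trigger in this issue is unknown.
    """
    original = os.getcwd()
    workdir = tempfile.mkdtemp()
    failures = []
    try:
        os.chdir(workdir)
        # The process still sits in the directory, but it no longer has a path.
        os.rmdir(workdir)
        checks = [
            ("os.getcwd()", os.getcwd),
            # A relative path (like cert.crt in setup-dev-server.sh) needs the cwd.
            ("os.path.abspath('cert.crt')", lambda: os.path.abspath("cert.crt")),
        ]
        for name, call in checks:
            try:
                call()
            except FileNotFoundError:
                failures.append(name)
    finally:
        # Return to a directory that still exists.
        os.chdir(original)
    return failures


print(simulate_stale_cwd())
```

On Linux both calls should be reported as failing, which matches the client traceback (os.getcwd()) and the realpath errors from setup-dev-server.sh.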
Workarounds and solutions
Workarounds
- Re-running the server and client in a new bash terminal fixes the issue, but only for a while: after a few commands the error appears again.
- Running "cd ." also works like magic. It seems to reset the working directory path, but again only for a while.
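A plausible reading of why "cd ." helps: in its default (logical) mode, bash computes the target from the shell's $PWD and calls chdir() on that full path again, so the shell ends up in whatever directory is reachable at that path now instead of the stale one. A Python script could do the same for itself. Below is a minimal sketch assuming $PWD still holds the right path; the helper name recover_cwd is hypothetical, and like the shell workaround it only helps until the mount goes stale again.

```python
import os


def recover_cwd() -> str:
    """Hypothetical helper: return the cwd, re-entering it by path if it went stale.

    Roughly mirrors running `cd .` in the shell (assumption: $PWD is still valid).
    """
    try:
        return os.getcwd()
    except FileNotFoundError:
        pwd = os.environ.get("PWD")
        if not pwd:
            raise  # nothing to fall back on, re-raise the original error
        # chdir() on the full path resolves it again from "/" and picks up
        # whatever directory is reachable at that path now.
        os.chdir(pwd)
        return os.getcwd()
```

Calling recover_cwd() at the start of a script (before anything resolves relative paths) would mimic the "cd ." workaround without leaving the process.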
Solution debugging
Together with @hasan7n, we found that this kind of behaviour is sometimes observed on external encrypted storage: stackoverflow discussion. In my case, I had checked out the repo in the Windows environment, so all the files are located under /mnt/c/Users/vykuk/repos/mlc/medperf, which is actually an external, encrypted drive. We also found a WSL issue describing similar behaviour and a workaround, though without any mention of drive encryption. So it looks like the main problem (at least in my case) is the way WSL mounts the drive: external drives can sometimes be locked and unlocked, and that causes working-directory errors for every script running on that storage.
Solution
Thus, a reasonable solution (which also worked in my case) is to move the whole medperf repository from the Windows host's mounted drive /mnt/c/.... to the internal WSL filesystem. Moving the repo folder to /home/medperf removes the issue.
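To catch this setup early, a script could warn when the repository sits on a Windows drive mount. The sketch below finds the mount that contains a path by reading /proc/mounts. It assumes that WSL2 mounts Windows drives with filesystem type 9p and WSL1 with drvfs (check cat /proc/mounts on your own machine); mount_for and on_windows_drive are hypothetical names, and escaped characters in mount points (such as \040 for a space) are not decoded.

```python
import os


def mount_for(path: str, mounts_text: str) -> tuple:
    """Return (mount point, fs type) of the most specific mount containing `path`."""
    path = os.path.abspath(path)
    best = ("/", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Prefer longer mount points; on ties, a later line (a newer mount) wins.
        if inside and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


def on_windows_drive(path: str, mounts_file: str = "/proc/mounts") -> bool:
    """Guess whether `path` lives on a Windows drive mounted into WSL."""
    with open(mounts_file) as f:
        _, fs_type = mount_for(path, f.read())
    # Assumption: WSL2 uses 9p and WSL1 uses drvfs for Windows drives.
    return fs_type in {"9p", "drvfs"}
```

For example, on_windows_drive("/mnt/c/Users/vykuk/repos/mlc/medperf") would be expected to return True on a setup like the one above, and False once the repo is moved to /home/medperf. Matching on the mount's filesystem type rather than on a /mnt/ prefix avoids false positives when /mnt is used for other mounts.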
Future explorations
I still don't know exactly why the mounted storage gets locked, which conditions lead to it, or which side is responsible (the Windows host or Ubuntu itself). Also, I haven't hit this issue with other projects located on the mounted drive; medperf is the first one that reproduces this behaviour. Finally, the nature of the issue makes it extremely hard to find a way to reproduce it reliably: the same command can succeed one time and fail the next.
The same issue may arise on other systems and setups whenever the medperf repo is located on external storage.
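Because the failure is intermittent, one way to try to catch it is to keep a loop running in the repository directory that generates file I/O and counts how often the working directory disappears. The sketch below is hypothetical (probe_cwd and its parameters are not part of medperf), and a zero count proves nothing, since the issue may simply not have triggered during the run.

```python
import os
import tempfile


def probe_cwd(iterations: int = 1000, write_bytes: int = 1 << 16) -> int:
    """Hypothetical probe: do small writes in the cwd and count FileNotFoundErrors."""
    payload = os.urandom(write_bytes)
    failures = 0
    for _ in range(iterations):
        try:
            os.getcwd()
            # Create, fill and delete a scratch file to mimic I/O-heavy commands.
            with tempfile.NamedTemporaryFile(dir=".") as scratch:
                scratch.write(payload)
                scratch.flush()
        except FileNotFoundError:
            failures += 1
    return failures
```

Running it from the repo on /mnt/c while the tutorial commands execute in another terminal might make the failure window visible; running it from /home/medperf gives a baseline for comparison.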
Environment
- Host system: Windows 11, 22H2, OS build 22623.891
- WSL 1.2.5.0
- WSL image (uname -r): 5.15.90.1-microsoft-standard-WSL2
- Guest system (lsb_release -a): Ubuntu 22.04.1 LTS