SDMetrics
How can we set environment variables for 'store_path' to redirect .local writes?
Environment Details
Please indicate the following details about the environment in which you found the bug:
- SDMetrics version:
- Python version: 3.10.14
- Operating System: RHEL UBI-8 (Kubernetes)
Error Description
I run sdv in a containerised Python environment that is accessed via Kubernetes. The restricted nature of K8s prevents me from creating any directories in the root file system. On importing sdv, I noticed that it attempts to create .local in the root directory of the pod where I run Python, which fails with the following error:
>>>
Traceback (most recent call last):
File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share/sdv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "<stdin>", line 2, in <module>
File "<string>", line 2, in <module>
File "/python/lib/python3.10/site-packages/sdv/__init__.py", line 18, in <module>
from sdv import (
File "/python/lib/python3.10/site-packages/sdv/data_processing/__init__.py", line 3, in <module>
from sdv.data_processing.data_processor import DataProcessor
File "/python/lib/python3.10/site-packages/sdv/data_processing/data_processor.py", line 28, in <module>
from sdv.metadata.single_table import SingleTableMetadata
File "/python/lib/python3.10/site-packages/sdv/metadata/__init__.py", line 5, in <module>
from sdv.metadata.multi_table import MultiTableMetadata
File "/python/lib/python3.10/site-packages/sdv/metadata/multi_table.py", line 18, in <module>
from sdv.metadata.single_table import SingleTableMetadata
File "/python/lib/python3.10/site-packages/sdv/metadata/single_table.py", line 37, in <module>
SINGLETABLEMETADATA_LOGGER = get_sdv_logger('SingleTableMetadata')
File "/python/lib/python3.10/site-packages/sdv/logging/logger.py", line 62, in get_sdv_logger
logger_conf = get_sdv_logger_config()
File "/python/lib/python3.10/site-packages/sdv/logging/utils.py", line 16, in get_sdv_logger_config
store_path.mkdir(parents=True, exist_ok=True)
File "/python/lib/python3.10/pathlib.py", line 1179, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/python/lib/python3.10/pathlib.py", line 1179, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: '/.local'
Steps to reproduce
The step that produces the error above is simple:
import sdv
I traced this back through the modules imported by __init__.py and found that the offending line is the one that sets the store_path variable:
/python3.10/site-packages/sdv/logging/utils.py
from pathlib import Path
import platformdirs

store_path = Path(platformdirs.user_data_dir('sdv', 'sdv-dev'))

print(store_path)
>>>
**/.local/share/sdv**
Note that this is in the root file system, which gives me the error.
Is there any way to set this path through an environment variable? I suppose I could manually edit the line above, but I would rather not touch the library's internal code.
Thanks.
TEMPORARY: I've solved this using the XDG Base Directory Specification, which identifies the environment variable to use (XDG_DATA_HOME). I'm curious whether I missed this in your documentation, though, so I'm leaving this issue open.
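For anyone hitting the same error, here is a minimal sketch of that workaround. It is a simplified, standard-library-only illustration of the XDG lookup rule (use XDG_DATA_HOME if set, otherwise fall back to ~/.local/share), not the actual platformdirs implementation. The helper name xdg_data_dir and the temporary-directory location are hypothetical, chosen for illustration only. It also suggests why the path resolved to /.local/share/sdv in the pod: if the home directory resolves to /, the fallback lands in the root file system.

```python
import os
import tempfile
from pathlib import Path


def xdg_data_dir(appname: str) -> Path:
    """Hypothetical helper: simplified XDG data-directory lookup.

    Uses $XDG_DATA_HOME when it is set and non-empty, otherwise
    falls back to ~/.local/share, then appends the application name.
    """
    base = os.environ.get("XDG_DATA_HOME", "").strip()
    if not base:
        # If the home directory resolves to "/", this yields "/.local/share".
        base = os.path.join(os.path.expanduser("~"), ".local", "share")
    return Path(base) / appname


# Assumption: the temp directory is writable in the container.
writable_base = Path(tempfile.gettempdir()) / "xdg-data"

# Set the variable before importing sdv, since the path is resolved at import time.
os.environ["XDG_DATA_HOME"] = str(writable_base)

store_path = xdg_data_dir("sdv")
store_path.mkdir(parents=True, exist_ok=True)
print(store_path)
```

In a Kubernetes deployment the same variable would more typically be set in the pod spec or container image rather than in Python, as long as it is in place before sdv is imported.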
Hi @SundareshSankaran, I realize this was filed several months ago, so I'm not sure if you're still working on this project.
I wonder if perhaps this issue was filed in the wrong repo? It seems to me that this issue is more about the SDV library than SDMetrics, as the code is not related to running any metrics.
Over the past few months, we fixed a number of issues in the SDV library that were related to running SDV on a read-only filesystem. See:
With these issues closed, we've confirmed that users are now able to run SDV on a read-only filesystem, so no more redirection is needed. I'll close this issue as a duplicate of the above.
However, if you continue to encounter these errors, please don't hesitate to file an issue in the SDV library here (https://github.com/sdv-dev/SDV/issues), and we'll be glad to take a look. Thanks!