How can we set environment variables for 'store_path' to redirect .local writes?

Open SundareshSankaran opened this issue 1 year ago • 1 comments

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDMetrics version:
  • Python version: 3.10.14
  • Operating System: RHEL UBI-8 (Kubernetes)

Error Description

I run sdv in a containerised Python environment that is accessible via Kubernetes. The restricted nature of K8s prevents me from creating any directories in the root file system. On importing sdv, I noticed that it attempts to create .local in the root directory of the pod from which I run Python, and so I get the following error:

>>>
Traceback (most recent call last):
  File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share/sdv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.local/share'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "<stdin>", line 2, in <module>
  File "<string>", line 2, in <module>
  File "/python/lib/python3.10/site-packages/sdv/__init__.py", line 18, in <module>
    from sdv import (
  File "/python/lib/python3.10/site-packages/sdv/data_processing/__init__.py", line 3, in <module>
    from sdv.data_processing.data_processor import DataProcessor
  File "/python/lib/python3.10/site-packages/sdv/data_processing/data_processor.py", line 28, in <module>
    from sdv.metadata.single_table import SingleTableMetadata
  File "/python/lib/python3.10/site-packages/sdv/metadata/__init__.py", line 5, in <module>
    from sdv.metadata.multi_table import MultiTableMetadata
  File "/python/lib/python3.10/site-packages/sdv/metadata/multi_table.py", line 18, in <module>
    from sdv.metadata.single_table import SingleTableMetadata
  File "/python/lib/python3.10/site-packages/sdv/metadata/single_table.py", line 37, in <module>
    SINGLETABLEMETADATA_LOGGER = get_sdv_logger('SingleTableMetadata')
  File "/python/lib/python3.10/site-packages/sdv/logging/logger.py", line 62, in get_sdv_logger
    logger_conf = get_sdv_logger_config()
  File "/python/lib/python3.10/site-packages/sdv/logging/utils.py", line 16, in get_sdv_logger_config
    store_path.mkdir(parents=True, exist_ok=True)
  File "/python/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/python/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/python/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
OSError: [Errno 30] Read-only file system: '/.local'

Steps to reproduce

The step that produced the error above is simple:

import sdv

However, I traced this back through the modules imported by sdv's __init__ and found that the offending line is the one that sets the store_path variable:

/python3.10/site-packages/sdv/logging/utils.py

from pathlib import Path

import platformdirs

# Per-user data directory that SDV resolves for its logging config
store_path = Path(platformdirs.user_data_dir('sdv', 'sdv-dev'))

print(store_path)

>>>
**/.local/share/sdv**

Note that this is in the root file system, which gives me the error.

Therefore, is there any way to set this path through an environment variable? I could manually edit the line above, I suppose, but I don't want to touch your internal code.

Thanks.

SundareshSankaran avatar Sep 30 '24 00:09 SundareshSankaran

TEMPORARY: I've solved this using the XDG Base Directory Specification, which identifies the environment variable to use (XDG_DATA_HOME). I'm curious whether I missed this in your documentation, though, so I'm leaving this issue open.
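
For anyone landing here, a minimal sketch of that workaround follows. It assumes a Linux pod where some directory is writable (a temporary directory stands in for it here). The helper xdg_data_dir is hypothetical and stdlib-only: it mirrors the XDG lookup rule (use XDG_DATA_HOME if set and non-empty, otherwise $HOME/.local/share) rather than calling platformdirs or SDV directly.

```python
import os
import tempfile
from pathlib import Path


def xdg_data_dir(appname, env=None):
    """Hypothetical helper: resolve a data directory using the XDG lookup rule."""
    env = os.environ if env is None else env
    base = env.get("XDG_DATA_HOME", "").strip()
    if not base:
        # Fallback when the variable is unset or empty: $HOME/.local/share.
        # With HOME set to "/" this yields /.local/share, as in the traceback.
        base = os.path.join(env.get("HOME", "/"), ".local", "share")
    return Path(base) / appname


# Assumption: a writable location exists; a temp directory stands in for it.
writable_root = tempfile.mkdtemp(prefix="xdg-data-")

# Set the variable *before* importing sdv, since the path is resolved at import time.
os.environ["XDG_DATA_HOME"] = writable_root

store_path = xdg_data_dir("sdv")
store_path.mkdir(parents=True, exist_ok=True)  # succeeds under the writable root
print(store_path)
```

In a Kubernetes deployment the same variable would normally be set in the container's environment rather than in Python, pointing at a writable mounted volume.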

SundareshSankaran avatar Sep 30 '24 01:09 SundareshSankaran

Hi @SundareshSankaran, I realize this was filed several months ago, so I'm not sure if you're still working on this project.

I wonder if perhaps this issue was filed in the wrong repo? It seems to me that this issue is about the SDV library rather than SDMetrics, as the code is not related to running any metrics.

Over the past few months, we fixed a number of issues in the SDV library that were related to running SDV on a read-only filesystem. See:

With these issues closed, we've confirmed that users are now able to run SDV on a read-only filesystem, so no more redirection is needed. I'll close this issue as a duplicate of the above.

However, if you continue to encounter these errors, please don't hesitate to file an issue in the SDV library here (https://github.com/sdv-dev/SDV/issues), and we'll be glad to take a look. Thanks!

npatki avatar Aug 19 '25 22:08 npatki