blazingsql
[BUG] Log Directory creation causes error (unless it exists already)
Describe the bug
Running on 90 workers, I get the following error:
Could not create directory: /gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213[Errno 17] File exists: '/gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213'
distributed.worker - WARNING - Compute Failed
Function: initialize_server_directory
args: ('/gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213', True)
kwargs: {}
Exception: FileExistsError(17, 'File exists')
The directory /gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213 did not exist prior to launching the job.
Steps/Code to reproduce bug
Launch BlazingSQL on enough workers to trigger the race condition. Set LOG to the directory above (making sure it does not exist yet), then set the following environment variables:
export BLAZING_LOGGING_DIRECTORY=${LOG}
export BLAZING_LOCAL_LOGGING_DIRECTORY=${LOG}
export BSQL_BLAZING_LOGGING_DIRECTORY=${LOG}
export BSQL_BLAZING_LOCAL_LOGGING_DIRECTORY=${LOG}
export ENABLE_COMMS_LOGS=False
export BSQL_ENABLE_COMMS_LOGS=False
export BSQL_ENABLE_TASK_LOGS=True
export BSQL_ENABLE_OTHER_ENGINE_LOGS=True
export RMM_DEBUG_LOG_FILE=${LOG}/rmm_log.txt
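The suspected race can also be illustrated without a cluster. The following is a minimal sketch, assuming the check-then-create pattern is the cause: threads stand in for Dask workers, and a barrier (not present in BlazingSQL) is added only to force every caller past the existence check before any of them creates the directory. The names check_then_mkdir and run_race are hypothetical.

```python
import os
import tempfile
import threading


def check_then_mkdir(dir_path, barrier):
    """Check-then-create, with a barrier that holds every caller after the check."""
    if not os.path.exists(dir_path):
        # Every thread has seen "missing" before any of them calls mkdir.
        barrier.wait()
        os.mkdir(dir_path)
    return True


def run_race(n_workers=8):
    """Return the FileExistsError instances raised by n_workers concurrent callers."""
    errors = []
    lock = threading.Lock()
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "logs")
        barrier = threading.Barrier(n_workers)

        def worker():
            try:
                check_then_mkdir(target, barrier)
            except FileExistsError as error:
                with lock:
                    errors.append(error)

        threads = [threading.Thread(target=worker) for _ in range(n_workers)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return errors


if __name__ == "__main__":
    print(len(run_race(8)), "of 8 callers failed with FileExistsError")
```

With the barrier in place, exactly one caller creates the directory and the other seven fail with FileExistsError, which matches the Errno 17 reported by the workers. On a real cluster the interleaving is not forced, so the failure is intermittent and more likely with more workers.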
Expected behavior
The directory should be silently created if it doesn't exist yet.
- BlazingSQL Version 0.19
Environment details
Please run and paste the output of the print_env.sh script here, to gather any other relevant environment details.
Additional context
Suspected source of the issue, in pyblazing/apiv2/context.py:
def initialize_server_directory(dir_path, is_dask):
    if not os.path.exists(dir_path):
        try:
            os.mkdir(dir_path)
        except OSError as error:
            get_blazing_logger(is_dask).error(
                f"Could not create directory: {dir_path}" + str(error)
            )
            raise
        return True
    else:
        return True
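This should intercept the FileExistsError and then silently return, instead of checking os.path.exists first, which results in a race condition: several workers can pass the check before any of them has created the directory, and every os.mkdir call after the first one then fails.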
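One possible shape for the fix is sketched below. It is not the project's actual patch. It is written under the assumption that a directory created by another worker should count as success, and it uses a standard logging logger as a stand-in for get_blazing_logger(is_dask) so that it runs on its own.

```python
import logging
import os

# Stand-in for get_blazing_logger(is_dask); an assumption made for this sketch only.
logger = logging.getLogger("blazing_sketch")


def initialize_server_directory(dir_path, is_dask):
    """Create dir_path, treating "already created by someone else" as success."""
    # is_dask is kept for signature parity; the stand-in logger does not use it.
    try:
        os.mkdir(dir_path)
    except FileExistsError:
        # Another worker won the race. Only fail if the path is not a directory.
        if not os.path.isdir(dir_path):
            logger.error(f"Path exists but is not a directory: {dir_path}")
            raise
    except OSError as error:
        logger.error(f"Could not create directory: {dir_path} {error}")
        raise
    return True
```

Calling os.makedirs(dir_path, exist_ok=True) would be a shorter alternative, but it also creates missing parent directories, which os.mkdir does not, so it changes behavior slightly.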