[ISSUE] WorkspaceClient dbutils.fs.cp method not working in Windows
Description
I'm using databricks-sdk-py through Databricks Connect 13.2 to access the functions from dbutils, as instructed here. Listing files works fine, but when I try to copy a file from DBFS to DBFS I get an exception.
While debugging, I traced the error to this line, which is applied to both the source and destination paths. On Windows it produces WindowsPath('<_DbfsPath /FileStore/xxxxx/titanic.csv>'), which in turn converts '/' to '\'.
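The separator conversion can be reproduced on any OS with the standard library alone. This is a minimal sketch, not the SDK's code: it assumes the SDK ends up stringifying a Windows-flavoured pathlib object, and uses `PureWindowsPath` to stand in for what `pathlib.Path` resolves to on Windows.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path as it appears after the "dbfs:" scheme is stripped
dbfs_path = "/FileStore/xxxxx/titanic_cp.csv"

# Windows flavour: the string form uses backslashes as separators
as_windows = str(PureWindowsPath(dbfs_path))

# POSIX flavour: forward slashes are kept on every platform
as_posix = str(PurePosixPath(dbfs_path))

print(as_windows)  # \FileStore\xxxxx\titanic_cp.csv
print(as_posix)    # /FileStore/xxxxx/titanic_cp.csv
```

The Windows-flavoured string matches the path that the DBFS API rejects in the error below.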
Reproduction
import os
from databricks.sdk import WorkspaceClient
w = WorkspaceClient(profile=os.environ.get("DATABRICKS_CONFIG_PROFILE"))
d = w.dbutils
d.fs.cp("dbfs:/FileStore/xxxxx/titanic.csv", "dbfs:/FileStore/xxxxx/titanic_cp.csv")
This raises the following exception:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.3.3\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\dbutils.py", line 43, in cp
self._dbfs.copy(from_, to, recursive=recurse)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 371, in copy
if dst.exists() and dst.is_dir():
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 292, in exists
return self._api.exists(self.as_string)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 358, in exists
raise e
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 353, in exists
self.get_status(path)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\service\files.py", line 307, in get_status
json = self._api.do('GET', '/api/2.0/dbfs/get-status', query=query)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\core.py", line 963, in do
raise self._make_nicer_error(status_code=response.status_code, **payload) from None
databricks.sdk.core.DatabricksError: Path must be absolute: \FileStore\xxxxx\titanic_cp.csv
Expected behavior
My titanic.csv file in DBFS should be copied, and a new titanic_cp.csv file should appear in the FileStore.
Debug Logs
2023-08-09 10:27:43,448 [databricks.sdk][INFO] loading DEFAULT profile from ~/.databrickscfg: host, token, jobs-api-version
2023-08-09 10:27:43,448 [databricks.sdk][DEBUG] Attempting to configure auth: pat
2023-08-09 10:27:43,715 [databricks.sdk][DEBUG] GET /api/2.0/dbfs/get-status?path=\FileStore\xxxxx\titanic_cp.csv
< 400 Bad Request
< {
< "error_code": "INVALID_PARAMETER_VALUE",
< "message": "Path must be absolute: \\FileStore\\xxxxx\\titanic_cp.csv"
< }
Traceback (most recent call last):
File "C:\Users\my_user\Documents\Projects\databricks_connect_13\datbricks_connect_prueba\databricks_sdk_error.py", line 13, in <module>
d.fs.cp("dbfs:/FileStore/xxxxx/titanic.csv", "dbfs:/FileStore/xxxxx/titanic_cp.csv")
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\dbutils.py", line 43, in cp
self._dbfs.copy(from_, to, recursive=recurse)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 371, in copy
if dst.exists() and dst.is_dir():
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 292, in exists
return self._api.exists(self.as_string)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 358, in exists
raise e
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 353, in exists
self.get_status(path)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\service\files.py", line 307, in get_status
json = self._api.do('GET', '/api/2.0/dbfs/get-status', query=query)
File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\core.py", line 963, in do
raise self._make_nicer_error(status_code=response.status_code, **payload) from None
databricks.sdk.core.DatabricksError: Path must be absolute: \FileStore\xxxxx\titanic_cp.csv
Other Information
- OS
Edition Windows 10 Pro
Version 22H2
Installed on 22/10/2020
OS build 19045.3208
Experience Windows Feature Experience Pack 1000.19041.1000.0
- Python 3.10
- Package versions
- databricks-cli==0.17.7
- databricks-connect==13.2.0
- databricks-sdk==0.4.0
Thank you for the bug report! Currently, the team operates in a limited capacity, carefully prioritizing, and we cannot provide a timeline to implement this feature. Please make a Pull Request if you'd like to see this feature sooner, and we'll guide you through the journey.
It would be great if you could try fixing it and testing it on Windows. The relevant code is https://github.com/databricks/databricks-sdk-py/blob/main/databricks/sdk/mixins/files.py#L365-L383
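One possible direction for a fix, sketched under assumptions: normalize DBFS paths with `PurePosixPath` so the string sent to the API is the same on every OS. The helper name `to_dbfs_api_path` is hypothetical and is not part of the SDK; the actual change in `mixins/files.py` may need to look quite different.

```python
from pathlib import PurePosixPath


def to_dbfs_api_path(path: str) -> str:
    """Hypothetical helper: return an absolute, forward-slash DBFS path."""
    # Drop the optional "dbfs:" scheme prefix
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    # Undo any backslashes introduced by a Windows path conversion,
    # then let PurePosixPath normalize the result independently of the OS
    posix = PurePosixPath(path.replace("\\", "/"))
    if not posix.is_absolute():
        raise ValueError(f"Path must be absolute: {path}")
    return str(posix)


print(to_dbfs_api_path("dbfs:/FileStore/xxxxx/titanic_cp.csv"))
```

Because `PurePosixPath` never touches the local filesystem, this behaves the same on Windows, Linux, and macOS.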
Hi Team, I had a similar problem. Any update on this?