
[ISSUE] WorkspaceClient dbutils.fs.cp method not working in Windows

Open CarlaFernandez opened this issue 2 years ago • 2 comments

Description

I'm using databricks-sdk-py through Databricks Connect 13.2 to access the dbutils functions, as instructed here. Listing files works fine, but copying a file from DBFS to DBFS raises an exception.

While debugging, I traced the error to this line, which is applied to both the source and destination paths. On Windows it generates WindowsPath('<_DbfsPath /FileStore/xxxxx/titanic.csv>'), which in turn converts '/' to '\'.
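The separator conversion described above can be reproduced on any OS with pathlib's pure path classes. This is a minimal sketch, assuming that PureWindowsPath stands in for what pathlib.Path resolves to on Windows; it does not touch the SDK itself.

```python
from pathlib import PurePosixPath, PureWindowsPath

dbfs_path = "/FileStore/xxxxx/titanic_cp.csv"

# PureWindowsPath applies Windows path rules without needing a Windows machine.
as_windows = str(PureWindowsPath(dbfs_path))
# PurePosixPath keeps forward slashes regardless of the host OS.
as_posix = str(PurePosixPath(dbfs_path))

print(as_windows)  # \FileStore\xxxxx\titanic_cp.csv
print(as_posix)    # /FileStore/xxxxx/titanic_cp.csv
```

The first string matches the path that shows up in the "Path must be absolute" error, which suggests the path is being stringified through a Windows-flavoured Path object before it reaches the DBFS API.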

Reproduction

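import os

from databricks.sdk import WorkspaceClient

# Authenticate with the profile named in DATABRICKS_CONFIG_PROFILE
w = WorkspaceClient(profile=os.environ.get("DATABRICKS_CONFIG_PROFILE"))
d = w.dbutils
# DBFS-to-DBFS copy; this is the call that fails on Windows
d.fs.cp("dbfs:/FileStore/xxxxx/titanic.csv", "dbfs:/FileStore/xxxxx/titanic_cp.csv")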

This raises the following exception:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.3.3\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\dbutils.py", line 43, in cp
    self._dbfs.copy(from_, to, recursive=recurse)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 371, in copy
    if dst.exists() and dst.is_dir():
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 292, in exists
    return self._api.exists(self.as_string)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 358, in exists
    raise e
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 353, in exists
    self.get_status(path)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\service\files.py", line 307, in get_status
    json = self._api.do('GET', '/api/2.0/dbfs/get-status', query=query)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\core.py", line 963, in do
    raise self._make_nicer_error(status_code=response.status_code, **payload) from None
databricks.sdk.core.DatabricksError: Path must be absolute: \FileStore\xxxxx\titanic_cp.csv

Expected behavior

My titanic.csv file in DBFS should be copied, and a new titanic_cp.csv file should appear in the FileStore.

Debug Logs

2023-08-09 10:27:43,448 [databricks.sdk][INFO] loading DEFAULT profile from ~/.databrickscfg: host, token, jobs-api-version
2023-08-09 10:27:43,448 [databricks.sdk][DEBUG] Attempting to configure auth: pat
2023-08-09 10:27:43,715 [databricks.sdk][DEBUG] GET /api/2.0/dbfs/get-status?path=\FileStore\xxxxx\titanic_cp.csv
< 400 Bad Request
< {
<   "error_code": "INVALID_PARAMETER_VALUE",
<   "message": "Path must be absolute: \\FileStore\\xxxxx\\titanic_cp.csv"
< }
Traceback (most recent call last):
  File "C:\Users\my_user\Documents\Projects\databricks_connect_13\datbricks_connect_prueba\databricks_sdk_error.py", line 13, in <module>
    d.fs.cp("dbfs:/FileStore/xxxxx/titanic.csv", "dbfs:/FileStore/xxxxx/titanic_cp.csv")
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\dbutils.py", line 43, in cp
    self._dbfs.copy(from_, to, recursive=recurse)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 371, in copy
    if dst.exists() and dst.is_dir():
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 292, in exists
    return self._api.exists(self.as_string)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 358, in exists
    raise e
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\mixins\files.py", line 353, in exists
    self.get_status(path)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\service\files.py", line 307, in get_status
    json = self._api.do('GET', '/api/2.0/dbfs/get-status', query=query)
  File "C:\Users\my_user\Documents\Projects\my_project\venv_test_DBC13\lib\site-packages\databricks\sdk\core.py", line 963, in do
    raise self._make_nicer_error(status_code=response.status_code, **payload) from None
databricks.sdk.core.DatabricksError: Path must be absolute: \FileStore\xxxxx\titanic_cp.csv

Other Information

  • OS
    • Edition: Windows 10 Pro
    • Version: 22H2
    • Installed on: 22/10/2020
    • OS build: 19045.3208
    • Experience: Windows Feature Experience Pack 1000.19041.1000.0
  • Python 3.10
  • Package versions
    • databricks-cli==0.17.7
    • databricks-connect==13.2.0
    • databricks-sdk==0.4.0

CarlaFernandez, Aug 09 '23 08:08

Thank you for the bug report! Currently, the team operates in a limited capacity, carefully prioritizing, and we cannot provide a timeline to implement this feature. Please make a Pull Request if you'd like to see this feature sooner, and we'll guide you through the journey.

It would be great if you could try fixing it and testing it on Windows. The relevant code is https://github.com/databricks/databricks-sdk-py/blob/main/databricks/sdk/mixins/files.py#L365-L383
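For anyone attempting a fix, one possible direction is to build the API path with PurePosixPath so the separators stay as '/' on every OS. The sketch below is only an illustration under that assumption: to_dbfs_api_path is a hypothetical helper name, not a function in databricks-sdk, and the actual fix in files.py may need to look different.

```python
from pathlib import PurePosixPath


def to_dbfs_api_path(path: str) -> str:
    """Hypothetical helper (not part of databricks-sdk).

    Strips an optional 'dbfs:' scheme and returns the path with
    forward slashes, independent of the host operating system.
    """
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    posix = PurePosixPath(path)
    # The DBFS API rejects relative paths, so fail early with the same message.
    if not posix.is_absolute():
        raise ValueError(f"Path must be absolute: {path}")
    return posix.as_posix()


print(to_dbfs_api_path("dbfs:/FileStore/xxxxx/titanic_cp.csv"))
# /FileStore/xxxxx/titanic_cp.csv
```

Because PurePosixPath never consults the host OS, the result should be the same on Windows and Linux, which is the property the DBFS get-status call needs.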

nfx, Aug 09 '23 09:08

Hi Team, I had a similar problem. Any update on this?

isKEKE, Mar 14 '24 03:03