dbx
dbx copied to clipboard
dbx execute fails to install package on cluster due to malformed subdependency requirements.txt
Expected Behavior
dbx execute {workflow} --deployment-file={deployment file} --task={task} --cluster-name={cluster} should build a Python-based project into a wheel, upload the wheel to dbfs, and install the package on the specified cluster before running the specified {task}s.
Current Behavior
The building and uploading steps succeed, but during "Installing package on the cluster" Execution fails. The stack trace and pip freeze output is attached. One notable part is the bottom of the stack trace, shown below:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a0af1ff6-57c1-49d0-8c45-1185e49b4fe8/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/requirements.py in __init__(self,
requirement_string)
35 parsed = parse_requirement(requirement_string)
36 except ParserSyntaxError as e:
---> 37 raise InvalidRequirement(str(e)) from e
38
39 self.name: str = parsed.name
InvalidRequirement: Expected closing RIGHT_PARENTHESIS
stone (>=2.*)
~~~~^
Where stone is a subdependency for the package being built. Since it is a dependency of another third-party dependency (dropbox), it is not easily replaceable. If it's not clear, the "~~~~^" is part of the stack trace and is pointing to the "." in (>=2.*)
Steps to Reproduce (for bugs)
- Create a Python package
- Add any dependency with a malformed requirements.txt. Known examples include
stoneandsqlalchemy-mate. - Run any
dbx executecommand that would install the package on a cluster.
Context
I have been consistently running the exact same dbx commands on dbx==0.8.7 since December 2022, but recently (around the beginning of February 2023) began consistently hitting this parsing error. I have been able to reproduce the error with any package containing stone or sqlalchemy-mate, as well as have installation succeed when those packages are absent. Note that with sqlalchemy-mate the issue is that their requirements.txt file contains the dependency "sqlalchemy>=1.4.1<2.0.0", and dbx's parsing breaks due to the missing comma between >=1.4.1 and <2.0.0.
The parser is correctly finding typos in the dependency requirements.txt's, but I've been unable to identify or further narrow down what caused the change from (typos exist but dbx execute still works) to (typos exist and dbx execute fails). The dependency versions, dbx version, and Databricks Runtime version have remained fixed since December 2022 to the best of my knowledge. Furthermore, I am able to upload and install the same package containing those dependencies on the same Databricks cluster manually.
Your Environment
- dbx version used: 0.8.7
- Databricks Runtime version: 9.1 LTS
I would love to know if there's any workaround, any recent known change to how dbx/Databricks parses dependencies, or if this is indeed a bug. Thank you!
I'm experiencing the same issue.
@alexisdrakopoulos any differences in dbx version or Databricks runtime version in your case? Haven't had any luck narrowing down the cause since I posted this issue.
hi @pmason314 , have you tried using a setup.py-based packaging method as described here?
hi @renardeinside - thanks for the response! I can confirm that I'm using a setup.py-based packaging method, with an entrypoint defined in the setup() function that corresponds to the function dbx should run on Databricks.