dbx icon indicating copy to clipboard operation
dbx copied to clipboard

dbx execute fails to install package on cluster due to malformed subdependency requirements.txt

Open pmason314 opened this issue 2 years ago • 4 comments
trafficstars

Expected Behavior

dbx execute {workflow} --deployment-file={deployment file} --task={task} --cluster-name={cluster} should build a Python-based project into a wheel, upload the wheel to dbfs, and install the package on the specified cluster before running the specified {task}s.

Current Behavior

The building and uploading steps succeed, but during "Installing package on the cluster" Execution fails. The stack trace and pip freeze output is attached. One notable part is the bottom of the stack trace, shown below:

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a0af1ff6-57c1-49d0-8c45-1185e49b4fe8/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/requirements.py in __init__(self, 
requirement_string)
     35             parsed = parse_requirement(requirement_string)
     36         except ParserSyntaxError as e:
---> 37             raise InvalidRequirement(str(e)) from e
     38 
     39         self.name: str = parsed.name

InvalidRequirement: Expected closing RIGHT_PARENTHESIS
    stone (>=2.*)
            ~~~~^

Where stone is a subdependency for the package being built. Since it is a dependency of another third-party dependency (dropbox), it is not easily replaceable. If it's not clear, the "~~~~^" is part of the stack trace and is pointing to the "." in (>=2.*)

Steps to Reproduce (for bugs)

  1. Create a Python package
  2. Add any dependency with a malformed requirements.txt. Known examples include stone and sqlalchemy-mate.
  3. Run any dbx execute command that would install the package on a cluster.

Context

I have been consistently running the exact same dbx commands on dbx==0.8.7 since December 2022, but recently (around the beginning of February 2023) began consistently hitting this parsing error. I have been able to reproduce the error with any package containing stone or sqlalchemy-mate, as well as have installation succeed when those packages are absent. Note that with sqlalchemy-mate the issue is that their requirements.txt file contains the dependency "sqlalchemy>=1.4.1<2.0.0", and dbx's parsing breaks due to the missing comma between >=1.4.1 and <2.0.0.

The parser is correctly finding typos in the dependency requirements.txt's, but I've been unable to identify or further narrow down what caused the change from (typos exist but dbx execute still works) to (typos exist and dbx execute fails). The dependency versions, dbx version, and Databricks Runtime version have remained fixed since December 2022 to the best of my knowledge. Furthermore, I am able to upload and install the same package containing those dependencies on the same Databricks cluster manually.

Your Environment

  • dbx version used: 0.8.7
  • Databricks Runtime version: 9.1 LTS

I would love to know if there's any workaround, any recent known change to how dbx/Databricks parses dependencies, or if this is indeed a bug. Thank you!

pip freeze Output.txt Stack trace.txt

pmason314 avatar Feb 27 '23 17:02 pmason314

I'm experiencing the same issue.

alexisdrakopoulos avatar Mar 10 '23 10:03 alexisdrakopoulos

@alexisdrakopoulos any differences in dbx version or Databricks runtime version in your case? Haven't had any luck narrowing down the cause since I posted this issue.

pmason314 avatar Mar 10 '23 17:03 pmason314

hi @pmason314 , have you tried using a setup.py-based packaging method as described here?

renardeinside avatar Mar 10 '23 17:03 renardeinside

hi @renardeinside - thanks for the response! I can confirm that I'm using a setup.py-based packaging method, with an entrypoint defined in the setup() function that corresponds to the function dbx should run on Databricks.

pmason314 avatar Mar 10 '23 17:03 pmason314