dbx icon indicating copy to clipboard operation
dbx copied to clipboard

better-args-parser

Open copdips opened this issue 3 years ago • 2 comments

Proposed changes

p.parse_known_args()[0] is enough instead of p.parse_known_args(sys.argv[1:])[0]

Types of changes

What types of changes does your code introduce to dbx? Put an x in the boxes that apply

  • [x] Bugfix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation Update (if none of the other choices apply)

Further comments

with some minor imports sorting and typo fix in the same files

copdips avatar Sep 14 '22 21:09 copdips

Codecov Report

Merging #477 (61547ad) into main (2ad685f) will not change coverage. The diff coverage is n/a.

:exclamation: Current head 61547ad differs from pull request most recent head a611a06. Consider uploading reports for the commit a611a06 to get more accurate results

@@           Coverage Diff           @@
##             main     #477   +/-   ##
=======================================
  Coverage   91.28%   91.28%           
=======================================
  Files          55       55           
  Lines        2594     2594           
  Branches      374      374           
=======================================
  Hits         2368     2368           
  Misses        159      159           
  Partials       67       67           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Sep 14 '22 21:09 codecov[bot]

@renardeinside, please review

copdips avatar Sep 15 '22 19:09 copdips

@renardeinside what do you think about this PR ?

copdips avatar Oct 06 '22 21:10 copdips

I think there should be a bigger PR to be honest, and it shouldn't be based only on the arguments change.

If we would like to do something like this, we'll need to cover 2 aspects:

  • Easy and simple way to pass multiple arguments. For instance typer is a great framework but might be an overkill in terms of functionality.
  • Convenient and typed way to parse the task configurations file (something like omegaconf + pydantic?)

Therefore with this PR we could do two things:

  • Close it and rework into more profound feature request and make some better design decisions
  • Apply it to deliver the value faster (not sure how much value this change brings, but don't want to discourage your contribution).

Hope this explains my PoV. What would be your PoV on the above said @copdips ?

renardeinside avatar Oct 06 '22 21:10 renardeinside

Hi @renardeinside, thanks for your in-depth reply.

To build a CLI application like dbx itself, I've the same thoughts as yours, typer is a perfect choice. I've never used omegaconf before, just took a quick overview, it seems that there's OmegaConf.from_cli() which looks great.

Maybe the argument parsing is a little bit out of scope of dbx which is not a deployment stuff, but if you would like to introduce new dependencies for user's databricks tasks (not dbx itself), yes, I should than close this PR.

copdips avatar Oct 06 '22 22:10 copdips

I've discovered various options and I think we could use the hydra application framework in our Python templates - it provides various useful interfaces, looks fancy and has good documentation. Would that be something that you agree with @copdips ?

renardeinside avatar Oct 07 '22 08:10 renardeinside

Sorry for the late about this thread. After reading the doc of hydra, it looks like a good choice for config management.

What I think is how to let dbx to use hydra config, as the entrypoint is a python based file, so dbx deploy --deployment-file should handle python file type too, and probably also with a new argument --deployment-file-arguments.

Furthermore it would be nice if can we use the python deployment file in a more generalized way out of hydra, which means whatever it's hydra or not, as long as --deployment-file is a python file, just let dbx to run it, and the given python file should print the deployment in stdout that dbx can capture. By this way, we can fully leverage the python power to generate complex/redundant deployment definition.

copdips avatar Nov 03 '22 09:11 copdips

I also was thinking in the same direction - what if we allow users to build a configuration in the same way as it's done in the FastAPI for instance.

Brief design outline:

# e.g. .dbx/configurator.py
from dbx import configurator
from dbx import Workflow, Task

wf1 = Workflow(...)
wf2 = Workflow(...)

wf1.add(Task(...))
wf1.add(Task(...))

wf2.add(Task(...))

configurator.add_workflow(w1, w2)

And then in the dbx deploy (or launch or whatever):

dbx deploy --deployment-file=.dbx/configurator.py

renardeinside avatar Nov 03 '22 11:11 renardeinside

i think this change requires a bigger PR and proper design phase. Please check this issue and share your thoughts - https://github.com/databrickslabs/dbx/issues/547

renardeinside avatar Nov 03 '22 11:11 renardeinside

This one I'll close for the moment since it won't be implemented in the way it's provided.

renardeinside avatar Nov 03 '22 11:11 renardeinside