dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

Feature/dbt deps tarball

Open timle2 opened this issue 2 years ago • 21 comments

Continued from https://github.com/dbt-labs/dbt-core/pull/4220

in support of Issue [Feature] add new dbt.deps type: url to internally hosted tarball #4205

resolves #

Proposed solution for feature request 4205

Description

Enable direct linking to tarball urls in packages.yml, for example:

# packages.yml
packages:
  - tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.6.6
  - sha1: 549db5560efccbadc7ec7834c03cf13579199a66
# sha1 is optional. if specified, will only install package if downloaded tar matches. 
  - subdirectory: dbt-utils-0.6.6
# subdirectory is optional. By default package root will be based on single the parent folder in tarfile. Subdirectory arg here can be used to override (based on subdirectory arg in git package).

Use case is for dbt projects that need to host internal private packages - without relying on the git package install option. (public tarball above used here as example only). This is applicable when a) internal packages are being build for non-public use b) cloning packages from git is not supported (or discouraged) by existing deployment systems c) internal file hosting service (such as internal artifactory service or internal cloud storage buckets) can be easily configured to host packages for install during deployment.

Sketching out doc changes here: https://github.com/timle2/docs.getdbt.com/blob/dbt-docs-tarball-package-updates/website/docs/docs/building-a-dbt-project/package-management.md#tar-files

Checklist

  • [x] I have signed the CLA
  • [x] I have run this code in development and it appears to resolve the stated issue
  • [x] This PR includes tests, or tests are not required/relevant for this PR
  • [TODO] I have updated the CHANGELOG.md and added information about my change

timle2 avatar Feb 05 '22 21:02 timle2

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 05 '22 21:02 cla-bot[bot]

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 05 '22 23:02 cla-bot[bot]

@cla-bot check

emmyoop avatar Feb 07 '22 14:02 emmyoop

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 07 '22 14:02 cla-bot[bot]

The cla-bot has been summoned, and re-checked this pull request!

cla-bot[bot] avatar Feb 07 '22 14:02 cla-bot[bot]

@timle2 thanks for opening up the PR with your new username! I've double checked and you did indeed sign the CLA. We have your username listed in the correct format. Looks like you may need to configure with your email to get our bot to process it.

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

emmyoop avatar Feb 07 '22 14:02 emmyoop

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 13 '22 22:02 cla-bot[bot]

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 13 '22 23:02 cla-bot[bot]

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 13 '22 23:02 cla-bot[bot]

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 13 '22 23:02 cla-bot[bot]

Issues referenced no longer apply to revision 3 - leaving for history

TODO: #4220 (comment)

We've updated how to we log to use the Events modules. You'll need to create a new log event for this, and everywhere else you log, and use fire_event to kick off writing the log. We have a README with some good info on our logging system.

✅ Switched all the logging over to the Events / fire_event.

TODO #4220 (comment)

As a standard we don't use assert in our code other than for tests. They shadow the actual exceptions and throw an AssertionError instead. Please rewrite the assert statements to catch relevant exceptions (create new ones if needed!). In this specific case, I do like the use is_tarfile to check the file! Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile?

✅ Moved over to properly defined exceptions. For the most part leveraging dbt.exceptions.DependencyException.

Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile? No I think straight failure without retry makes the most sense. In almost all cases the user would want to know that the tarfile is invalid, with dbt deps failing, as opposed to hitting the server multiple times to confirm. A transmission error could result in a corrupted tar file, however this is such an edge case I'm not sure it's worth guarding against.

timle2 avatar Feb 14 '22 00:02 timle2

Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile? No I think straight failure without retry makes the most sense. In almost all cases the user would want to know that the tarfile is invalid, with dbt deps failing, as opposed to hitting the server multiple times to confirm. A transmission error could result in a corrupted tar file, however this is such an edge case I'm not sure it's worth guarding against.

I agree that this is rare, but it does happen. The user will still find out the tar file is invalid if that is indeed the issue and not a transmission error we could recover from. This will be rare but worth protecting against. In dbt Cloud deps are installed every time a job runs so we want to ensure if there is a transmission error we can recover from with a simple retry, we do so as not to fail an entire job.

Unrelated: You'll want to pull in the latest on the main branch as we've added some auto formatting/checking and that is why you have a failing test. You can read more about it here.

emmyoop avatar Feb 15 '22 16:02 emmyoop

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 21 '22 16:02 cla-bot[bot]

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 21 '22 17:02 cla-bot[bot]

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Feb 21 '22 17:02 cla-bot[bot]

Is this something that you're still interested in working on and getting over the finish line?

gshank avatar May 02 '22 14:05 gshank

This would be quite useful. Do you still plan to finish this?

the-chris-mitchell avatar Jul 28 '22 01:07 the-chris-mitchell

Yes this is something my team would benefit from as well, and I'm glad others see the value in an additional install method for deps. I haven't had time to get this past the finish line unfortunately. I welcome any collaborators/input/help whatever it takes to get this approved though! At minimum, I'll be able to finish this in Sept, but obviously if others are interested I'd love to get this done sooner.

timle2 avatar Jul 28 '22 02:07 timle2

@timle2 it's great that you're still interested in getting this over the finish line, even if it has to wait until September. Thanks so much! I did want to drop in to say that we have done quite a bit of work around dbt deps since you worked on this issue last. It would be helpful to peruse the changes we've made to (including adding caching!) to ensure your approach matches the pattern we're now using with other methods.

emmyoop avatar Aug 09 '22 15:08 emmyoop

@emmyoop Based on feedback from revision 1 and 2, I've stripped this PR way down, to the abosolute minumum required to support the tarball feature.

I've removed features - nice but not required - such as supporting subfolders like git packages, and sha1 checks. Now we lean on existing code from RegistryPinnedPackage (install() and download_and_untar()). This slims down the PR down to only the most essential code additions :)

~However, RegistryPinnedPackage install() and download_and_untar() are not currently set up as functions that can be used across classes, and I'm not smart enought to set up some crazy class inheritance structure to get it done. So I've copy/pasted the fns from RegistryPinnedPackage to TarballPinnedPackage. I'd advocate for moving these fns to dbt.deps.base, or to a dbt.deps.common file, but I also don't feel that I'm the person to make that call - need some feedback from dbt maintainers here on how to move forward based dbt labs thoughts.~ I've moved RegistryPinnedPackage install() and download_and_untar() to the parent PinnedPackage class so the fns can be reused across RegistryPinnedPackage and TarballPinnedPackage. I'm not sure about the naming convention the team supports for this design though (I did install() -> _install() on the parent class), so please lmk on that.

timle2 avatar Nov 06 '22 23:11 timle2

Thank you @the-chris-mitchell for sharing support on this. I've picked up

@timle2 it's great that you're still interested in getting this over the finish line, even if it has to wait until September. Thanks so much! I did want to drop in to say that we have done quite a bit of work around dbt deps since you worked on this issue last. It would be helpful to peruse the changes we've made to (including adding caching!) to ensure your approach matches the pattern we're now using with other methods.

done and done! totally rewrote from main branch :)

timle2 avatar Nov 06 '22 23:11 timle2

Thank you @the-chris-mitchell for sharing support on this. I've picked it back up and hoping to get across finish line in the next few weeks.

timle2 avatar Nov 06 '22 23:11 timle2

Would love to see this released! :) (Also see my comment here: https://github.com/dbt-labs/dbt-core/issues/4205#issuecomment-1293401553)

the-serious-programmer avatar Nov 07 '22 07:11 the-serious-programmer

@emmyoop something funny going on with GitHub actions today? Last git workflow run failed with the following: image

I've done an update from master branch in case there was anything committed in that would fix this, but doesn't look like it at quick glance.

timle2 avatar Nov 21 '22 19:11 timle2

@timle2 I think branches that start with feature/ have a special meaning for us.

Can you rename it to something else (like timle2/dbt-deps-tarball) and see if it resolves the issue you are seeing with pre-commit?

dbeatty10 avatar Nov 21 '22 21:11 dbeatty10

@timle2 I think branches that start with feature/ have a special meaning for us.

Can you rename it to something else (like timle2/dbt-deps-tarball) and see if it resolves the issue you are seeing with pre-commit?

Thanks for the help here. I've got a second PR with the altered branch name here, see if that works. https://github.com/dbt-labs/dbt-core/pull/6302 No way for me to change the branch name here without closing the PR it seems.

timle2 avatar Nov 22 '22 00:11 timle2

@timle2 I think branches that start with feature/ have a special meaning for us. Can you rename it to something else (like timle2/dbt-deps-tarball) and see if it resolves the issue you are seeing with pre-commit?

Thanks for the help here. I've got a second PR with the altered branch name here, see if that works. #6302 No way for me to change the branch name here without closing the PR it seems.

Actually this PR working fine now, yay! Back to getting flake errors like I'm used to 😆

timle2 avatar Nov 22 '22 01:11 timle2

@timle2 Something for you to do:

  1. Add this as the first line of your PR description:
    • resolves #4205
    • The "resolves" keyword will link the PR to the issue and will automatically close the latter at the appropriate time.
    • Having it as the first line is just a convention we used and not strictly required (but appreciated!)

dbeatty10 avatar Nov 22 '22 01:11 dbeatty10

@timle2 this looks great! 🎉 Just had a few small comments.

I wanted to give you some feedback now since I know it was overdue! Before I officially approve and merge this in I wanted to do some manual testing on it. I'll let you know if I find anything testing and if not I'll get this merged in once you finish a bit of cleanup.

🥳 Thanks @emmyoop! Thanks for the support through this PR, greatly appreciated. I've addressed all the comments, but please let me know if I've missed anything.

Flake/Black passing on my local machine, so this should be good to re-run workflows for another round of validation.

timle2 avatar Nov 26 '22 20:11 timle2

I'll do some manual testing on this and report back here afterwards.

dbeatty10 avatar Nov 28 '22 20:11 dbeatty10