dbt-core
dbt-core copied to clipboard
Feature/dbt deps tarball
Continued from https://github.com/dbt-labs/dbt-core/pull/4220
in support of Issue [Feature] add new dbt.deps type: url to internally hosted tarball #4205
resolves #
Proposed solution for feature request 4205
Description
Enable direct linking to tarball urls in packages.yml
, for example:
# packages.yml
packages:
- tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.6.6
- sha1: 549db5560efccbadc7ec7834c03cf13579199a66
# sha1 is optional. if specified, will only install package if downloaded tar matches.
- subdirectory: dbt-utils-0.6.6
# subdirectory is optional. By default package root will be based on single the parent folder in tarfile. Subdirectory arg here can be used to override (based on subdirectory arg in git package).
Use case is for dbt projects that need to host internal private packages - without relying on the git package install option. (public tarball above used here as example only). This is applicable when a) internal packages are being build for non-public use b) cloning packages from git is not supported (or discouraged) by existing deployment systems c) internal file hosting service (such as internal artifactory service or internal cloud storage buckets) can be easily configured to host packages for install during deployment.
Sketching out doc changes here: https://github.com/timle2/docs.getdbt.com/blob/dbt-docs-tarball-package-updates/website/docs/docs/building-a-dbt-project/package-management.md#tar-files
Checklist
- [x] I have signed the CLA
- [x] I have run this code in development and it appears to resolve the stated issue
- [x] This PR includes tests, or tests are not required/relevant for this PR
- [TODO] I have updated the
CHANGELOG.md
and added information about my change
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
@cla-bot check
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
The cla-bot has been summoned, and re-checked this pull request!
@timle2 thanks for opening up the PR with your new username! I've double checked and you did indeed sign the CLA. We have your username listed in the correct format. Looks like you may need to configure with your email to get our bot to process it.
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Issues referenced no longer apply to revision 3 - leaving for history
TODO: #4220 (comment)
We've updated how to we log to use the Events modules. You'll need to create a new log event for this, and everywhere else you log, and use
fire_event
to kick off writing the log. We have aREADME
with some good info on our logging system.
✅ Switched all the logging over to the Events / fire_event.
TODO #4220 (comment)
As a standard we don't use
assert
in our code other than for tests. They shadow the actual exceptions and throw anAssertionError
instead. Please rewrite the assert statements to catch relevant exceptions (create new ones if needed!). In this specific case, I do like the useis_tarfile
to check the file! Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile?
✅ Moved over to properly defined exceptions. For the most part leveraging dbt.exceptions.DependencyException.
Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile?
No I think straight failure without retry makes the most sense. In almost all cases the user would want to know that the tarfile is invalid, with dbt deps
failing, as opposed to hitting the server multiple times to confirm. A transmission error could result in a corrupted tar file, however this is such an edge case I'm not sure it's worth guarding against.
Can you think of a reason it would be worth retrying the download if the file is not a valid tarfile?
No I think straight failure without retry makes the most sense. In almost all cases the user would want to know that the tarfile is invalid, withdbt deps
failing, as opposed to hitting the server multiple times to confirm. A transmission error could result in a corrupted tar file, however this is such an edge case I'm not sure it's worth guarding against.
I agree that this is rare, but it does happen. The user will still find out the tar file is invalid if that is indeed the issue and not a transmission error we could recover from. This will be rare but worth protecting against. In dbt Cloud deps are installed every time a job runs so we want to ensure if there is a transmission error we can recover from with a simple retry, we do so as not to fail an entire job.
Unrelated: You'll want to pull in the latest on the main branch as we've added some auto formatting/checking and that is why you have a failing test. You can read more about it here.
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Tim Leonard. This is most likely caused by a git client misconfiguration; please make sure to:
- check if your git client is configured with an email to sign commits
git config --list | grep email
- If not, set it up using
git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails
Is this something that you're still interested in working on and getting over the finish line?
This would be quite useful. Do you still plan to finish this?
Yes this is something my team would benefit from as well, and I'm glad others see the value in an additional install method for deps. I haven't had time to get this past the finish line unfortunately. I welcome any collaborators/input/help whatever it takes to get this approved though! At minimum, I'll be able to finish this in Sept, but obviously if others are interested I'd love to get this done sooner.
@timle2 it's great that you're still interested in getting this over the finish line, even if it has to wait until September. Thanks so much! I did want to drop in to say that we have done quite a bit of work around dbt deps since you worked on this issue last. It would be helpful to peruse the changes we've made to (including adding caching!) to ensure your approach matches the pattern we're now using with other methods.
@emmyoop Based on feedback from revision 1 and 2, I've stripped this PR way down, to the abosolute minumum required to support the tarball feature.
I've removed features - nice but not required - such as supporting subfolders like git packages, and sha1 checks.
Now we lean on existing code from RegistryPinnedPackage
(install()
and download_and_untar()
). This slims down the PR down to only the most essential code additions :)
~However, RegistryPinnedPackage
install()
and download_and_untar()
are not currently set up as functions that can be used across classes, and I'm not smart enought to set up some crazy class inheritance structure to get it done. So I've copy/pasted the fns from RegistryPinnedPackage
to TarballPinnedPackage
. I'd advocate for moving these fns to dbt.deps.base, or to a dbt.deps.common file, but I also don't feel that I'm the person to make that call - need some feedback from dbt maintainers here on how to move forward based dbt labs thoughts.~
I've moved RegistryPinnedPackage
install()
and download_and_untar()
to the parent PinnedPackage
class so the fns can be reused across RegistryPinnedPackage
and TarballPinnedPackage
. I'm not sure about the naming convention the team supports for this design though (I did install()
-> _install()
on the parent class), so please lmk on that.
Thank you @the-chris-mitchell for sharing support on this. I've picked up
@timle2 it's great that you're still interested in getting this over the finish line, even if it has to wait until September. Thanks so much! I did want to drop in to say that we have done quite a bit of work around dbt deps since you worked on this issue last. It would be helpful to peruse the changes we've made to (including adding caching!) to ensure your approach matches the pattern we're now using with other methods.
done and done! totally rewrote from main branch :)
Thank you @the-chris-mitchell for sharing support on this. I've picked it back up and hoping to get across finish line in the next few weeks.
Would love to see this released! :) (Also see my comment here: https://github.com/dbt-labs/dbt-core/issues/4205#issuecomment-1293401553)
@emmyoop something funny going on with GitHub actions today? Last git workflow run failed with the following:
I've done an update from master branch in case there was anything committed in that would fix this, but doesn't look like it at quick glance.
@timle2 I think branches that start with feature/
have a special meaning for us.
Can you rename it to something else (like timle2/dbt-deps-tarball
) and see if it resolves the issue you are seeing with pre-commit
?
@timle2 I think branches that start with
feature/
have a special meaning for us.Can you rename it to something else (like
timle2/dbt-deps-tarball
) and see if it resolves the issue you are seeing withpre-commit
?
Thanks for the help here. I've got a second PR with the altered branch name here, see if that works. https://github.com/dbt-labs/dbt-core/pull/6302 No way for me to change the branch name here without closing the PR it seems.
@timle2 I think branches that start with
feature/
have a special meaning for us. Can you rename it to something else (liketimle2/dbt-deps-tarball
) and see if it resolves the issue you are seeing withpre-commit
?Thanks for the help here. I've got a second PR with the altered branch name here, see if that works. #6302 No way for me to change the branch name here without closing the PR it seems.
Actually this PR working fine now, yay! Back to getting flake errors like I'm used to 😆
@timle2 Something for you to do:
- Add this as the first line of your PR description:
-
resolves #4205
- The "resolves" keyword will link the PR to the issue and will automatically close the latter at the appropriate time.
- Having it as the first line is just a convention we used and not strictly required (but appreciated!)
-
@timle2 this looks great! 🎉 Just had a few small comments.
I wanted to give you some feedback now since I know it was overdue! Before I officially approve and merge this in I wanted to do some manual testing on it. I'll let you know if I find anything testing and if not I'll get this merged in once you finish a bit of cleanup.
🥳 Thanks @emmyoop! Thanks for the support through this PR, greatly appreciated. I've addressed all the comments, but please let me know if I've missed anything.
Flake/Black passing on my local machine, so this should be good to re-run workflows for another round of validation.
I'll do some manual testing on this and report back here afterwards.