[CT-2991] [SPIKE] Investigate minimal-snowplow-tracker
Housekeeping
- [X] I am a maintainer of dbt-core
Short description
minimal-snowplow-tracker is used in dbt-core for anonymous usage tracking. It is an internally managed fork of snowplow-python-tracker. It is currently 100+ commits behind upstream and only officially supports up to Python 3.7.
Acceptance criteria
Explore whether there are performance hits from switching from our fork (minimal-snowplow-tracker) to the Snowplow-maintained project (snowplow-python-tracker).
Impact to Adapters
None
Context
If there is a performance hit from using the full package maintained by Snowplow, we should look into rolling this code into the core codebase. We will need to verify whether any other internal projects depend on this project.
The minimal-snowplow-tracker project is considered a critical project by PyPI, meaning it's in the top 1% of downloads on PyPI. I'm guessing that's because it's a dependency of core, which is also a critical project. When we retire the project, we should not remove it from PyPI for 6+ months to ensure we don't accidentally break other projects.
I bumped into an issue with minimal-snowplow-tracker after installing meltano using this guide: https://dagster.io/blog/dagster-meltano-integration-tutorial
The guide is slightly inaccurate because it does not say that you need to install the meltano package into the venv that dagster is running in. After installing meltano (and changing nothing in the code) I ran into:
Stack Trace:
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_grpc/server.py", line 295, in __init__
self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_grpc/server.py", line 139, in __init__
loadable_targets = get_loadable_targets(
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_grpc/utils.py", line 47, in get_loadable_targets
else loadable_targets_from_python_module(module_name, working_directory)
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module
module = load_python_module(
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 135, in load_python_module
return importlib.import_module(module_name)
File "/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/Users/alik/Development/dagster_analytics/dagster_analytics/definitions.py", line 4, in <module>
from dagster_dbt import DbtCliResource
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/__init__.py", line 2, in <module>
from .asset_defs import (
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/asset_defs.py", line 62, in <module>
from dagster_dbt.core.resources import DbtCliClient
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/core/__init__.py", line 5, in <module>
from .resources_v2 import (
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 35, in <module>
from dbt.contracts.results import NodeStatus, TestStatus
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/contracts/results.py", line 1, in <module>
from dbt.contracts.graph.unparsed import FreshnessThreshold
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/contracts/graph/unparsed.py", line 3, in <module>
from dbt import deprecations
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/deprecations.py", line 6, in <module>
import dbt.tracking
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/tracking.py", line 104, in <module>
emitter = TimeoutEmitter()
File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/tracking.py", line 54, in __init__
super().__init__(
This happens because meltano depends on snowplow-tracker [required: >=1.0.1,<2.0.0, installed: 1.0.1]. And the contract for snowplow_tracker.Emitter does not match between snowplow-tracker and minimal-snowplow-tracker.
I propose updating the contract of minimal-snowplow-tracker to match snowplow-tracker. I think I can do it, but I need the sources. Can you please make them public?
I believe it is public: https://github.com/dbt-labs/snowplow-python-tracker
Do you happen to have any news here?
The issue with Meltano persists, so I created an issue in their GitHub: https://github.com/meltano/meltano/issues/8256
Should I invest my time to work around it in Meltano, or do you plan to solve it here in the near future?
We don't have a timeline for working on this ticket yet @jaceksan
Edgar created a pull request that is already under review, and IMO it relates to this issue (it may even fix it):
https://github.com/dbt-labs/dbt-core/pull/9528
Why not just release a version 0.1.0 of minimal-snowplow-tracker, and install it to site-packages/minimal_snowplow_tracker, instead of clobbering the real site-packages/snowplow_tracker? The current way breaks every other package that uses the upstream snowplow tracker, which you forked and no longer (needed for your own purposes to) maintain.
That'd be ideal, but it'd still require changes in dbt to update the imports.
At least in dbt-core, this is contained to two lines in one file (a simpler fix than your https://github.com/dbt-labs/dbt-core/pull/9680). Is there telemetry in other massively used dbt packages? Personally I only use dbt-postgres and dbt-bigquery, which don't seem to have a snowplow dep, and it seems dbt-utils doesn't either.
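To make the "update the imports" idea concrete, here is a rough sketch of a fallback import that tries a renamed module first and then the current one. The module name `minimal_snowplow_tracker` is only the name proposed in this thread and does not exist today, and `import_first_available` is a hypothetical helper, not something in dbt-core.

```python
import importlib


def import_first_available(*module_names):
    """Import and return the first module in `module_names` that is installed."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            # Note: this also swallows a missing dependency *inside* the
            # candidate module; a stricter version could inspect `exc.name`.
            continue
    raise ModuleNotFoundError(f"none of {module_names!r} could be imported")


# Hypothetical usage, assuming the fork were republished under its own name:
# tracker = import_first_available("minimal_snowplow_tracker", "snowplow_tracker")
```

This would not resolve the contract mismatch by itself; it only shows that the change on the importing side could stay small if the fork stopped installing into `site-packages/snowplow_tracker`.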