feat: Preset.io source (Superset SaaS)
Preset.io is a commercial SaaS provider for Apache Superset. This PR introduces a new ingestion source for Preset "workspaces", which are essentially Superset instances. The only difference is the authentication mechanism that needs to be done against an API gateway using a key/secret pair.
I implemented the source as a derived class of SupersetSource in the same file; let me know if it would be better to have it in a separate file.
I added a new integration test, but I haven't tested it end-to-end yet. I'm in the process of setting up a local development to test it with a real workspace. But I thought it would be good to get early feedback in the PR since this is my first time contributing to DataHub.
Checklist
- [X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [X] Links to related issues (if applicable)
- [X] Tests for the changes have been added/updated (if applicable)
- [X] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [X] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Deployment failed with the following error:
The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.
@mayurinehate could you take a look at this PR?
@mayurinehate could you take a look at this PR?
Let me know if you need help testing end-to-end, I can get you a Preset workspace to run tests. Thank you!
Thanks for the review, @mayurinehate! I'll address the comments in the next couple days!
Hello @betodealmeida, would you have any idea when you might have time to work on it? thanks a lot
@betodealmeida @mayurinehate, @hsheth2 is there still work to do on this? We just ran up against this requirement and would be willing to do the last mile work to get this in.
I think i have a rebased version of this more or less sussed out. I'm hitting a snag testing it out in the local env. the ingest job fails to pull install the preset subpackage becuase its calling to the acryl pip package. It feels like I missed a step or im operating off of old docs. I can push that PR once I get the functionality ready. I've incorporated the suggestions on this PR.
The class creation more or less mirrors this PR. I did also add the logo and all the registration stuff in the typescript files.
I am using the quickstartDebug method for gradle. on the first connection run im getting this error:
+ pip install 'acryl-datahub[datahub-rest,datahub-kafka,preset]==c6a1571'
ERROR: Ignored the following yanked versions: 0.8.18, 0.8.29.1, 0.8.33.2, 0.8.34, 0.8.44rc0, 0.8.44rc1, 0.8.44.3rc4, 0.9.4, 0.10.5.3
ERROR: Ignored the following versions that require a different python version: 0.8.24.1 Requires-Python >=3.6, <=3.9.9; 0.8.24.2 Requires-Python >=3.6, <=3.9.9; 0.8.24.3 Requires-Python >=3.6, <=3.9.9; 0.8.25 Requires-Python >=3.6, <=3.9.9; 0.8.25.0 Requires-Python >=3.6, <=3.9.9; 0.8.25.1 Requires-Python >=3.6, <=3.9.9; 0.8.25.2 Requires-Python >=3.6, <=3.9.9; 0.8.26.0 Requires-Python >=3.6, <=3.9.9; 0.8.26.1 Requires-Python >=3.6, <=3.9.9; 0.8.26.2 Requires-Python >=3.6, <=3.9.9; 0.8.26.3 Requires-Python >=3.6, <=3.9.9; 0.8.26.4 Requires-Python >=3.6, <=3.9.9; 0.8.26.5 Requires-Python >=3.6, <=3.9.9; 0.8.26.6 Requires-Python >=3.6, <=3.9.9; 0.8.26.7 Requires-Python >=3.6, <=3.9.9; 0.8.26.7rc1 Requires-Python >=3.6, <=3.9.9; 0.8.26.7rc2 Requires-Python >=3.6, <=3.9.9; 0.8.26.8 Requires-Python >=3.6, <=3.9.9; 0.8.26.8rc1 Requires-Python >=3.6, <=3.9.9; 0.8.27 Requires-Python >=3.6, <=3.9.9; 0.8.27.1 Requires-Python >=3.6, <=3.9.9; 0.8.27.1rc1 Requires-Python >=3.6, <=3.9.9; 0.8.27.2 Requires-Python >=3.6, <=3.9.9; 0.8.27.2rc1 Requires-Python >=3.6, <=3.9.9; 0.8.27.2rc2 Requires-Python >=3.6, <=3.9.9; 0.8.27.2rc3 Requires-Python >=3.6, <=3.9.9
ERROR: Could not find a version that satisfies the requirement acryl-datahub==c6a1571 (from versions: 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.4.0, 0.8.1.0, 0.8.1.1, 0.8.1.2, 0.8.3.0, 0.8.3.1, 0.8.3.2, 0.8.3.3, 0.8.4.0, 0.8.5.0, 0.8.5.1, 0.8.5.2, 0.8.6.0, 0.8.6.1, 0.8.6.2, 0.8.6.3, 0.8.6.4, 0.8.6.5, 0.8.7.0, 0.8.8.0, 0.8.8.1, 0.8.8.2, 0.8.8.3, 0.8.8.4, 0.8.9.0, 0.8.10.0, 0.8.10.1, 0.8.10.2, 0.8.11.0, 0.8.11.1, 0.8.12.0, 0.8.13.0, 0.8.13.1, 0.8.14.0, 0.8.14.1, 0.8.14.2, 0.8.15.0, 0.8.15.1, 0.8.15.2, 0.8.15.3, 0.8.15.4, 0.8.15.5, 0.8.15.6, 0.8.15.7, 0.8.15.8, 0.8.15.9, 0.8.15.10, 0.8.16.0, 0.8.16.1, 0.8.16.2, 0.8.16.3, 0.8.16.4, 0.8.16.5, 0.8.16.6, 0.8.16.7, 0.8.16.8, 0.8.16.9, 0.8.16.11, 0.8.16.12, 0.8.17.0, 0.8.17.1, 0.8.17.2, 0.8.17.3, 0.8.17.4, 0.8.17.5, 0.8.17.6, 0.8.17.7, 0.8.18.1, 0.8.19.0, 0.8.19.1, 0.8.20.0, 0.8.21.0, 0.8.22.1, 0.8.23.0, 0.8.23.1, 0.8.24.0, 0.8.28.0rc1, 0.8.28.0, 0.8.28.1, 0.8.29, 0.8.29.2, 0.8.30.0, 0.8.31, 0.8.31.1rc1, 0.8.31.1, 0.8.31.2, 0.8.31.3rc1, 0.8.31.3, 0.8.31.4rc1, 0.8.31.4, 0.8.31.5rc1, 0.8.31.5, 0.8.31.6rc1, 0.8.31.6rc2, 0.8.31.6, 0.8.32rc1, 0.8.32rc2, 0.8.32rc3, 0.8.32rc4, 0.8.32, 0.8.32.1, 0.8.32.2rc1, 0.8.32.2, 0.8.32.3rc1, 0.8.32.3, 0.8.32.4rc1, 0.8.32.4rc2, 0.8.32.4, 0.8.32.5rc1, 0.8.32.5, 0.8.32.6rc2, 0.8.32.6rc3, 0.8.32.6, 0.8.32.7, 0.8.33rc1, 0.8.33, 0.8.33.1, 0.8.33.2rc1, 0.8.33.2rc2, 0.8.33.3rc2, 0.8.33.3rc3, 0.8.33.3, 0.8.34.1rc1, 0.8.34.1rc2, 0.8.34.1rc3, 0.8.34.1, 0.8.34.2rc1, 0.8.34.2rc2, 0.8.34.2rc3, 0.8.34.2rc4, 0.8.34.2, 0.8.34.3rc1, 0.8.35.0rc2, 0.8.35, 0.8.35.1rc1, 0.8.35.1, 0.8.35.2rc1, 0.8.35.2, 0.8.35.3rc1, 0.8.35.3, 0.8.35.4rc1, 0.8.35.4, 0.8.35.5rc1, 0.8.35.5, 0.8.35.6rc1, 0.8.35.6rc2, 0.8.35.6, 0.8.35.7rc1, 0.8.35.7, 0.8.35.8rc1, 0.8.35.8rc2, 0.8.35.8rc3, 0.8.36.0rc0, 0.8.36rc1, 0.8.36, 0.8.36.1rc1, 0.8.36.1rc2, 0.8.36.1rc6, 0.8.36.1rc7, 0.8.36.1rc8, 0.8.36.1rc9, 0.8.36.1rc10, 0.8.37rc0, 0.8.37, 0.8.38, 0.8.38.1rc0, 0.8.38.1rc1, 0.8.38.1, 0.8.38.2rc1, 0.8.38.2, 0.8.38.3rc1, 0.8.38.3, 0.8.38.4rc0, 0.8.38.4rc2, 0.8.38.4rc3, 0.8.38.4, 0.8.38.5rc0, 0.8.38.5, 0.8.39rc0, 0.8.39, 0.8.39.1rc1, 0.8.39.1rc2, 0.8.39.1rc3, 0.8.39.1rc4, 0.8.39.1rc5, 0.8.39.1rc6, 0.8.39.1rc7, 0.8.39.1rc8, 0.8.40rc1, 0.8.40, 0.8.40.1, 0.8.40.2rc0, 0.8.40.2, 0.8.40.3rc0, 0.8.40.3rc1, 0.8.40.3rc2, 0.8.40.3rc3, 0.8.40.3, 0.8.40.4rc1, 0.8.40.4rc2, 0.8.41rc2, 0.8.41, 0.8.41.1rc0, 0.8.41.1rc1, 0.8.41.1rc2, 0.8.41.1rc3, 0.8.41.1rc4, 0.8.41.1, 0.8.41.2rc0, 0.8.41.2rc1, 0.8.41.2, 0.8.41.3rc1, 0.8.41.3rc2, 0.8.41.3rc3, 0.8.42rc1, 0.8.42rc2, 0.8.42, 0.8.43rc2, 0.8.43rc3, 0.8.43rc4, 0.8.43, 0.8.43.1rc0, 0.8.43.1rc1, 0.8.43.1, 0.8.43.2rc0, 0.8.43.2rc1, 0.8.43.2, 0.8.43.3rc0, 0.8.43.3rc1, 0.8.43.3rc2, 0.8.43.3rc3, 0.8.43.3rc5, 0.8.43.3, 0.8.43.4rc1, 0.8.43.4rc2, 0.8.43.4, 0.8.43.5rc1, 0.8.43.5rc2, 0.8.43.5rc3, 0.8.43.5, 0.8.43.6rc0, 0.8.43.6rc1, 0.8.43.6, 0.8.44rc3, 0.8.44rc4, 0.8.44rc5, 0.8.44, 0.8.44.1rc0, 0.8.44.1rc1, 0.8.44.1rc2, 0.8.44.1rc3, 0.8.44.1rc4, 0.8.44.1, 0.8.44.2rc0, 0.8.44.2rc1, 0.8.44.2rc2, 0.8.44.2, 0.8.44.3rc0, 0.8.44.3rc1, 0.8.44.3rc2, 0.8.44.3rc3, 0.8.44.3, 0.8.44.4rc0, 0.8.44.4rc1, 0.8.44.4, 0.8.44.5rc0, 0.8.44.5rc1, 0.8.44.5rc2, 0.8.44.5rc3, 0.8.44.5, 0.8.44.6rc0, 0.8.45rc1, 0.8.45, 0.8.45.1rc0, 0.8.45.1rc2, 0.8.45.1rc3, 0.8.45.1rc4, 0.8.45.1rc5, 0.8.45.1, 0.8.45.2rc0, 0.8.45.2rc1, 0.8.45.2rc2, 0.8.45.2, 0.8.45.3rc0, 0.8.45.3rc1, 0.8.45.3rc2, 0.8.45.3rc3, 0.8.45.3rc4, 0.8.45.3rc5, 0.9.0rc4, 0.9.0rc5, 0.9.0rc6, 0.9.0, 0.9.0.1rc0, 0.9.0.1, 0.9.0.2rc0, 0.9.0.2rc1, 0.9.0.2rc2, 0.9.0.2rc3, 0.9.0.2rc4, 0.9.0.2, 0.9.0.3rc0, 0.9.0.3, 0.9.0.4rc0, 0.9.0.4, 0.9.0.5rc0, 0.9.0.5rc1, 0.9.0.5rc2, 0.9.0.5, 0.9.1rc0, 0.9.1, 0.9.1.1rc0, 0.9.1.1rc1, 0.9.1.1rc2, 0.9.2, 0.9.2.1rc0, 0.9.2.1rc1, 0.9.2.1rc2, 0.9.2.1, 0.9.2.2rc0, 0.9.2.2rc1, 0.9.2.2rc2, 0.9.2.2rc3, 0.9.2.2, 0.9.2.3rc1, 0.9.2.3rc2, 0.9.2.3rc3, 0.9.2.3rc4, 0.9.2.3, 0.9.2.4rc1, 0.9.2.4rc2, 0.9.2.4, 0.9.2.5rc1, 0.9.2.5rc3, 0.9.2.5rc4, 0.9.2.5rc5, 0.9.2.5rc8, 0.9.2.5, 0.9.3, 0.9.3.1rc0, 0.9.3.1rc1, 0.9.3.1, 0.9.3.2rc0, 0.9.3.2rc2, 0.9.3.2rc3, 0.9.3.2, 0.9.3.3rc0, 0.9 [...truncated]
ERROR: No matching distribution found for acryl-datahub==c6a1571
I'm guessing there's maybe an env var or something im supposed to set that overrides this?
subsequent runs faile with:
[2024-06-15 15:29:57,523] ERROR {datahub.entrypoints:201} - Command failed: Failed to find a registered source for type preset: 'Did not find a registered class for preset'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
yield
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 223, in __init__
source_class = source_registry.get(source_type)
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 172, in get
raise KeyError(f"Did not find a registered class for {key}")
KeyError: 'Did not find a registered class for preset'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
raise e
File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
res = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
return func(ctx, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run
ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 167, in run_ingestion_and_check_upgrade
pipeline = Pipeline.create(
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 336, in create
return cls(
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 220, in __init__
with _add_init_error_context(
File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 122, in _add_init_error_context
raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type preset: 'Did not find a registered class for preset'
@byndcivilization when developing on sources, you'll need to use ingestion from the CLI e.g. datahub ingest -c <recipe.yml> from within the metadata-ingestion venv (see https://datahubproject.io/docs/metadata-ingestion/developing/#set-up-your-python-environment)
@Coderabbitai full review