datahub icon indicating copy to clipboard operation
datahub copied to clipboard

feat: Preset.io source (Superset SaaS)

Open betodealmeida opened this issue 2 years ago • 10 comments

Preset.io is a commercial SaaS provider for Apache Superset. This PR introduces a new ingestion source for Preset "workspaces", which are essentially Superset instances. The only difference is the authentication mechanism that needs to be done against an API gateway using a key/secret pair.

I implemented the source as a derived class of SupersetSource in the same file; let me know if it would be better to have it in a separate file.

I added a new integration test, but I haven't tested it end-to-end yet. I'm in the process of setting up a local development to test it with a real workspace. But I thought it would be good to get early feedback in the PR since this is my first time contributing to DataHub.

Checklist

  • [X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [X] Links to related issues (if applicable)
  • [X] Tests for the changes have been added/updated (if applicable)
  • [X] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [X] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

betodealmeida avatar Jun 07 '23 17:06 betodealmeida

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

vercel[bot] avatar Jun 07 '23 17:06 vercel[bot]

@mayurinehate could you take a look at this PR?

hsheth2 avatar Jul 05 '23 23:07 hsheth2

@mayurinehate could you take a look at this PR?

Let me know if you need help testing end-to-end, I can get you a Preset workspace to run tests. Thank you!

betodealmeida avatar Jul 07 '23 02:07 betodealmeida

Thanks for the review, @mayurinehate! I'll address the comments in the next couple days!

betodealmeida avatar Jul 17 '23 16:07 betodealmeida

Hello @betodealmeida, would you have any idea when you might have time to work on it? thanks a lot

kuhnen avatar Aug 06 '23 09:08 kuhnen

@betodealmeida @mayurinehate, @hsheth2 is there still work to do on this? We just ran up against this requirement and would be willing to do the last mile work to get this in.

byndcivilization avatar Jun 13 '24 16:06 byndcivilization

I think i have a rebased version of this more or less sussed out. I'm hitting a snag testing it out in the local env. the ingest job fails to pull install the preset subpackage becuase its calling to the acryl pip package. It feels like I missed a step or im operating off of old docs. I can push that PR once I get the functionality ready. I've incorporated the suggestions on this PR.

byndcivilization avatar Jun 15 '24 05:06 byndcivilization

The class creation more or less mirrors this PR. I did also add the logo and all the registration stuff in the typescript files.

I am using the quickstartDebug method for gradle. on the first connection run im getting this error:

+ pip install 'acryl-datahub[datahub-rest,datahub-kafka,preset]==c6a1571'
ERROR: Ignored the following yanked versions: 0.8.18, 0.8.29.1, 0.8.33.2, 0.8.34, 0.8.44rc0, 0.8.44rc1, 0.8.44.3rc4, 0.9.4, 0.10.5.3
ERROR: Ignored the following versions that require a different python version: 0.8.24.1 Requires-Python >=3.6, <=3.9.9; 0.8.24.2 Requires-Python >=3.6, <=3.9.9; 0.8.24.3 Requires-Python >=3.6, <=3.9.9; 0.8.25 Requires-Python >=3.6, <=3.9.9; 0.8.25.0 Requires-Python >=3.6, <=3.9.9; 0.8.25.1 Requires-Python >=3.6, <=3.9.9; 0.8.25.2 Requires-Python >=3.6, <=3.9.9; 0.8.26.0 Requires-Python >=3.6, <=3.9.9; 0.8.26.1 Requires-Python >=3.6, <=3.9.9; 0.8.26.2 Requires-Python >=3.6, <=3.9.9; 0.8.26.3 Requires-Python >=3.6, <=3.9.9; 0.8.26.4 Requires-Python >=3.6, <=3.9.9; 0.8.26.5 Requires-Python >=3.6, <=3.9.9; 0.8.26.6 Requires-Python >=3.6, <=3.9.9; 0.8.26.7 Requires-Python >=3.6, <=3.9.9; 0.8.26.7rc1 Requires-Python >=3.6, <=3.9.9; 0.8.26.7rc2 Requires-Python >=3.6, <=3.9.9; 0.8.26.8 Requires-Python >=3.6, <=3.9.9; 0.8.26.8rc1 Requires-Python >=3.6, <=3.9.9; 0.8.27 Requires-Python >=3.6, <=3.9.9; 0.8.27.1 Requires-Python >=3.6, <=3.9.9; 0.8.27.1rc1 Requires-Python >=3.6, <=3.9.9; 0.8.27.2 Requires-Python >=3.6, <=3.9.9; 0.8.27.2rc1 Requires-Python >=3.6, <=3.9.9; 0.8.27.2rc2 Requires-Python >=3.6, <=3.9.9; 0.8.27.2rc3 Requires-Python >=3.6, <=3.9.9
ERROR: Could not find a version that satisfies the requirement acryl-datahub==c6a1571 (from versions: 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.4.0, 0.8.1.0, 0.8.1.1, 0.8.1.2, 0.8.3.0, 0.8.3.1, 0.8.3.2, 0.8.3.3, 0.8.4.0, 0.8.5.0, 0.8.5.1, 0.8.5.2, 0.8.6.0, 0.8.6.1, 0.8.6.2, 0.8.6.3, 0.8.6.4, 0.8.6.5, 0.8.7.0, 0.8.8.0, 0.8.8.1, 0.8.8.2, 0.8.8.3, 0.8.8.4, 0.8.9.0, 0.8.10.0, 0.8.10.1, 0.8.10.2, 0.8.11.0, 0.8.11.1, 0.8.12.0, 0.8.13.0, 0.8.13.1, 0.8.14.0, 0.8.14.1, 0.8.14.2, 0.8.15.0, 0.8.15.1, 0.8.15.2, 0.8.15.3, 0.8.15.4, 0.8.15.5, 0.8.15.6, 0.8.15.7, 0.8.15.8, 0.8.15.9, 0.8.15.10, 0.8.16.0, 0.8.16.1, 0.8.16.2, 0.8.16.3, 0.8.16.4, 0.8.16.5, 0.8.16.6, 0.8.16.7, 0.8.16.8, 0.8.16.9, 0.8.16.11, 0.8.16.12, 0.8.17.0, 0.8.17.1, 0.8.17.2, 0.8.17.3, 0.8.17.4, 0.8.17.5, 0.8.17.6, 0.8.17.7, 0.8.18.1, 0.8.19.0, 0.8.19.1, 0.8.20.0, 0.8.21.0, 0.8.22.1, 0.8.23.0, 0.8.23.1, 0.8.24.0, 0.8.28.0rc1, 0.8.28.0, 0.8.28.1, 0.8.29, 0.8.29.2, 0.8.30.0, 0.8.31, 0.8.31.1rc1, 0.8.31.1, 0.8.31.2, 0.8.31.3rc1, 0.8.31.3, 0.8.31.4rc1, 0.8.31.4, 0.8.31.5rc1, 0.8.31.5, 0.8.31.6rc1, 0.8.31.6rc2, 0.8.31.6, 0.8.32rc1, 0.8.32rc2, 0.8.32rc3, 0.8.32rc4, 0.8.32, 0.8.32.1, 0.8.32.2rc1, 0.8.32.2, 0.8.32.3rc1, 0.8.32.3, 0.8.32.4rc1, 0.8.32.4rc2, 0.8.32.4, 0.8.32.5rc1, 0.8.32.5, 0.8.32.6rc2, 0.8.32.6rc3, 0.8.32.6, 0.8.32.7, 0.8.33rc1, 0.8.33, 0.8.33.1, 0.8.33.2rc1, 0.8.33.2rc2, 0.8.33.3rc2, 0.8.33.3rc3, 0.8.33.3, 0.8.34.1rc1, 0.8.34.1rc2, 0.8.34.1rc3, 0.8.34.1, 0.8.34.2rc1, 0.8.34.2rc2, 0.8.34.2rc3, 0.8.34.2rc4, 0.8.34.2, 0.8.34.3rc1, 0.8.35.0rc2, 0.8.35, 0.8.35.1rc1, 0.8.35.1, 0.8.35.2rc1, 0.8.35.2, 0.8.35.3rc1, 0.8.35.3, 0.8.35.4rc1, 0.8.35.4, 0.8.35.5rc1, 0.8.35.5, 0.8.35.6rc1, 0.8.35.6rc2, 0.8.35.6, 0.8.35.7rc1, 0.8.35.7, 0.8.35.8rc1, 0.8.35.8rc2, 0.8.35.8rc3, 0.8.36.0rc0, 0.8.36rc1, 0.8.36, 0.8.36.1rc1, 0.8.36.1rc2, 0.8.36.1rc6, 0.8.36.1rc7, 0.8.36.1rc8, 0.8.36.1rc9, 0.8.36.1rc10, 0.8.37rc0, 0.8.37, 0.8.38, 0.8.38.1rc0, 0.8.38.1rc1, 0.8.38.1, 0.8.38.2rc1, 0.8.38.2, 0.8.38.3rc1, 0.8.38.3, 0.8.38.4rc0, 0.8.38.4rc2, 0.8.38.4rc3, 0.8.38.4, 0.8.38.5rc0, 0.8.38.5, 0.8.39rc0, 0.8.39, 0.8.39.1rc1, 0.8.39.1rc2, 0.8.39.1rc3, 0.8.39.1rc4, 0.8.39.1rc5, 0.8.39.1rc6, 0.8.39.1rc7, 0.8.39.1rc8, 0.8.40rc1, 0.8.40, 0.8.40.1, 0.8.40.2rc0, 0.8.40.2, 0.8.40.3rc0, 0.8.40.3rc1, 0.8.40.3rc2, 0.8.40.3rc3, 0.8.40.3, 0.8.40.4rc1, 0.8.40.4rc2, 0.8.41rc2, 0.8.41, 0.8.41.1rc0, 0.8.41.1rc1, 0.8.41.1rc2, 0.8.41.1rc3, 0.8.41.1rc4, 0.8.41.1, 0.8.41.2rc0, 0.8.41.2rc1, 0.8.41.2, 0.8.41.3rc1, 0.8.41.3rc2, 0.8.41.3rc3, 0.8.42rc1, 0.8.42rc2, 0.8.42, 0.8.43rc2, 0.8.43rc3, 0.8.43rc4, 0.8.43, 0.8.43.1rc0, 0.8.43.1rc1, 0.8.43.1, 0.8.43.2rc0, 0.8.43.2rc1, 0.8.43.2, 0.8.43.3rc0, 0.8.43.3rc1, 0.8.43.3rc2, 0.8.43.3rc3, 0.8.43.3rc5, 0.8.43.3, 0.8.43.4rc1, 0.8.43.4rc2, 0.8.43.4, 0.8.43.5rc1, 0.8.43.5rc2, 0.8.43.5rc3, 0.8.43.5, 0.8.43.6rc0, 0.8.43.6rc1, 0.8.43.6, 0.8.44rc3, 0.8.44rc4, 0.8.44rc5, 0.8.44, 0.8.44.1rc0, 0.8.44.1rc1, 0.8.44.1rc2, 0.8.44.1rc3, 0.8.44.1rc4, 0.8.44.1, 0.8.44.2rc0, 0.8.44.2rc1, 0.8.44.2rc2, 0.8.44.2, 0.8.44.3rc0, 0.8.44.3rc1, 0.8.44.3rc2, 0.8.44.3rc3, 0.8.44.3, 0.8.44.4rc0, 0.8.44.4rc1, 0.8.44.4, 0.8.44.5rc0, 0.8.44.5rc1, 0.8.44.5rc2, 0.8.44.5rc3, 0.8.44.5, 0.8.44.6rc0, 0.8.45rc1, 0.8.45, 0.8.45.1rc0, 0.8.45.1rc2, 0.8.45.1rc3, 0.8.45.1rc4, 0.8.45.1rc5, 0.8.45.1, 0.8.45.2rc0, 0.8.45.2rc1, 0.8.45.2rc2, 0.8.45.2, 0.8.45.3rc0, 0.8.45.3rc1, 0.8.45.3rc2, 0.8.45.3rc3, 0.8.45.3rc4, 0.8.45.3rc5, 0.9.0rc4, 0.9.0rc5, 0.9.0rc6, 0.9.0, 0.9.0.1rc0, 0.9.0.1, 0.9.0.2rc0, 0.9.0.2rc1, 0.9.0.2rc2, 0.9.0.2rc3, 0.9.0.2rc4, 0.9.0.2, 0.9.0.3rc0, 0.9.0.3, 0.9.0.4rc0, 0.9.0.4, 0.9.0.5rc0, 0.9.0.5rc1, 0.9.0.5rc2, 0.9.0.5, 0.9.1rc0, 0.9.1, 0.9.1.1rc0, 0.9.1.1rc1, 0.9.1.1rc2, 0.9.2, 0.9.2.1rc0, 0.9.2.1rc1, 0.9.2.1rc2, 0.9.2.1, 0.9.2.2rc0, 0.9.2.2rc1, 0.9.2.2rc2, 0.9.2.2rc3, 0.9.2.2, 0.9.2.3rc1, 0.9.2.3rc2, 0.9.2.3rc3, 0.9.2.3rc4, 0.9.2.3, 0.9.2.4rc1, 0.9.2.4rc2, 0.9.2.4, 0.9.2.5rc1, 0.9.2.5rc3, 0.9.2.5rc4, 0.9.2.5rc5, 0.9.2.5rc8, 0.9.2.5, 0.9.3, 0.9.3.1rc0, 0.9.3.1rc1, 0.9.3.1, 0.9.3.2rc0, 0.9.3.2rc2, 0.9.3.2rc3, 0.9.3.2, 0.9.3.3rc0, 0.9 [...truncated]
ERROR: No matching distribution found for acryl-datahub==c6a1571

I'm guessing there's maybe an env var or something im supposed to set that overrides this?

subsequent runs faile with:

[2024-06-15 15:29:57,523] ERROR    {datahub.entrypoints:201} - Command failed: Failed to find a registered source for type preset: 'Did not find a registered class for preset'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
    yield
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 223, in __init__
    source_class = source_registry.get(source_type)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 172, in get
    raise KeyError(f"Did not find a registered class for {key}")
KeyError: 'Did not find a registered class for preset'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 167, in run_ingestion_and_check_upgrade
    pipeline = Pipeline.create(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 336, in create
    return cls(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 220, in __init__
    with _add_init_error_context(
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 122, in _add_init_error_context
    raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type preset: 'Did not find a registered class for preset'

byndcivilization avatar Jun 15 '24 15:06 byndcivilization

@byndcivilization when developing on sources, you'll need to use ingestion from the CLI e.g. datahub ingest -c <recipe.yml> from within the metadata-ingestion venv (see https://datahubproject.io/docs/metadata-ingestion/developing/#set-up-your-python-environment)

hsheth2 avatar Jun 21 '24 21:06 hsheth2

@Coderabbitai full review

shirshanka avatar Jun 28 '24 20:06 shirshanka