superset icon indicating copy to clipboard operation
superset copied to clipboard

feat(api): extending uuid field in dataset, chart, dasboard APIs

Open dankor opened this issue 1 year ago • 11 comments

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Extend API to handle uuid

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [x] Introduces new feature or API
  • [ ] Removes existing feature or API

dankor avatar Jul 12 '24 10:07 dankor

Codecov Report

:x: Patch coverage is 95.23810% with 4 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 72.79%. Comparing base (2179081) to head (2e73419). :warning: Report is 18 commits behind head on master.

Files with missing lines Patch % Lines
superset/models/slice.py 57.14% 2 Missing and 1 partial :warning:
superset/utils/cache.py 80.00% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #29573       +/-   ##
===========================================
+ Coverage        0   72.79%   +72.79%     
===========================================
  Files           0      574      +574     
  Lines           0    41769    +41769     
  Branches        0     4401     +4401     
===========================================
+ Hits            0    30407    +30407     
- Misses          0    10192    +10192     
- Partials        0     1170     +1170     
Flag Coverage Δ
hive 47.07% <73.80%> (?)
mysql 71.80% <95.23%> (?)
postgres 71.86% <95.23%> (?)
presto 50.77% <75.00%> (?)
python 72.76% <95.23%> (?)
sqlite 71.42% <95.23%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Jul 12 '24 16:07 codecov[bot]

Hi, @dankor! Welcome to the Superset GitHub!

Can you add some context on why these fields are needed? Thanks!

betodealmeida avatar Jul 12 '24 17:07 betodealmeida

Hi, @dankor! Welcome to the Superset GitHub!

Can you add some context on why these fields are needed? Thanks!

Hi @betodealmeida! Sure! I'm working on establishing a robust and granular deployment process for Superset. The core issue here is similar to why we expose these UUIDs in the import/export API: we need persistent identifiers across multiple Superset instances.

You might suggest extending the import/export API instead, but we’re encountering several challenges:

  • No support for tags, roles, etc.
  • Deprecated resources aren’t cleaned up.
  • Archives are much harder to debug compared to classic CRUD operations.

What I’m really aiming for is a declarative way to manage Superset resources (let’s call it a "report-as-code" approach). Imagine having a development environment where everyone can experiment freely. Once a report is ready, it can be committed to git, reviewed, and then deployed to higher environments.

Ideally, I envision mature version of this, but supporting all types of resources. You can check out a proof-of-concept in this video, though no code is available.

My plan is to implement this feature incrementally, starting with minor changes and aligning as closely as possible with upstream.

Hope this makes sense! Let me know if you're open to these kinds of contributions and what concerns the community might have.

dankor avatar Jul 13 '24 15:07 dankor

Hey all! I'm just wondering if there are any upcoming plans or next steps here. Could you please let me what to expect?

dankor avatar Jul 29 '24 07:07 dankor

It looks like CI is stuck because of some tests failing (which I'm re-running), and pre-commit checks (i.e. linting) so you will need to run the Pre-commit hooks

Superset uses Git pre-commit hooks courtesy of pre-commit. To install run the following:

pip3 install -r requirements/integration.txt
pre-commit install
A series of checks will now run when you make a git commit.

You can then run pre-commit manually:

pre-commit run --all-files

rusackas avatar Jul 29 '24 17:07 rusackas

Hi @rusackas ! Thanks! I fixed the linting.

@michael-s-molina Fair point. Actually, I have added this fields in order to q them:

{
  "columns": [
    "always_filter_main_dttm",
    "cache_timeout",
    "catalog",
    "columns.advanced_data_type",
    "columns.column_name",
    "columns.description",
    "columns.expression",
    "columns.extra",
    "columns.filterable",
    "columns.groupby",
    "columns.is_active",
    "columns.is_dttm",
    "columns.python_date_format",
    "columns.type",
    "columns.verbose_name",
    "default_endpoint",
    "description",
    "extra",
    "fetch_values_predicate",
    "filter_select_enabled",
    "is_managed_externally",
    "is_sqllab_view",
    "main_dttm_col",
    "metrics.d3format",
    "metrics.description",
    "metrics.expression",
    "metrics.extra",
    "metrics.metric_name",
    "metrics.metric_type",
    "metrics.verbose_name",
    "metrics.warning_text",
    "normalize_columns",
    "offset",
    "schema",
    "sql",
    "table_name",
    "template_params",
    "uuid",
    "database.uuid"
  ],
  "filters": [
    {
      "col": "uuid",
      "opr": "eq",
      "value": "7d2a123a-1952-46e1-b9d0-4d41e1e042c1"
    }
  ]
}

Otherwise these fields won't be listed with q.

Do you suppose to keep list_columns unchanged and perform two requests instead?

  • Run list_columns with uuid filter and get id.
  • Run show_select_columns against obtained id with all required fields.

I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?

dankor avatar Jul 29 '24 19:07 dankor

I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?

You would need to change the requests for all pages but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables).

@dpgaspar Is there a way we can achieve this without multiple requests?

michael-s-molina avatar Jul 29 '24 19:07 michael-s-molina

I'm wondering if we can come up with some action plan here. It currently seems quite frontend-oriented. I'd like to find a common ground in order to both fulfill my use-case and contribute it upstream, so community can benefit too. What's your vision here?

dankor avatar Aug 08 '24 08:08 dankor

I think the main thing is that we can't impact performance. @dpgaspar is currently out of office, but may be able to provide suggestions upon his return of how to handle this at the API layer.

I'm also wondering if this might be a precursor to a (fairly common) request to change URLs from being incremental integer IDs to UUIDs (e.g. here)

It might also tie into this issue.

Maybe we can just solve the performance concern and let this through, but at some point we need to have a SIP proposed about how/where UUIDs should be used more systemically by Superset. I think that's the real path forward - broadening the discussion to formulate a plan to use UUIDs everywhere relevant.

rusackas avatar Aug 15 '24 16:08 rusackas

I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?

You would need to change the requests for all pages but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables).

@dpgaspar Is there a way we can achieve this without multiple requests?

If we include columns and metrics on a dataset payload, these results won't be paginated. So I advise making multiple requests

dpgaspar avatar Aug 19 '24 15:08 dpgaspar

Hey @betodealmeida @rusackas @michael-s-molina @dpgaspar !

Could you please take another look? I've reworked this PR to eliminate the need for listing calls when looking up a single item by UUID. I was inspired by the dashboard GET approach, where the dashboard API supports three different PKs: ID, slug, and UUID. I've only implemented this for the chart so far, to get your feedback on the new approach. I hope it makes more sense now. Looking forward to hearing from you.

dankor avatar Aug 26 '24 17:08 dankor

Hey @betodealmeida @rusackas @michael-s-molina @dpgaspar

Just a kind reminder. It seems you lost the PR from your radar.

dankor avatar Sep 25 '24 17:09 dankor

This seems like it ties in to other proposals and efforts involving using UUIDs more systemically, so I'll ping @hondyman and @mistercrunch to take a look here too, as well as @dpgaspar to take a look at the revisited/revised API approach, and @betodealmeida (one of the experts on managing Superset assets as code) for a second look :)

rusackas avatar Sep 30 '24 14:09 rusackas

Oh, and this will require a revisit from @michael-s-molina to un-block, if his performance concerns are indeed resolved :)

rusackas avatar Sep 30 '24 14:09 rusackas

Hi @michael-s-molina ! Thanks for the review. I have answered all comments.

Could you update the PR description to reflect the latest changes with a clear description of why the change is needed?

I have updated to support uuid along with id.

Could you add tests for the change?

Sure, I didn't add them because I didn't find them in the similar PR. So I tried to keep my PR as close to the merged one as possible.

dankor avatar Jan 13 '25 08:01 dankor

Sorry, this seems to have slipped under the collective radar. Could you give it a rebase and we'll try to take a fresh look?

rusackas avatar Apr 01 '25 05:04 rusackas

Sure, done.

dankor avatar Apr 01 '25 10:04 dankor

Just FYI, I set this to auto-close https://github.com/apache/superset/issues/29879, since... I think it does!

rusackas avatar Apr 03 '25 21:04 rusackas

Still hoping to get this one through if we can... thanks for your patience :D Looks like it needs a rebase... not sure what's up with the other test failures, but hopefully the rebase clears the deck for success!

rusackas avatar Jun 26 '25 22:06 rusackas

Resolved the conflict.

dankor avatar Jun 27 '25 06:06 dankor

Approved/triggered the CI run. @michael-s-molina since you have the "requested change" blocker, mind taking another look?

Overall I'd love to go move towards a uuid-first approach and this is a step in the right direction. I was thinking about moving most single-GET endpoints to also support uuid (if id >= 8 chars, assume it's a uuid), without creating/deprecating new endpoints. In theory there's collision errors in environments with 1M+ entities, could shift to 9 or 10 digits to prevent that. But using ids has mild security risk implications and would be nice to deprecate autonumber id-lookups eventually.

mistercrunch avatar Jun 27 '25 15:06 mistercrunch

Looks like this could use a rebase, which might resolve some CI failures if we're lucky. Only @michael-s-molina can remove the block, though.

rusackas avatar Jul 14 '25 09:07 rusackas

Looks like this could use a rebase, which might resolve some CI failures if we're lucky. Only @michael-s-molina can remove the block, though.

rusackas avatar Jul 14 '25 09:07 rusackas

@dankor if you can do that rebase to remove conflicts, I think we can get this one merged more expediently now. Thanks in advance!

rusackas avatar Jul 15 '25 11:07 rusackas

@rusackas it should be rebased now.

dankor avatar Jul 25 '25 11:07 dankor

CI issues ...

mistercrunch avatar Jul 27 '25 22:07 mistercrunch

Loving this PR as a great first step in the direction of a wider support of moving towards uuid as the primary way to address and reference object in Superset. One deeper question is how to wire this logic for reusability to be ported to all core entities.

Clearly it's probably mergeable as is (and maybe that's where we start), but the bigger question is how we factor this logic for portability to all objects that use the UUIDMixing (model mixin). Thinking there might be a DAO mixin too and similar constructs, injecting the simple methods easily for many DAOs.

@dankor , curious if you'd be interested in widening the target to support all uuid entities using reusable constructs and injecting them in all the right places (all UUID-mixed-in models) or whether you think we should merge this and using it as a blueprint to refactor into wider UUID support.

mistercrunch avatar Aug 17 '25 18:08 mistercrunch

@mistercrunch I prefer to merge things bit by bit. My next step is to support GET dataset/{uuid} in the same way. If you’re okay with it, let’s merge this one and then focus on the next.

As for the full switch, my original goal was to add uuid support to the dashboards/charts/datasets APIs. The next PR (datasets) will cover everything I personally need, but if you’d like to go further, we should come up with a plan. I’d be happy to be involved.

dankor avatar Aug 18 '25 06:08 dankor

Amazing, thanks for the contribution!

mistercrunch avatar Aug 18 '25 20:08 mistercrunch