feat(api): extend uuid field in dataset, chart, and dashboard APIs
SUMMARY
Extend the dataset, chart, and dashboard APIs to handle uuid.
TESTING INSTRUCTIONS
- GET http://localhost:8088/api/v1/dataset/?q={"columns": ["uuid"]}
- GET http://localhost:8088/api/v1/chart/?q={"columns": ["uuid"]}
- GET http://localhost:8088/api/v1/dashboard/?q={"columns": ["uuid"]}
- Plus PUT and POST for each resource.
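The q parameter in the GETs above carries JSON that must be percent-encoded when sent from a script. A minimal Python sketch, assuming the local instance URL from the testing steps above:

```python
import json
from urllib.parse import quote

BASE = "http://localhost:8088/api/v1"  # local instance from the steps above

def uuid_list_url(resource: str) -> str:
    """Build the GET URL asking a list endpoint for the uuid column."""
    q = {"columns": ["uuid"]}
    return f"{BASE}/{resource}/?q={quote(json.dumps(q))}"

for resource in ("dataset", "chart", "dashboard"):
    print(uuid_list_url(resource))
```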
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [x] Introduces new feature or API
- [ ] Removes existing feature or API
Codecov Report
:x: Patch coverage is 95.23810% with 4 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 72.79%. Comparing base (2179081) to head (2e73419).
:warning: Report is 18 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/models/slice.py | 57.14% | 2 Missing and 1 partial :warning: |
| superset/utils/cache.py | 80.00% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #29573 +/- ##
===========================================
+ Coverage 0 72.79% +72.79%
===========================================
Files 0 574 +574
Lines 0 41769 +41769
Branches 0 4401 +4401
===========================================
+ Hits 0 30407 +30407
- Misses 0 10192 +10192
- Partials 0 1170 +1170
| Flag | Coverage Δ | |
|---|---|---|
| hive | 47.07% <73.80%> (?) | |
| mysql | 71.80% <95.23%> (?) | |
| postgres | 71.86% <95.23%> (?) | |
| presto | 50.77% <75.00%> (?) | |
| python | 72.76% <95.23%> (?) | |
| sqlite | 71.42% <95.23%> (?) | |
| unit | 100.00% <ø> (?) | |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
Hi, @dankor! Welcome to the Superset GitHub!
Can you add some context on why these fields are needed? Thanks!
Hi @betodealmeida! Sure! I'm working on establishing a robust and granular deployment process for Superset. The core issue here is similar to why we expose these UUIDs in the import/export API: we need persistent identifiers across multiple Superset instances.
You might suggest extending the import/export API instead, but we’re encountering several challenges:
- No support for tags, roles, etc.
- Deprecated resources aren’t cleaned up.
- Archives are much harder to debug compared to classic CRUD operations.
What I’m really aiming for is a declarative way to manage Superset resources (let’s call it a "report-as-code" approach). Imagine having a development environment where everyone can experiment freely. Once a report is ready, it can be committed to git, reviewed, and then deployed to higher environments.
Ideally, I envision a mature version of this, but supporting all types of resources. You can check out a proof-of-concept in this video, though no code is available.
My plan is to implement this feature incrementally, starting with minor changes and aligning as closely as possible with upstream.
Hope this makes sense! Let me know if you're open to these kinds of contributions and what concerns the community might have.
Hey all! I'm just wondering if there are any upcoming plans or next steps here. Could you please let me know what to expect?
It looks like CI is stuck because of some failing tests (which I'm re-running) and failing pre-commit checks (i.e., linting), so you will need to run the pre-commit hooks.
Superset uses Git pre-commit hooks courtesy of pre-commit. To install run the following:
pip3 install -r requirements/integration.txt
pre-commit install
A series of checks will now run when you make a git commit.
You can then run pre-commit manually:
pre-commit run --all-files
Hi @rusackas ! Thanks! I fixed the linting.
@michael-s-molina Fair point. Actually, I added these fields in order to query them via q:
{
"columns": [
"always_filter_main_dttm",
"cache_timeout",
"catalog",
"columns.advanced_data_type",
"columns.column_name",
"columns.description",
"columns.expression",
"columns.extra",
"columns.filterable",
"columns.groupby",
"columns.is_active",
"columns.is_dttm",
"columns.python_date_format",
"columns.type",
"columns.verbose_name",
"default_endpoint",
"description",
"extra",
"fetch_values_predicate",
"filter_select_enabled",
"is_managed_externally",
"is_sqllab_view",
"main_dttm_col",
"metrics.d3format",
"metrics.description",
"metrics.expression",
"metrics.extra",
"metrics.metric_name",
"metrics.metric_type",
"metrics.verbose_name",
"metrics.warning_text",
"normalize_columns",
"offset",
"schema",
"sql",
"table_name",
"template_params",
"uuid",
"database.uuid"
],
"filters": [
{
"col": "uuid",
"opr": "eq",
"value": "7d2a123a-1952-46e1-b9d0-4d41e1e042c1"
}
]
}
Otherwise these fields won't be listed with q.
Do you propose keeping list_columns unchanged and performing two requests instead?
- Run list_columns with a uuid filter and get the id.
- Run show_select_columns against the obtained id with all required fields.
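The two-request flow could look roughly like this in Python — a sketch only, building the request URLs in the style of the testing instructions (session handling and authentication are elided):

```python
import json
from urllib.parse import quote

BASE = "http://localhost:8088/api/v1"  # assumed local instance

def id_lookup_url(resource: str, item_uuid: str) -> str:
    """Step 1: filter the list endpoint by uuid to discover the numeric id."""
    q = {
        "columns": ["id"],
        "filters": [{"col": "uuid", "opr": "eq", "value": item_uuid}],
    }
    return f"{BASE}/{resource}/?q={quote(json.dumps(q))}"

def show_url(resource: str, pk: int) -> str:
    """Step 2: fetch the full record (columns, metrics, etc.) by that id."""
    return f"{BASE}/{resource}/{pk}"
```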
I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?
> I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?
You would need to change the requests for all pages but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables).
@dpgaspar Is there a way we can achieve this without multiple requests?
I'm wondering if we can come up with some action plan here. It currently seems quite frontend-oriented. I'd like to find a common ground in order to both fulfill my use-case and contribute it upstream, so community can benefit too. What's your vision here?
I think the main thing is that we can't impact performance. @dpgaspar is currently out of office, but may be able to provide suggestions upon his return of how to handle this at the API layer.
I'm also wondering if this might be a precursor to a (fairly common) request to change URLs from being incremental integer IDs to UUIDs (e.g. here)
It might also tie into this issue.
Maybe we can just solve the performance concern and let this through, but at some point we need to have a SIP proposed about how/where UUIDs should be used more systemically by Superset. I think that's the real path forward - broadening the discussion to formulate a plan to use UUIDs everywhere relevant.
> I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?
> You would need to change the requests for all pages but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables).
> @dpgaspar Is there a way we can achieve this without multiple requests?
If we include columns and metrics on a dataset payload, these results won't be paginated. So I advise making multiple requests
Hey @betodealmeida @rusackas @michael-s-molina @dpgaspar !
Could you please take another look? I've reworked this PR to eliminate the need for listing calls when looking up a single item by UUID. I was inspired by the dashboard GET approach, where the dashboard API supports three different PKs: ID, slug, and UUID. I've only implemented this for the chart so far, to get your feedback on the new approach. I hope it makes more sense now. Looking forward to hearing from you.
Hey @betodealmeida @rusackas @michael-s-molina @dpgaspar
Just a kind reminder. It seems the PR has fallen off your radar.
This seems like it ties in to other proposals and efforts involving using UUIDs more systemically, so I'll ping @hondyman and @mistercrunch to take a look here too, as well as @dpgaspar to take a look at the revisited/revised API approach, and @betodealmeida (one of the experts on managing Superset assets as code) for a second look :)
Oh, and this will require a revisit from @michael-s-molina to un-block, if his performance concerns are indeed resolved :)
Hi @michael-s-molina ! Thanks for the review. I have answered all comments.
Could you update the PR description to reflect the latest changes with a clear description of why the change is needed?
I have updated to support uuid along with id.
Could you add tests for the change?
Sure, I didn't add them because I didn't find them in the similar PR. So I tried to keep my PR as close to the merged one as possible.
Sorry, this seems to have slipped under the collective radar. Could you give it a rebase and we'll try to take a fresh look?
Sure, done.
Just FYI, I set this to auto-close https://github.com/apache/superset/issues/29879, since... I think it does!
Still hoping to get this one through if we can... thanks for your patience :D Looks like it needs a rebase... not sure what's up with the other test failures, but hopefully the rebase clears the deck for success!
Resolved the conflict.
Approved/triggered the CI run. @michael-s-molina since you have the "requested change" blocker, mind taking another look?
Overall I'd love to move towards a uuid-first approach, and this is a step in the right direction. I was thinking about moving most single-GET endpoints to also support uuid (if the pk is >= 8 chars, assume it's a uuid), without creating/deprecating new endpoints. In theory there are collision risks in environments with 1M+ entities; shifting the threshold to 9 or 10 digits would prevent that. Using ids also has mild security-risk implications, and it would be nice to deprecate autonumber id-lookups eventually.
Looks like this could use a rebase, which might resolve some CI failures if we're lucky. Only @michael-s-molina can remove the block, though.
Looks like this could use a rebase, which might resolve some CI failures if we're lucky. Only @michael-s-molina can remove the block, though.
@dankor if you can do that rebase to remove conflicts, I think we can get this one merged more expediently now. Thanks in advance!
@rusackas it should be rebased now.
CI issues ...
Loving this PR as a great first step towards wider support for uuid as the primary way to address and reference objects in Superset. One deeper question is how to wire this logic for reusability so it can be ported to all core entities.
Clearly it's probably mergeable as is (and maybe that's where we start), but the bigger question is how we factor this logic for portability to all objects that use the UUIDMixin (model mixin). Thinking there might be a DAO mixin too, and similar constructs, to inject these simple methods easily into many DAOs.
@dankor , curious if you'd be interested in widening the target to support all uuid entities using reusable constructs and injecting them in all the right places (all UUID-mixed-in models) or whether you think we should merge this and using it as a blueprint to refactor into wider UUID support.
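A reusable construct of the kind discussed could be sketched as a lookup mixin. Everything below (the in-memory registry, the Chart class) is illustrative only — a real DAO mixin would issue a SQLAlchemy query against the UUID-mixed-in model instead:

```python
import uuid

class UUIDLookupMixin:
    """Mixin adding a uuid-based finder to any class keeping a registry of
    instances; stands in for a DAO mixin shared across uuid-bearing models."""

    _registry: dict

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry = {}  # each model class gets its own lookup table

    @classmethod
    def find_by_uuid(cls, target):
        return cls._registry.get(str(target))

class Chart(UUIDLookupMixin):
    def __init__(self, name: str):
        self.uuid = uuid.uuid4()
        self.name = name
        type(self)._registry[str(self.uuid)] = self

c = Chart("weekly sales")
assert Chart.find_by_uuid(c.uuid) is c          # found by uuid
assert Chart.find_by_uuid(uuid.uuid4()) is None  # unknown uuid -> None
```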
@mistercrunch I prefer to merge things bit by bit. My next step is to support GET dataset/{uuid} in the same way. If you’re okay with it, let’s merge this one and then focus on the next.
As for the full switch, my original goal was to add uuid support to the dashboards/charts/datasets APIs. The next PR (datasets) will cover everything I personally need, but if you’d like to go further, we should come up with a plan. I’d be happy to be involved.
Amazing, thanks for the contribution!