
allow content to be saved as code

Open TuringLovesDeathMetal opened this issue 3 years ago • 42 comments

  • Allow content to be saved as code:
    • As an open-source tool, it's important that people are able to build on + share their work in Lightdash. E.g. I've built a dope dashboard looking at all of my user sessions. Being able to save content as code means you can create + share these visualizations and make them available for use by others. One step closer to out-of-the-box analytics 😏
    • Being able to version control critical visualizations is really important. If you have dashboards you're using for reporting to the board or regulators, these shouldn't be fiddled with, and if they are, you'll want to be able to revert the changes. Since we're building from dbt, why not take advantage of the same version control that they're using, and just use git?

TuringLovesDeathMetal avatar Aug 24 '21 13:08 TuringLovesDeathMetal

Will the visualization be saved as a static plot or as an aggregated abstraction?

I thought of the following options:

  1. Using dbt snapshots - Would need to specify it in dbt
  2. Using aggregated dataset - Users can save a limited amount of data points along with the visualization so that they can be replayed.
  3. Saving it as a static visualization

I am planning to integrate saved visualization with streamlit. Use saved charts as a discussion platform and streamlit for vertical integration.

silentninja avatar Sep 19 '21 11:09 silentninja

Great question - and I think it depends on the content!

For example, I think that dashboards used for monitoring are static logic with dynamic data (i.e. you want the data to update over time, but you want the logic generating the x & y axes to stay the same). In this case, you'd want to save the code generating the visualization + the SQL logic generating the table - I don't think we'd need to save any data since the data is expected to be updated regularly.

Dashboards or charts used in analyses are static logic with static data (i.e. you want to see the data at the time the analysis was done). In this case, I think something like dbt snapshots are a great idea! Using an aggregated dataset could also be really cool for this - but I think that dbt snapshots would almost be more powerful (since you could compare today vs. when the analysis was run). But potentially more complicated...

When you say "saving it as a static visualization" - do you mean like saving the plot as a .JPEG or something?

TuringLovesDeathMetal avatar Sep 20 '21 12:09 TuringLovesDeathMetal

When you say "saving it as a static visualization" - do you mean like saving the plot as a .JPEG or something?

Yes, I meant saving it as a static image - Can be useful when referencing in reports, slack messages, html metadata tags. Discussions can be based upon it.

Dashboards or charts used in analyses are static logic with static data (i.e. you want to see the data at the time the analysis was done). In this case, I think something like dbt snapshots are a great idea! Using an aggregated dataset could also be really cool for this - but I think that dbt snapshots would almost be more powerful (since you could compare today vs. when the analysis was run). But potentially more complicated...

Binned/aggregated data can be offered as an alternate option in case users don't want to deal with snapshot complexities.

For example, I think that dashboards used for monitoring are static logic with dynamic data (i.e. you want the data to update over time, but you want the logic generating the x & y axes to stay the same). In this case, you'd want to save the code generating the visualization + the SQL logic generating the table - I don't think we'd need to save any data since the data is expected to be updated regularly.

Saved Charts that we have now are actually this type of visualization I guess.

Btw how do you plan on using git for saving the visualization?

silentninja avatar Sep 21 '21 06:09 silentninja

So we might need to add:

  1. Static image visualization
  2. Interactive Snapshot visualization based on dbt snapshot with saving binned data as a fallback.

silentninja avatar Sep 21 '21 06:09 silentninja

Btw how do you plan on using git for saving the visualization?

The visualizations are all generated using echarts - which is just javascript (I'm pretty sure 😅 ). I think we could just save this as "visualization code" if that makes sense? So for a chart or dashboard saved as code, you'd have the data code (some .sql file) and then the viz code (some javascript or typescript? file) - these would be version controlled.

I'm not 100% sure how this works, so I might be totally wrong in assuming this is something we could do...

TuringLovesDeathMetal avatar Sep 23 '21 08:09 TuringLovesDeathMetal

Here's an example of the type of code I'd be expecting in the viz file: https://echarts.apache.org/examples/en/editor.html?c=line-simple&lang=js

TuringLovesDeathMetal avatar Sep 23 '21 08:09 TuringLovesDeathMetal

I was thinking of versioning the chart together with the actual data, serialised in a format like CSV/JSON, the way DVC does. If it is just SQL, git should do the magic.

silentninja avatar Sep 24 '21 14:09 silentninja

Is this issue still relevant? There have been no updates for 60 days, please close the issue or keep the conversation going!

stale[bot] avatar Nov 23 '21 14:11 stale[bot]

I came to open an issue on the same topic. I love that Lightdash's metadata is stored in code, in that it leverages dbt yaml files and everything is version controlled. I'd like for Lightdash's reports to also serialize their definitions (and probably not the data itself) to files which can also participate in the DevOps lifecycle - deploy, version, rollback, merge, etc.

Is there any plan to have the report definitions as something which could be checked into git?

aaronsteers avatar Feb 01 '22 22:02 aaronsteers

Hey @aaronsteers nice! I opened a PR a while back for discussion on this: https://github.com/lightdash/lightdash/pull/589

I put an example of how a saved chart or dashboard might be as-code and checked into git.

What do you think of the approach? The structure is inspired by Kubernetes.

owlas avatar Feb 02 '22 10:02 owlas

Is this issue still relevant? There have been no updates for 60 days, please close the issue or keep the conversation going!

stale[bot] avatar Apr 03 '22 10:04 stale[bot]

Still relevant!

hamzahc1 avatar May 04 '22 10:05 hamzahc1

Definitely still relevant!

benomahony avatar May 06 '22 16:05 benomahony

Was just speaking to @benomahony about this! The problem we face in Looker today is that if I refactor some data models in dbt, that breaks Looker content. There's no nice way to store the content changes on a branch. Basically I often need to break dashboards first and then fix them (often manually) later on. The manual nature of fixing charts / visualisations means I often avoid refactors that touch our visualisation tool completely. This feels really backwards when we've invested so much time on CI to keep our pipelines robust but then stumble at the final step (the most visible bit to our stakeholders). So for me lack of version control in this area is now the weakest link in the end-to-end data process.

So for me this issue is about:

  • We want to be able to programmatically fix content during refactors (might just mean a search / replace in an IDE)
  • Be able to stage content changes in some way e.g. a git branch
  • Be able to get benefits of git - reviews, codeowners etc for key dashboards
  • Be able to build CI checks for content breakages - as warnings or blocks depending on importance of content
  • Still retain option for some content to be created / changed by analysts without a version control process

XiaozhouWang85 avatar May 06 '22 17:05 XiaozhouWang85

A solution might be for all content to appear in some sort of manifest.json. This would allow CI to check for consistency between dbt and content. Then we could define version controlled content by moving parts of that JSON into our dbt repo and have it propagate into lightdash. This will allow me to build / edit content using UI but get benefits of code / version control. Then some way to bulk edit the content (search / replace on manifest.json and upload?).
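
The CI check described here could look roughly like the sketch below. It is only an illustration: the manifest layout, the `dbt_fields` mapping and the function name `find_broken_references` are assumptions made for the example, not an existing Lightdash or dbt format.

```python
def find_broken_references(manifest, dbt_fields):
    """Return one message per chart field that is missing from its explore."""
    problems = []
    for chart in manifest["charts"]:
        known = dbt_fields.get(chart["explore"], set())
        for field in chart["dimensions"] + chart["metrics"]:
            if field not in known:
                problems.append(
                    f"{chart['name']}: {field} not found in explore {chart['explore']}"
                )
    return problems


# Hypothetical content manifest, e.g. loaded with json.load() in CI.
manifest = {
    "charts": [
        {
            "name": "Transactions by segment",
            "explore": "transactions",
            "dimensions": ["segment"],
            "metrics": ["count_transactions"],
        }
    ]
}

# Hypothetical fields per explore after `segment` has moved to `users`.
dbt_fields = {
    "transactions": {"transaction_id", "count_transactions"},
    "users": {"user_id", "segment"},
}

problems = find_broken_references(manifest, dbt_fields)
for line in problems:
    print(line)
```

A CI job could fail (or only warn) when `problems` is non-empty, depending on how important the affected content is.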

I appreciate this is asking for a lot but it would be a killer feature for me and well worth almost any effort to migrate to.

XiaozhouWang85 avatar May 06 '22 18:05 XiaozhouWang85

@XiaozhouWang85 this sounds like a pretty cool place to be with your BI tool 😍 (also, FYI - you were the first ever non-internal Lightdash user, back when we couldn't even get it installed on a machine 😅)

can you give some examples of refactors in dbt that would lead to something breaking in your BI tool?

TuringLovesDeathMetal avatar May 09 '22 08:05 TuringLovesDeathMetal

The nastiest refactor that shouldn't be so hard was when an analyst had done this:

They wanted to pivot a chart on user segment.

SELECT
  trans.transaction_id,
  ....
  users.segment
FROM transactions AS trans
LEFT JOIN users USING (user_id)

So obviously this wasn't the right way to do this. From a data modelling point of view, stuff shouldn't exist on a table just because of reporting needs. You can simply set up a join in LookML and get all user columns that way. So I wanted to remove the user segment fields from the transactions data model. The problem was, they had already built some dashboards on top of it. Since the column was now moving to a different explore, the Content Validator in Looker does not allow for a search / replace type fix for this. So the problem remains there, because I don't want to spend a day rebuilding dashboards.

Content Validator in Looker does allow you to fix renames in a search / replace manner, but even then it's a) manual and b) not staged: the changes you make using Content Validator go straight into prod. Not helpful when my dbt changes are still on a branch and in development.

XiaozhouWang85 avatar May 09 '22 09:05 XiaozhouWang85

Cool, makes sense. So biggest problems in that one are:

  1. It sucks that I can't programmatically find + replace a change I've made (e.g. moving a field across models).
  2. Even if I could find + replace things, I'd want to do this in a place where I could stage the changes.

I love the problems that you listed above too! They totally make sense, and I feel like I could come up with "example use cases" from personal Monzo experiences for all of them.

Will definitely reach out to chat to you once we start working some more on this @XiaozhouWang85 👀

TuringLovesDeathMetal avatar May 10 '22 09:05 TuringLovesDeathMetal

Is this issue still relevant? There have been no updates for 60 days, please close the issue or keep the conversation going!

stale[bot] avatar Jul 09 '22 13:07 stale[bot]

still relevant! Should be part of the CLI now as well?

hamzahc1 avatar Jul 10 '22 20:07 hamzahc1

some competitor research for more context

Metabase's "version" of this is called serialization: https://www.metabase.com/docs/latest/installation-and-operation/serialization.html

^This isn't really content as code, but more of a way of pushing stuff between dev --> prod

Here’s a doc on Looker’s user-defined vs. lookml dashboards: https://cloud.google.com/looker/docs/types-of-dashboards and building lookml dashboards: https://cloud.google.com/looker/docs/building-lookml-dashboards

I can't find any examples for Tableau. Most of the threads were about switching instances and exporting stuff (it sounds like you can just export workbooks and reopen them in another instance if it has the correct data...) or about embedding.

TuringLovesDeathMetal avatar Aug 26 '22 08:08 TuringLovesDeathMetal

Another example from Oliver:

I’m updating the demo thyme project. It is absolutely impossible to know where a metric is used in Lightdash. I want to rename a bunch of them (I could use label but would rather change their official name). I also have some to remove. But I really don’t know which dashboards and charts in the demo use count_user_id vs count_distinct_user_id. If it was as code I would literally just search count_user_id and replace them all with count_distinct_user_id. Absolute superpower of having everything as code. I say (maybe) because there could be other solutions. Like the Lightdash content validator, which could scrape the API and look for problems (edited)
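
As a rough sketch of what that search and replace could look like once charts are plain data: the chart structure below (a `query` with `dimensions` and `metrics` lists) loosely follows the example yaml later in this thread, and `rename_field` is a hypothetical helper, not part of Lightdash.

```python
def rename_field(charts, old, new):
    """Return (updated_charts, names_of_touched_charts); the input is not mutated."""
    updated, touched = [], []
    for chart in charts:
        query = chart["query"]
        new_query = {
            **query,
            "dimensions": [new if f == old else f for f in query["dimensions"]],
            "metrics": [new if f == old else f for f in query["metrics"]],
        }
        if new_query != query:
            touched.append(chart["name"])
        updated.append({**chart, "query": new_query})
    return updated, touched


# Hypothetical chart definitions, as they might be loaded from files in a repo.
charts = [
    {"name": "Active users", "query": {"dimensions": ["day"], "metrics": ["count_user_id"]}},
    {"name": "Revenue", "query": {"dimensions": ["day"], "metrics": ["total_revenue"]}},
]

updated, touched = rename_field(charts, "count_user_id", "count_distinct_user_id")
print(touched)
```

The list of touched chart names doubles as an answer to "where is this metric used?", which is the part that is hard to find out today.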

TuringLovesDeathMetal avatar Sep 29 '22 08:09 TuringLovesDeathMetal

Is there any progress being made on this? Being able to define charts and dashboards as (readable) code and to sync changes as part of the CI/CD pipeline would be an awesome feature!

giamo avatar Nov 10 '22 13:11 giamo

Technical proposals:

  • Point of entry
    • Lightdash would expect by default a lightdash_project.yml file at the root
    • File path configurable via CLI options
    • This file would have the main configuration (similar to dbt_project.yml)
  • Versioning
    • Required version at the top of the yml files, e.g. version: 1.2
    • This version is not related to app releases
    • There should be a JsonSchema file for each version so anyone can write linters for their IDE/CI/CD; we would also use it to validate the files when consuming them
  • Backwards compatible
    • Server has a class/function per version that validates against the JsonSchema and returns an object matching the latest type/interface handled by the app
    • Note that we may deprecate old versions to reduce the amount of maintenance
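
A minimal sketch of the "one function per version" idea, under stated assumptions: the version strings, the `title` to `name` rename and the names `UPGRADERS` and `load_chart` are all invented for illustration, and the JsonSchema validation step is left out to keep the example self-contained.

```python
LATEST_VERSION = "1.1"


def _upgrade_1_0(doc):
    # Hypothetical breaking change between 1.0 and 1.1: "title" became "name".
    upgraded = {k: v for k, v in doc.items() if k != "title"}
    upgraded["name"] = doc["title"]
    upgraded["version"] = "1.1"
    return upgraded


# One upgrader per supported old version; deprecating a version removes its entry.
UPGRADERS = {"1.0": _upgrade_1_0}


def load_chart(doc):
    """Return the chart in the latest shape, upgrading one version at a time."""
    if "version" not in doc:
        raise ValueError("missing required 'version' field")
    doc = dict(doc)
    while doc["version"] != LATEST_VERSION:
        upgrader = UPGRADERS.get(doc["version"])
        if upgrader is None:
            raise ValueError(f"unsupported version: {doc['version']}")
        doc = upgrader(doc)
    return doc


chart = load_chart({"version": "1.0", "title": "My chart", "explore": "users"})
print(chart)
```

Chaining upgraders means the rest of the app only ever sees the latest shape, at the cost of keeping one small function alive per old version.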

Questions we still need to answer before starting implementation:

  • how do we match the content in the files with the content in the DB?
  • how do we handle references to other saved content? eg: dashboards reference charts
  • is this content read-only in the app? eg: can I save a new chart version or does it force the user to create a new chart from that one?

ZeRego avatar Nov 18 '22 16:11 ZeRego

Point of entry

Is there any reason we need to have this? I can't think of any configuration we'd need in v1. And yml files could live anywhere, leaving the user to structure their yaml any way they prefer. Then deploying is just lightdash deploy or lightdash deploy path/to/my/yml

Versioning

Yes! See below for a proposal

Backwards compatible

Makes sense, we should be able to do non-destructive changes on versions (e.g. add optional fields). But breaking changes need a new version

How do we match content in the files to the db?

I came across this problem while hacking together a solution. I started by having the uuid but it would be very annoying to write by hand. So I imagined a CLI where any resources that have no uuid will be auto assigned one by the tool.
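
A sketch of that auto-assignment step, assuming resources are already parsed into dictionaries; `assign_missing_uuids` is a hypothetical name, and writing the ids back into the yaml files is not shown.

```python
import uuid


def assign_missing_uuids(resources):
    """Return copies of the resources, adding a fresh UUID to any that lack one."""
    return [
        r if r.get("uuid") else {**r, "uuid": str(uuid.uuid4())}
        for r in resources
    ]


resources = [
    {"uuid": "3675b69e-8324-4110-bdca-059031aa8da3", "name": "My chart"},
    {"name": "Chart written by hand"},
]

with_ids = assign_missing_uuids(resources)
for r in with_ids:
    print(r["name"], r["uuid"])
```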

We could make the uuid just a unique string rather than a uuid exactly. But there's a problem. If I have a chart called total-active-users and I want to share that with somebody else they may already have a chart with that name!!! The namespace problem is a real one that affects dbt badly (and is solved in Kubernetes and other tools).

An alternative I've considered is to give each chart a short slug (my-new-chart) or name (my chart) that must be unique within a space.

That way you could take a bunch of charts/dashboards with a friendly name (not a uuid) and deploy them to another space. I'm not considering this as a replacement for uuids in general, just a friendly way of referring to charts in a project (space + chart slug).

If we want spaces to be handled correctly I had two ideas:

  • Have the space uuid on the chart yaml
  • Have a .space.yaml in a directory that has a space uuid in - that way you could have directories in your project directly represent a space. Like this:
my-project
└── spaces
    ├── marketing
    │   ├── .space.yaml
    │   ├── chart5.yaml
    │   └── dashboard1.yaml
    └── sales
        ├── .space.yaml
        ├── chart1.yaml
        └── chart2.yaml

So either we stick to really random ids and make them easy to generate. Or have friendly names with some way to namespace them (probably with existing spaces). So it's easy for people to share a repo of charts with somebody else without any naming conflicts.
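
To make the friendly-name option concrete, here is a small sketch that treats (space, slug) as the identity of a chart and reports clashes. The field names `space` and `slug` and the function `find_slug_conflicts` are assumptions for the example.

```python
from collections import Counter


def find_slug_conflicts(charts):
    """Return the (space, slug) pairs that are used by more than one chart."""
    counts = Counter((c["space"], c["slug"]) for c in charts)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical charts collected from the spaces/ directories shown above.
charts = [
    {"space": "marketing", "slug": "total-active-users"},
    {"space": "sales", "slug": "total-active-users"},  # same slug, other space: fine
    {"space": "sales", "slug": "pipeline"},
    {"space": "sales", "slug": "pipeline"},  # clash within one space
]

conflicts = find_slug_conflicts(charts)
print(conflicts)
```

Because the space is part of the key, two people can both ship a total-active-users chart as long as they deploy into different spaces.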

how do we handle references to other saved content? eg: dashboards reference charts

References will just be ids in the yaml files. We can check at the moment we insert to the db if they are valid.
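
The validity check at insert time might be sketched as follows; the dashboard shape (a `charts` list of ids) and the name `find_dangling_references` are hypothetical.

```python
def find_dangling_references(dashboards, chart_ids):
    """Map each dashboard name to the chart ids it references that do not exist."""
    missing = {}
    for dashboard in dashboards:
        unknown = [cid for cid in dashboard["charts"] if cid not in chart_ids]
        if unknown:
            missing[dashboard["name"]] = unknown
    return missing


# Ids of charts that exist in the files being deployed (or already in the db).
chart_ids = {"chart1", "chart2"}

dashboards = [
    {"name": "Sales overview", "charts": ["chart1", "chart2"]},
    {"name": "Marketing overview", "charts": ["chart1", "chart5"]},
]

missing = find_dangling_references(dashboards, chart_ids)
print(missing)
```

A deploy could refuse to insert anything when `missing` is non-empty, so a dashboard never ends up pointing at a chart that was not created.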

is this content read-only?

For v1, I imagine that the UI doesn't even know this was created as code. If someone edits the chart/dashboard, it'll overwrite the definition in the database. You'd have to bring it back up to the as-code version by running lightdash deploy again.

I imagine a recommended structure where users have a space that nobody has write permission on and is always managed as code instead of introducing a new UX in the app for stuff created as code.

Example yaml

resource: v1/charts

uuid: 3675b69e-8324-4110-bdca-059031aa8da3
name: My chart
description: My chart description
explore: users
query:
  dimensions:
    - one
    - two
  metrics:
    - three
    - four
  filters:
    - (horrible api)
  sorts:
    - field: 5
      descending: true
  limit: 100
  tableCalculations:
    - name: my
      displayName: another
      sql: horrid

pivotConfig:
  columns: [my-pivot-dim]

chartConfig:
  type: cartesian
  layout:
  echartsConfig:

owlas avatar Dec 05 '22 15:12 owlas

Discussion related to this issue: https://github.com/lightdash/lightdash/discussions/3893#discussioncomment-4331084

TuringLovesDeathMetal avatar Dec 08 '22 11:12 TuringLovesDeathMetal

Created a milestone and broke down the work into multiple tickets.

https://github.com/lightdash/lightdash/milestone/65

ZeRego avatar Jan 05 '23 13:01 ZeRego

Hi there! Just wanted to chime in to say I think this sounds like a brilliant feature and share a bit about my current (and ideal) workflow in case it is useful.

Current workflow

  1. Open development branch on dbt project to create/modify metrics e.g. new-metrics-branch
  2. Have a personal development project within Lightdash which I adjust the project settings of to point to new-metrics-branch
  3. As I make changes to new-metrics-branch, I create the visualisations I am working towards within my dev environment on Lightdash so I can ensure the metrics are coming through properly, their values match what they should, they're labelled correctly etc.
  4. When I am happy, I open a PR with new-metrics-branch, it gets merged and the changes get deployed to 'prod' Lightdash
  5. I now go over to the main project of Lightdash which is pointed at main, and have to recreate all the visualisations I did in step 3

Ideal workflow

  1. Open development branch on dbt project to create/modify metrics e.g. new-metrics-branch
  2. Open development branch within Lightdash which points at new-metrics-branch (in some sort of developer environment within Lightdash)
  3. As I make changes to new-metrics-branch, I create the visualisations I am working towards within my dev environment on Lightdash so I can ensure the metrics are coming through properly, their values match what they should, they're labelled correctly etc.
  4. When I am happy, I open a PR with new-metrics-branch, it gets merged and the changes get deployed to 'prod' Lightdash
  5. At the same time, I merge my development branch on Lightdash into main, so all the visualisations I had drafted appear on Lightdash and I don't have to repeat work

Hopefully there's elements of the above which are useful when thinking about implementation of this 'content as code' feature - though obviously I know there's a lot in there! Excited to see the future of this feature and follow it closely :)

Jake-Curtis avatar Jan 16 '23 09:01 Jake-Curtis

Is this issue still relevant? There have been no updates for 60 days, please close the issue or keep the conversation going!

stale[bot] avatar Mar 17 '23 18:03 stale[bot]

Totally relevant! +1 for content as code (aka LookML dashboard-like)

kylelundstedt avatar Mar 17 '23 19:03 kylelundstedt