feature: Support project templates in `meltano init`

Open edgarrmondragon opened this issue 2 years ago • 4 comments

Feature scope

Other

Description

For example, someone from the community creates a Meltano project to move data from Shopify to Snowflake, and makes it available as a "template" in a public GitHub repo.

Then, other users could reuse the template with something like:

meltano init my_project --template=https://github.com/cooldataperson/meltano-shopify-snowflake-template

edgarrmondragon avatar Aug 26 '22 17:08 edgarrmondragon

@edgarrmondragon a similar idea was raised on the hub by @aaronsteers. I also had an extension to the idea: do something similar using our support for multiple meltano.yml files. If someone made a Shopify/Snowflake collection, then today you should in theory be able to drop it into your project directory and it would work after an install. I suggested maybe having an add command that would run the checks for you (avoiding double-defined plugins, finding conflicts, installing, etc.). The same could be done for init as well, like:

meltano add collection shopify-snowflake

meltano init my_project --from-collection=shopify-snowflake

What do you think about that variation of your idea?

pnadolny13 avatar Aug 26 '22 17:08 pnadolny13

Injecting project resources into an existing project was more along the lines of what I was thinking, but I think starting from a template is an interesting alternate take.

If adding to an existing project, you need to avoid name collisions. But if starting with the template itself, we don't have that same problem.
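As a rough illustration of the collision problem, here is a minimal Python sketch that compares the plugin names already in a project against those a template would bring in. The function name and the example plugin lists are hypothetical; nothing here is part of Meltano's actual API.

```python
def find_name_collisions(existing: list[str], incoming: list[str]) -> list[str]:
    """Return the incoming plugin names that are already defined in the project."""
    taken = set(existing)
    return [name for name in incoming if name in taken]


# Hypothetical plugin names, for illustration only.
existing_plugins = ["tap-shopify", "target-snowflake"]
template_plugins = ["tap-spotify", "target-snowflake", "dbt-snowflake"]

collisions = find_name_collisions(existing_plugins, template_plugins)
print(collisions)  # ['target-snowflake']
```

With an empty project (the `meltano init` case) the list of existing names is empty, so no collision can occur.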

So I do like Edgar's proposal here as a quickstart:

meltano init my_project --template=https://github.com/cooldataperson/meltano-shopify-snowflake-template

We could register templates in the hub too. So meltano init my_project --template=snowflake-spotify-tutorial might look up the template name on the hub and translate to the equivalent of the above.
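A minimal sketch of how such a lookup might behave, assuming the hub exposed a simple name-to-URL mapping. The `HUB_TEMPLATES` dictionary and `resolve_template` function are hypothetical stand-ins; no such registry or function exists in Meltano today.

```python
from urllib.parse import urlparse

# Hypothetical registry standing in for a hub lookup (illustrative mapping only).
HUB_TEMPLATES = {
    "snowflake-spotify-tutorial": (
        "https://github.com/cooldataperson/meltano-shopify-snowflake-template"
    ),
}


def resolve_template(ref: str, registry: dict[str, str]) -> str:
    """Return a repository URL for a template given either a URL or a registered name."""
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Already a URL: use it as-is.
        return ref
    try:
        return registry[ref]
    except KeyError:
        raise ValueError(f"Unknown template name: {ref!r}") from None


print(resolve_template("snowflake-spotify-tutorial", HUB_TEMPLATES))
```

Passing a full URL skips the registry entirely, so both forms of `--template` would share one code path.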

When adding to an existing project, I recently started thinking about a mandatory prefix on all imported resources. The prefix could default to the template name, or be overridden.

Assuming snowflake-spotify-tutorial has three plugins defined: tap-spotify, target-snowflake, and dbt-snowflake, then running meltano add subproject tutorial --from_template=snowflake-spotify-tutorial would add these plugins:

tutorial-tap-spotify
tutorial-target-snowflake
tutorial-dbt-snowflake
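The renaming rule above could be sketched in Python as follows. The function name and its parameters are assumptions made for illustration, not an existing Meltano interface; the prefix falls back to the template name when none is given.

```python
from typing import Optional


def prefix_plugins(
    plugin_names: list[str], template_name: str, prefix: Optional[str] = None
) -> list[str]:
    """Prefix every imported plugin name; default the prefix to the template name."""
    effective = prefix if prefix is not None else template_name
    return [f"{effective}-{name}" for name in plugin_names]


plugins = ["tap-spotify", "target-snowflake", "dbt-snowflake"]

# Prefix overridden to "tutorial", as in the example command above.
print(prefix_plugins(plugins, "snowflake-spotify-tutorial", prefix="tutorial"))
# ['tutorial-tap-spotify', 'tutorial-target-snowflake', 'tutorial-dbt-snowflake']
```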

It's interesting to think of templates and subprojects as basically the same thing, with some predefined rules explaining how they'd be applied in the 'new project' vs the 'injected subproject' behaviors.

aaronsteers avatar Aug 26 '22 18:08 aaronsteers

We could register templates in the hub too. So meltano init my_project --template=snowflake-spotify-tutorial might look up the template name on the hub and translate to the equivalent of the above.

@aaronsteers I like that and it's actually similar to where I got inspiration for this issue: https://www.gatsbyjs.com/starters/

When adding to an existing project, I recently started thinking about a mandatory prefix on all imported resources. The prefix could default to the template name, or be overridden.

Yeah, I imagine the template could require the user to input some values that would include the plugins prefix.

edgarrmondragon avatar Aug 26 '22 18:08 edgarrmondragon

@aaronsteers I like that and it's actually similar to where I got inspiration for this issue: https://www.gatsbyjs.com/starters/

Heh, I pitched something similar in the hub issue: https://github.com/meltano/hub/issues/210#issuecomment-1203359986

pandemicsyn avatar Aug 31 '22 02:08 pandemicsyn

@aaronsteers I'm a fan of getting people started as quickly as possible, so this really resonates with me:

mkdir new_proj
pip install meltano
meltano init . --template=template-aws-s3-csv-snowflake

would make a neat workflow.

One piece would be missing: the configuration. Maybe we could use something like the Terraform approach to modules? Terraform modules conventionally declare their input variables in a variables.tf file, each with a description, and users supply the values in a .tfvars file.

So our workflow above would then have just one final step: adapt the .meltanovariables file to your setup. I guess we could just do this by using an .env file (without introducing the .meltanovariables file). That would work with existing functionality as far as I understand it. So ship every template with a .template-config.env file.
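To make the "adapt the config file" step concrete, here is a small Python sketch that scans the text of such an env file and reports which variables still need a value. The file contents and variable names are invented for illustration, and the helper is hypothetical rather than anything Meltano provides.

```python
def unset_variables(env_text: str) -> list[str]:
    """Return the names of KEY=VALUE entries whose value is still empty."""
    missing = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if not value.strip():
            missing.append(key.strip())
    return missing


# Illustrative contents of a hypothetical .template-config.env file.
TEMPLATE_CONFIG = """\
# Snowflake account identifier
TARGET_SNOWFLAKE_ACCOUNT=
# Warehouse to load into
TARGET_SNOWFLAKE_WAREHOUSE=COMPUTE_WH
# S3 bucket holding the CSV files
TAP_S3_CSV_BUCKET=
"""

print(unset_variables(TEMPLATE_CONFIG))
# ['TARGET_SNOWFLAKE_ACCOUNT', 'TAP_S3_CSV_BUCKET']
```

A check like this could run right after project creation to tell the user exactly which values are left to fill in.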

sbalnojan avatar Sep 29 '22 12:09 sbalnojan

FYI - I've drafted a very early spec discussion for modules as subprojects:

  • #7100

Feedback, questions, suggestions much appreciated.

aaronsteers avatar Dec 16 '22 17:12 aaronsteers

@aaronsteers I like the modules idea, but I think it's a bit complex for the problem we're trying to solve here, which is to get folks up and running quickly with a set of plugins and optional config.

Submodules would get them up and running but not in a way that is as transparent as just importing the template into their own project. Am I missing a big benefit here for that specific problem?

tayloramurphy avatar Dec 16 '22 22:12 tayloramurphy

Hi, @tayloramurphy. I realize now my last comment left a lot missing in terms of context. The proposal in #7100 is not trying to handle new project initialization, but I link it because the paradigms probably can and should be related to one another.

In that proposal, I suggest that "modules" would really just be any project grafted into another project as a subproject, with certain constraints like ignoring the subproject's environment declarations. I called that a "module" (for now) but we could explore other names, and the same library of 'modules' could also be referenceable as templates, since they are all just different names for Meltano projects.

That proposal also allows for enabled/disabled submodules, which for instance could let the user enable or disable all the Postgres-specific or Snowflake-specific boilerplate.

Some prior discussions proposed "data apps" or "applets" for turn-key project solutions, and that's probably the closest I can think of as a generic name that fits the ontology of "can be used as either a 'module' to add to an existing project or a 'template' to use as starting point for a new one". In both cases, we'd probably allow "add from url" as well as "add from hub", since both have really compelling use cases.

These concepts could be completely compatible:

# Start a new project from an existing repo
meltano init my_project --template=https://github.com/cooldataperson/meltano-shopify-snowflake-template

# Import an existing repo to add those capabilities to my current project
meltano module import https://github.com/cooldataperson/meltano-shopify-snowflake-template

The behaviors of these two commands are different, but the set of projects to choose from could be shared across both use cases.

aaronsteers avatar Dec 16 '22 22:12 aaronsteers

Also, it's worth mentioning that a template project could come pre-baked with a ton of disabled submodules.

The best example would be having a set of submodules for all the 'warehouses':

For instance, a module called snowflake-warehouse could contain tap-snowflake, target-snowflake, dbt-snowflake, and sqlfluff, all preconfigured to play together nicely with Snowflake warehouses. A tutorial could instruct the reader to either add the snowflake-warehouse module or (if it's already included) to simply enable it, ignoring postgres-warehouse, redshift-warehouse, etc. Since a disabled module doesn't have any install cost and doesn't expand the list of plugins, we could be pretty liberal with what we 'ship' in the project as disabled by default.
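A minimal Python sketch of the enabled/disabled idea, assuming each module is just a named group of plugins with an on/off flag. The `Module` class and `active_plugins` function are hypothetical, and the contents of the postgres-warehouse module are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A named group of plugins that can be switched on or off as a unit."""
    name: str
    plugins: list[str]
    enabled: bool = False  # shipped disabled by default


def active_plugins(modules: list[Module]) -> list[str]:
    """Return only the plugins that belong to enabled modules."""
    return [plugin for module in modules if module.enabled for plugin in module.plugins]


modules = [
    Module(
        "snowflake-warehouse",
        ["tap-snowflake", "target-snowflake", "dbt-snowflake", "sqlfluff"],
    ),
    Module("postgres-warehouse", ["tap-postgres", "target-postgres", "dbt-postgres"]),
]

# Nothing is active until the user enables a module.
print(active_plugins(modules))  # []

modules[0].enabled = True
print(active_plugins(modules))
# ['tap-snowflake', 'target-snowflake', 'dbt-snowflake', 'sqlfluff']
```

Because disabled modules contribute nothing to the active list, shipping many of them would not grow the set of plugins that gets installed.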

aaronsteers avatar Dec 16 '22 23:12 aaronsteers

@aaronsteers 🙏 much appreciated context. Makes much more sense 😄

tayloramurphy avatar Dec 18 '22 22:12 tayloramurphy

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

stale[bot] avatar Apr 26 '23 10:04 stale[bot]