Data Model: Inter-module dependencies
I started working on an intel module for Authentik, and an issue came up:
- In Authentik, I can define GitHub as an identity provider for my Authentik instance
- When I collect data, I want to be able to create a link between the AuthentikUser and the GitHubUser if applicable
- Currently, there is no way to run the intel modules in a specific order, it seems
- If GitHub is launched before Authentik, that's fine, but otherwise, I lose the information
To me, this problem will arise whenever we want inter-module links (all SSO, compute instance linked to an external provider or a database from another provider, DNS like Cloudflare, etc.)
The naive approach would be to force the order of the modules in the code, but this might be difficult to maintain over time, especially with the growing number of modules.
I propose a more modular approach:
- Allow the definition of a dependency in CartographyRelSchema with a flag indicating that the module enabling the ingestion of the target_node_label must have been executed beforehand
- We can easily infer the name of the corresponding module from the module (aka folder) in which it is located
- During sync, we launch the modules in an order that ensures this dependency
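To make the proposal concrete, here is a minimal sketch of how an ordering could be derived from such flags. Everything in it is an assumption for illustration: `RelSketch` is a simplified stand-in rather than the real CartographyRelSchema, and the flag name `requires_target_module` and both registries are hypothetical. The ordering itself uses `graphlib.TopologicalSorter` from the Python standard library.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class RelSketch:
    """Hypothetical stand-in for a relationship schema; not the real class."""
    target_node_label: str
    requires_target_module: bool = False  # the proposed flag (assumed name)


# Hypothetical registry: which module ingests which node label.
LABEL_TO_MODULE = {
    "GitHubUser": "github",
    "AuthentikUser": "authentik",
}

# Hypothetical registry: relationships declared by each module.
MODULE_RELS = {
    "github": [],
    "authentik": [RelSketch("GitHubUser", requires_target_module=True)],
}


def sync_order(module_rels, label_to_module):
    """Return module names so that flagged target modules come first."""
    graph = {}
    for module, rels in module_rels.items():
        deps = graph.setdefault(module, set())
        for rel in rels:
            owner = label_to_module.get(rel.target_node_label)
            # Only flagged rels pointing at another known module add an edge.
            if rel.requires_target_module and owner and owner != module:
                deps.add(owner)
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


order = sync_order(MODULE_RELS, LABEL_TO_MODULE)
print(order)
```

With these two modules, `github` is placed ahead of `authentik` regardless of the order in which they are registered.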
We have a very similar situation, but a different setup. Dependencies get very messy, and we should avoid them unless they are universal.
Questions
- Is the cross-module dependency hardcoded in the schema definition?
- How will this work if another setup uses a different SSO?
Example:
- A-company uses Authentik
- B-company uses Okta
- C-company uses Google
I think there are two approaches:
- Define a generic relationship pointing to a generic 'ProviderUser' label that is used as a secondary label for GitHub, Google, Microsoft, etc. users, and then add a dependency on that generic label. This lets you define only one relationship, but it results in an 'implicit dependency' that could be hard to debug (especially if you have a circular dependency).
- Define a per-source relationship in the schema. That requires more definitions, but every dependency is explicit.
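As a rough illustration of the trade-off between the two approaches, the sketch below resolves a target label to the modules that would have to run first. The registry and the module names are hypothetical; the point is only that a shared secondary label fans out to every module that applies it, while a per-source label resolves to exactly one.

```python
# Hypothetical registry of the node labels each module writes.
# 'ProviderUser' is the generic secondary label from the first approach.
LABELS_BY_MODULE = {
    "github": {"GitHubUser", "ProviderUser"},
    "google": {"GoogleUser", "ProviderUser"},
    "authentik": {"AuthentikUser"},
}


def modules_providing(label):
    """Return the modules that ingest nodes carrying the given label."""
    return sorted(m for m, labels in LABELS_BY_MODULE.items() if label in labels)


# Approach 1: one relationship to the generic label, dependencies are implicit.
generic = modules_providing("ProviderUser")

# Approach 2: one relationship per source, each dependency is explicit.
explicit = modules_providing("GitHubUser")

print(generic)
print(explicit)
```

In the generic case, adding a new provider module silently adds a new dependency for every module that links to 'ProviderUser', which is where the debugging difficulty comes from.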
I said dependency, but in fact it is more of a best-effort contract: GitHub should run before Authentik because Authentik may build relationships to GitHubUser.
That is not a requirement (maybe a warning if the order cannot be guaranteed), and it should definitely not block anything.
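A best-effort contract like this could look roughly as follows. This is a sketch under assumptions, not an existing Cartography feature: the function name, the `run_after_hints` mapping, and the logger name are all made up for illustration. Unsatisfiable hints produce a warning and the sync proceeds anyway.

```python
import logging
from graphlib import CycleError, TopologicalSorter

logger = logging.getLogger("sync_order_sketch")  # hypothetical logger name


def best_effort_order(run_after_hints, enabled):
    """Order enabled modules, honoring hints where possible, never raising.

    run_after_hints maps a module to the modules it would like to run after.
    """
    enabled_set = set(enabled)
    graph = {}
    for module in enabled:
        wanted = run_after_hints.get(module, set())
        # A hint on a module that is not enabled is reported, not enforced.
        for dep in sorted(wanted - enabled_set):
            logger.warning("%s prefers to run after %s, which is not enabled", module, dep)
        graph[module] = wanted & enabled_set
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # Circular hints cannot be satisfied; keep the configured order.
        logger.warning("circular ordering hints, keeping configured order: %s", err.args[1])
        return list(enabled)


hints = {"authentik": {"github"}}
print(best_effort_order(hints, ["authentik", "github"]))
print(best_effort_order(hints, ["authentik"]))
```

The second call shows the A-company case where GitHub is simply not part of the deployment: the hint is logged and ignored.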
Yes, multiple modules have different dependencies, and order does matter. If we run the Authentik sync before GitHub users are in the graph, Authentik won't know about the GitHub users, and vice versa.
So, the solution would be to run Cartography a second time so that the two node types would be attached to each other. In this way, Cartography is an eventually-consistent system.
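The two-pass behavior can be shown with a toy in-memory model. This is only a simulation: the graph is a pair of Python sets, the stage functions are stand-ins for the real intel modules, and the relationship name `HAS_ACCOUNT` is hypothetical.

```python
def sync_github(graph):
    """Stand-in for the GitHub module: ingest one user node."""
    graph["nodes"].add(("GitHubUser", "alice"))


def sync_authentik(graph):
    """Stand-in for the Authentik module: ingest a user and link if possible."""
    graph["nodes"].add(("AuthentikUser", "alice"))
    # The link can only be created if the target node is already present.
    if ("GitHubUser", "alice") in graph["nodes"]:
        graph["rels"].add(
            (("AuthentikUser", "alice"), "HAS_ACCOUNT", ("GitHubUser", "alice"))
        )


def run_sync(graph, stages):
    """Run every stage once, in the given order."""
    for stage in stages:
        stage(graph)


graph = {"nodes": set(), "rels": set()}
stages = [sync_authentik, sync_github]  # the unlucky order

run_sync(graph, stages)
links_after_first = len(graph["rels"])
run_sync(graph, stages)
links_after_second = len(graph["rels"])

print(links_after_first, links_after_second)  # prints "0 1"
```

The first pass misses the link because the GitHub node does not exist yet; the second pass finds it and creates the relationship.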
We've talked over the years about adding some dependency or ordering structure in the form of a DAG or similar, but this adds a great deal of complexity to the project, and how this should be implemented varies a lot deployment by deployment. As such, the decision has been to keep the tool simple, and we allow things like custom sync commands if others want to build their own sync orchestrator around the core project.
> I propose a more modular approach:
> - Allow the definition of dependency in CartographyRelSchema with a flag indicating that the module enabling the ingestion of the target_node_label must have been executed beforehand
> - We can easily infer the name of the corresponding module from the module (aka folder) in which it is located
> - During sync, we launch the modules in an order that ensures this dependency
I do love these ideas (especially because the data model lends itself really well to creating a DAG/scheduling structure, since each rel schema essentially tells you which other modules must come beforehand), but again, this makes the project pretty complex when the simple solution would be to just ask the user to run the sync 2 times instead of 1.
An orchestrator would be cool, but I think this is an extra bit of functionality that isn't a great fit right now for the core project. Happy to keep talking and thinking on this; I very much enjoy the discussion.