[Idea] Synthetic `get` based on `export`
Summary of the new feature / enhancement
Allow using export with pre-defined matching rules if get is not defined.
Reasoning
Some package managers, like pip, easily support exporting all installed packages but not filtering them to a single package.
In creating a pip resource I had to essentially perform an export* then manually filter the results.
Combining this with #1261 would massively simplify the job of resource authors in similar situations.
Proposed technical implementation details (optional)
No response
While I do think there's value in a synthetic get that filters export, there are a few problems:
- We don't currently have a way to denote key properties for a resource instance, so we can't effectively filter the instances. Any conventions we try to use that don't explicitly derive from keywords in the resource instance schema will be brittle - name is always a key property except when it isn't, for example.
- Unlike some other tools (this functionality is very similar to `prefetch` in Puppet, for example), we don't have a way to cache the results of the export for synthetic get in the resource - and doing it for the DSC process could be expensive/difficult. Because we can't retain the cache in memory for the resource invocation, you'll be querying the full list of instances and passing it back to DSC on every invocation. This problem gets worse for resources that use a synthetic `test` as well as `get`.
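To make the first problem concrete, here is a minimal sketch of what a synthetic get could look like if key properties were declared explicitly. This is not how DSC works today: `synthetic_get`, the `key_properties` argument, and the sample pip-like instances are all hypothetical names for illustration.

```python
from typing import Any, Iterable, Optional

Instance = dict[str, Any]


def synthetic_get(
    exported: Iterable[Instance],
    desired: Instance,
    key_properties: list[str],
) -> Optional[Instance]:
    """Return the exported instance whose key properties match `desired`.

    `key_properties` is an assumption: it stands in for keys that would
    have to be declared explicitly (e.g. in the instance schema).
    """
    missing = [k for k in key_properties if k not in desired]
    if missing:
        raise ValueError(f"desired state is missing key properties: {missing}")

    for instance in exported:
        if all(instance.get(k) == desired[k] for k in key_properties):
            return instance
    return None  # no match: the instance does not currently exist


# Hypothetical export output for a pip-like resource.
exported = [
    {"name": "requests", "version": "2.32.3"},
    {"name": "rich", "version": "13.7.1"},
]

found = synthetic_get(exported, {"name": "rich"}, key_properties=["name"])
absent = synthetic_get(exported, {"name": "numpy"}, key_properties=["name"])
```

Note that the filter only works because the caller passes the key properties in. Guessing them by convention (always `name`, for instance) is exactly the brittle part.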
I believe these problems are tractable, but would require some careful design considerations. A few things this made me think about in no particular order:
- It might be useful, if we implement this functionality, to prefer synthetic `get` whenever a configuration defines more than one instance for a resource that implements `export` - invoke the resource once and then leverage the cached instances to very quickly filter current state for get and test operations. It should probably be optional behavior though, which would mean extending the resource manifest.
- We would likely need to update the cache after set operations, since the final state of the instance is now the current state for any further operations (like referencing the data from a different resource).
- Speaking of which, if a resource implements `Export` and `Set` but not `Get`, we would need to apply the synthetic get filtering to construct the output of the `Set` operation for resources that don't define `return` (currently, DSC invokes `Get` for those resources).
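As a rough illustration of the points above, the cache could be populated by a single export and then patched after each set. This is only a sketch under assumptions: `ExportCache`, `record_set`, and `fake_export` are hypothetical names, and nothing like this exists in DSC today.

```python
from typing import Any, Callable, Optional

Instance = dict[str, Any]


class ExportCache:
    """Caches one export call and serves synthetic get lookups from it."""

    def __init__(self, export: Callable[[], list[Instance]], key_properties: list[str]):
        self._export = export
        self._keys = key_properties
        self._instances: Optional[dict[tuple, Instance]] = None

    def _key(self, instance: Instance) -> tuple:
        return tuple(instance.get(k) for k in self._keys)

    def _load(self) -> dict[tuple, Instance]:
        # Invoke export lazily, and only once per cache lifetime.
        if self._instances is None:
            self._instances = {self._key(i): i for i in self._export()}
        return self._instances

    def get(self, desired: Instance) -> Optional[Instance]:
        """Synthetic get: look up the current state by key properties."""
        return self._load().get(self._key(desired))

    def record_set(self, final_state: Instance) -> None:
        """After a set, the final state becomes the cached current state."""
        self._load()[self._key(final_state)] = final_state


export_calls = []  # records each (hypothetical) resource invocation


def fake_export() -> list[Instance]:
    export_calls.append("export")
    return [{"name": "requests", "version": "2.31.0"}]


cache = ExportCache(fake_export, key_properties=["name"])
before = cache.get({"name": "requests"})
cache.record_set({"name": "requests", "version": "2.32.3"})
after = cache.get({"name": "requests"})
```

In this sketch the resource is exported once, and both the pre-set and post-set lookups are answered from memory. The same `get` could supply the output of a `Set` operation for a resource that doesn't return state itself.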
While I do think there are some very neat performance and convenience benefits from this proposal, we should weigh those against the increasing complexity for defining resources and interoperating with them not just through DSC but for higher order tools. The more flexibility authors have in selecting which operations to implement and which to use synthetically, the more context and consideration is required during resource design.
Well reasoned. Though I believe the effort is worth it.
Using a single export for synthetic gets could work, and massively cut down on the number of API calls.
- `export` + multiple synthetic `get` to determine all resources which are not in the correct state.
- Multiple set calls (which do not implement get) at once.
- `export` + multiple synthetic `get` to determine the outcome.
The largest problem I see is actually one of user expectations. The documentation never says configurations will be evaluated in order from top to bottom. And the phrase "The example document is declarative. It describes the desired state, not how to put the system into that state." implies that order may not be maintained. However, commands like winget configure export --all create a configuration document where order matters.
This could easily save hundreds of get calls,* but you would need to be extremely explicit in the docs that order matters, or extremely explicit that config files can be executed out of order.
* Especially if there is ever a V3 "Microsoft.VisualStudio.DSC/VSComponents" adapter
As to the barriers:
- It would certainly require a schema change for both defining key properties and matching rules. For example, `dsc resource schema --resource Microsoft.WinGet/Package` shows there are four different "matchOptions" for a package. Which, to your point, uses "id" instead of "name".
- You may wish to reconsider your caching strategy. Using PR #1262 as an example, `dsc -l trace resource export -r Python.Pip/Package` invokes the schema command 155 times, turning a 13 second export, with embedded schema, into a 1 minute export!
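For the schema point, one possible direction is to memoize the schema per resource type within a single process. The sketch below is only an illustration: `run_schema_command` is a hypothetical stand-in for the expensive call, not a real DSC function, and the returned schema is made up.

```python
from functools import lru_cache

schema_calls = []  # records each simulated invocation of the schema command


def run_schema_command(resource_type: str) -> dict:
    # Hypothetical stand-in for invoking the resource's schema command.
    schema_calls.append(resource_type)
    return {"type": "object", "properties": {"name": {"type": "string"}}}


@lru_cache(maxsize=None)
def get_schema(resource_type: str) -> dict:
    """Fetch the schema once per resource type, then reuse the result."""
    return run_schema_command(resource_type)


# Simulate validating 155 exported instances of the same resource type.
for _ in range(155):
    schema = get_schema("Python.Pip/Package")
```

Here the expensive call happens once rather than 155 times. The cached dict is shared between callers, so it should be treated as read-only.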
The largest problem I see is actually one of user expectations. The documentation never says configurations will be evaluated in order from top to bottom.
This is a fair callout - I need to address that on the docs side. For the purposes of this discussion, I'll clarify that:
- By default, DSC invokes resources in the order they are declared in the configuration document.
- If a resource instance uses the `dependsOn` field to declare a dependency on another instance, the dependent instance is always processed after the instance it depends on.
- You can define chains of dependence. DSC ensures that dependent instances are always processed after their dependencies, even when an instance depends on multiple instances that themselves depend on other instances.
So while the statement in the documentation is true - the configuration document is declarative, not imperative - it is also true that the default order is determined by the author and the order that the instances are defined. If that order would be invalid (such as for a dependency chain where a dependent resource is declared before its dependency), DSC invokes the resources in the correct order. Of course, it can only determine a "correct" order when the author defines dependency relationships between the instances. If you try to configure SQL Server before you install it without declaring that the configuration depends on the install, the set operation will still fail.
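The ordering rules above can be sketched as a depth-first walk that keeps declaration order and only moves an instance later when a dependency requires it. This is an illustration of the described behavior, not DSC's actual implementation. It also simplifies `dependsOn` to a list of plain instance names, which is an assumption made for brevity.

```python
from typing import Any


def resolve_order(instances: list[dict[str, Any]]) -> list[str]:
    """Return instance names in declaration order, adjusted for dependsOn."""
    by_name = {inst["name"]: inst for inst in instances}
    ordered: list[str] = []
    state: dict[str, str] = {}  # name -> "visiting" or "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle involving {name!r}")
        if name not in by_name:
            raise KeyError(f"unknown dependency {name!r}")
        state[name] = "visiting"
        # Process every dependency before the dependent instance.
        for dep in by_name[name].get("dependsOn", []):
            visit(dep)
        state[name] = "done"
        ordered.append(name)

    for inst in instances:
        visit(inst["name"])
    return ordered


instances = [
    {"name": "Configure SQL Server", "dependsOn": ["Install SQL Server"]},
    {"name": "Install SQL Server"},
    {"name": "Open firewall port"},
]
order = resolve_order(instances)
# order == ["Install SQL Server", "Configure SQL Server", "Open firewall port"]
```

Without the `dependsOn` entry, the sketch returns the instances exactly as declared, which is the case where configuring SQL Server before installing it would still fail.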
Regarding caching, we haven't implemented generic caching for resource export because there wasn't a compelling reason to do so. Some resources have implemented their own caching behaviors (such as the PowerShell adapters), but we don't have an API and coherent way for resources to participate in caching data within a dsc process invocation.
I can see a lot of value in using export as a prefetch operation and caching the results for ongoing use - but it certainly requires writing up some spec and creating a public API so that we can ensure resource authors can develop their resources to participate usefully and integrating developers can understand and leverage that cache if they want to.
Sounds like we're on the same page. Nice feature, but lots of work before it can happen. Leaving the issue open, but you could probably drop the "Needs Triage" label.