!!!TASK: Use workspaceName instead of contentStreamId in AssetUsageProjection
Upgrade instructions
⚠️ This is a breaking change!
For existing projects its nessecary to delete the cr_*_p_neos_asset_usage tables first.
Then you need to setup the content repositorys again:
./flow cr:setup
And also run ./flow cr:projectionreplay AssetUsage afterwards.
Review instructions
resolves: #5084
Checklist
- [ ] Code follows the PSR-2 coding style
- [ ] Tests have been created, run and adjusted as needed
- [ ] The PR is created against the lowest maintained branch
- [ ] Reviewer - PR Title is brief but complete and starts with
FEATURE|TASK|BUGFIX - [ ] Reviewer - The first section explains the change briefly for change-logs
- [ ] Reviewer - Breaking Changes are marked with
!!!and have upgrade-instructions
~Currently you have to delete the asset_usage table first before it works. But it would be better if it would work right away without having to delete the asset_usage table first.~
See Upgrade instructions above.
Current state is, that we might need to implement also the node hierarchy in the AssetUsageProjection, to be able to handle child nodes on deleted nodes.
See: https://neos-project.slack.com/archives/C04PYL8H3/p1718654280746299
As discussed with @bwaidelich :
- We want to try to replace the projection with a CatchUpHook
- We just want to track changes to asset-properties.
- For the initial state, we need to build the AssetUsage based on the latest node state
- iterating over all workspaces and all nodes (from root to leafs)
- After initial node state has been build we handle changes to the AssetUsage by the events of the CatchUpHook
- We want to try to extract the "AssetUsage-Comparison-Storage" logic, so we can mostly work with the node read model
PROS:
- We can use Projections to gather workspace chains and node hierarchy
CONS:
- Heavy job for inital state building (traversing all nodes, in all dimensions and all workspaces)
@dlubitz thanks for the summary. I'll add some more topics we discussed from the top of my head:
In general, just looking at the state should make a lot of edge cases simpler because it is much easier to reason about. For example: IMO it makes sense to only ever have a single entry in the asset usage table if it is used in the same node property (i.e. aggregate id, dimension space point, property name and asset id are equal) no matter whether the asset was changed an reverted in the meantime.
But there is one special case and that's "materialized nodes" : Whenever a node (property) is changed that respective node is copied in the database. That is done to optimize disk space and performance, but it's not an implementation detail because it means that other changes to that node won't "shine through" from the base workspace any longer as soon as it's changed once – until that change is published or reverted that is.
This can lead to at least two weird situations in relation to asset usages:
Scenario: asset is only used in the child workspace
- Node
ais created in workspacelivewith a propertyxofasset1- One entry in the asset usage table:
workspace: live, node: A, property: x, asset: asset1
- One entry in the asset usage table:
- A different property of that node is changed in workspace
user- No change to the asset usage table
- The property
xof nodeais set tonull(i.e. the asset is removed from the prop)- The entry from the asset usage table is removed as well
As a graph:
%%{init: { 'theme': 'base', 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
gitGraph
commit id:"prop x: asset1"
branch user
commit id:"prop y: foo"
checkout live
commit id:"prop x: ~"
Issue
As a result of the above, the workspace user still contains that node a with the asset asset1 used. But it's not contained in the asset usage table any longer.
[!NOTE]
When building up the asset usage table from scratch (i.e. from the content graph state) it would add an entry to the usage table in this case because we can't easily differentiate whether the property was set in the workspace explicitly
Consequence
It at least feels a bit weird, that the asset usage is not tracked in this case (and especially that it is when the usage tabel is built-up from the content graph state). But in reality this probably isn't a problem because:
As soon as the live workspace is changed, the user workspace is outdated. So it contains a state that does not reflect "the real world", i.e. it can't be published as is.
When the conflict is resolved (i.e. the user workspace is rebased) the asset removal event will be reflected and the asset is no longer in use for good.
[!NOTE]
If, at some point, we allow to resolve conflicts in terms of "keep my change", there will have to be an event that reverts the asset removal – so the asset usage table will be updated correctly
There are some more, similar, scenarios, e.g. when the asset property is not removed but changed to a different asset in the base workspace. But the resulting consequences are mostly similar to the case above.
Consideration
While writing this down, I was wondering whether that initial claim
only ever have a single entry in the asset usage table if it is used in the same node property
really makes sense.. We could also sync the node in the asset usage table whenever it is changed (i.e. every event that would lead to the node being "materialized" if it wasn't yet). Sync as in: copy all usages for that node from the base workspace(s) With that the 2nd step in the scenario above would have a different effect:
- Node
ais created in workspacelivewith a propertyxofasset1- One entry in the asset usage table:
workspace: live, node: A, property: x, asset: asset1
- One entry in the asset usage table:
- A different property of that node is changed in workspace
user-
Add entry to the asset usage table:
workspace: user, node: A, property: x, asset: asset1
-
Add entry to the asset usage table:
- The property
xof nodeais set tonull(i.e. the asset is removed from the prop)- The entry from the asset usage table is removed as well
This way the resulting state would be the same for the initial setup and when executed via hook.
[!NOTE]
I still think that we should count only one usage if the asset is the same in the child workspace, but we could filter out the duplicates at read time
Copying all usages for each (initial) change might pose a performance issue though..
One comment regarding
Heavy job for inital state building (traversing all nodes, in all dimensions and all workspaces)
We would only have to look into child workspaces with pending changes. And for those we could even find only potentially affected nodes by iterating through the events of the respective content stream (but that is an optimization that could always add lateron)
Closed in favor of https://github.com/neos/neos-development-collection/pull/5258