neos-development-collection !!!TASK: Use workspaceName instead of contentStreamId in AssetUsageProjection

Upgrade instructions

⚠️ This is a breaking change!

For existing projects, it's necessary to delete the cr_*_p_neos_asset_usage tables first.

Then you need to set up the content repositories again:

./flow cr:setup

And also run ./flow cr:projectionreplay AssetUsage afterwards.

Review instructions

resolves: #5084

Checklist

[ ] Code follows the PSR-2 coding style
[ ] Tests have been created, run and adjusted as needed
[ ] The PR is created against the lowest maintained branch
[ ] Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
[ ] Reviewer - The first section explains the change briefly for change-logs
[ ] Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

Jun 01 '24 14:06 crydotsnake

~Currently you have to delete the asset_usage table first before it works. But it would be better if it would work right away without having to delete the asset_usage table first.~

See Upgrade instructions above.

Jun 02 '24 13:06 crydotsnake

Current state: we might also need to implement the node hierarchy in the AssetUsageProjection to be able to handle child nodes of deleted nodes.

See: https://neos-project.slack.com/archives/C04PYL8H3/p1718654280746299

Jul 05 '24 08:07 dlubitz

As discussed with @bwaidelich :

We want to try to replace the projection with a CatchUpHook
We just want to track changes to asset-properties.
For the initial state, we need to build the AssetUsage based on the latest node state
- iterating over all workspaces and all nodes (from root to leaves)
After the initial node state has been built, we handle changes to the AssetUsage by the events of the CatchUpHook
We want to try to extract the "AssetUsage-Comparison-Storage" logic, so we can mostly work with the node read model
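To make the "initial state" step a bit more concrete, here is a rough sketch of the traversal in plain Python. It is not Neos code: the node shape (`id`, `properties`, `children`), the `asset:` prefix and the function names are all assumptions made up for illustration, and dimensions are left out for brevity.

```python
def is_asset(value):
    # Assumption for this sketch only: asset references are strings like "asset:<id>".
    return isinstance(value, str) and value.startswith("asset:")


def build_initial_usages(root_by_workspace):
    """Walk every workspace from its root node down and collect asset usages."""
    usages = set()
    for workspace, root in root_by_workspace.items():
        stack = [root]
        while stack:
            node = stack.pop()
            for prop, value in node["properties"].items():
                if is_asset(value):
                    usages.add((workspace, node["id"], prop, value))
            # Descend towards the leaves; nodes without children simply add nothing.
            stack.extend(node.get("children", []))
    return usages
```

After this one-off pass, the hook would only have to apply the individual property changes to the resulting set.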

PROS:

We can use Projections to gather workspace chains and node hierarchy

CONS:

Heavy job for initial state building (traversing all nodes, in all dimensions and all workspaces)

Jul 05 '24 13:07 dlubitz

@dlubitz thanks for the summary. I'll add some more topics we discussed off the top of my head:

In general, just looking at the state should make a lot of edge cases simpler because it is much easier to reason about. For example: IMO it makes sense to only ever have a single entry in the asset usage table if it is used in the same node property (i.e. aggregate id, dimension space point, property name and asset id are equal), no matter whether the asset was changed and reverted in the meantime.
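A tiny sketch of what "a single entry per node property" could look like, assuming the usage table is modelled as a set keyed by (aggregate id, dimension space point, property name, asset id). The function name and the sample values are hypothetical.

```python
def change_asset_property(usages, node_id, dimension, prop, old_asset, new_asset):
    """Keep at most one entry per (node, dimension, property, asset) key."""
    if old_asset is not None:
        usages.discard((node_id, dimension, prop, old_asset))
    if new_asset is not None:
        usages.add((node_id, dimension, prop, new_asset))


usages = set()
change_asset_property(usages, "a", "en", "x", None, "asset1")      # initially set
change_asset_property(usages, "a", "en", "x", "asset1", "asset2")  # changed
change_asset_property(usages, "a", "en", "x", "asset2", "asset1")  # reverted
```

Because the key does not include any history, changing and reverting the asset leaves exactly one entry for `asset1`.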

But there is one special case, and that's "materialized nodes": whenever a node (property) is changed, that respective node is copied in the database. That is done to optimize disk space and performance, but it's not an implementation detail, because it means that other changes to that node won't "shine through" from the base workspace any longer as soon as it's changed once (until that change is published or reverted, that is).

This can lead to at least two weird situations in relation to asset usages:

Scenario: asset is only used in the child workspace

Node a is created in workspace live with a property x of asset1
- One entry in the asset usage table: workspace: live, node: A, property: x, asset: asset1
A different property of that node is changed in workspace user
- No change to the asset usage table
The property x of node a is set to null in workspace live (i.e. the asset is removed from the prop)
- The entry from the asset usage table is removed as well

As a graph:

%%{init: { 'theme': 'base', 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
gitGraph
    commit id:"prop x: asset1"
    branch user
    commit id:"prop y: foo"
    checkout live
    commit id:"prop x: ~"

Issue

As a result of the above, the workspace user still contains that node a with the asset asset1 used. But it's not contained in the asset usage table any longer.

[!NOTE]
When building up the asset usage table from scratch (i.e. from the content graph state) it would add an entry to the usage table in this case because we can't easily differentiate whether the property was set in the workspace explicitly
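To illustrate the mismatch, here is a small self-contained model of the scenario. It is only a sketch: the copy-on-first-change behaviour, the workspace chain and all names (`set_property`, `rebuild_from_state`, ...) are simplifying assumptions, not the actual content graph implementation.

```python
ASSET_IDS = {"asset1"}                          # values that count as assets in this sketch
BASE_WORKSPACE = {"live": None, "user": "live"}  # hypothetical workspace chain


def visible_node(content, workspace, node_id):
    """Return the node copy visible in a workspace, falling back to its base."""
    while workspace is not None:
        node = content[workspace].get(node_id)
        if node is not None:
            return node
        workspace = BASE_WORKSPACE[workspace]
    return None


def set_property(content, usages, workspace, node_id, prop, value):
    """Apply a change and update the usage table the way an event hook would."""
    if node_id not in content[workspace]:
        # "Materialize": copy the node as currently visible (or start empty).
        content[workspace][node_id] = dict(visible_node(content, workspace, node_id) or {})
    node = content[workspace][node_id]
    old = node.get(prop)
    if old in ASSET_IDS:
        usages.discard((workspace, node_id, prop, old))
    if value in ASSET_IDS:
        usages.add((workspace, node_id, prop, value))
    node[prop] = value


def rebuild_from_state(content):
    """Build the usage table from scratch by looking only at the stored node copies."""
    return {
        (workspace, node_id, prop, value)
        for workspace, nodes in content.items()
        for node_id, node in nodes.items()
        for prop, value in node.items()
        if value in ASSET_IDS
    }


def run_scenario():
    content = {"live": {}, "user": {}}
    hook_usages = set()
    set_property(content, hook_usages, "live", "a", "x", "asset1")
    set_property(content, hook_usages, "user", "a", "y", "foo")
    set_property(content, hook_usages, "live", "a", "x", None)
    return hook_usages, rebuild_from_state(content)
```

In this model `run_scenario()` returns an empty hook-maintained table, while the table rebuilt from state contains an entry for workspace user.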

Consequence

It at least feels a bit weird that the asset usage is not tracked in this case (and especially that it is tracked when the usage table is built up from the content graph state). But in reality this probably isn't a problem, because as soon as the live workspace is changed, the user workspace is outdated. So it contains a state that does not reflect "the real world", i.e. it can't be published as is. When the conflict is resolved (i.e. the user workspace is rebased), the asset removal event will be reflected and the asset is no longer in use for good.

[!NOTE]
If, at some point, we allow resolving conflicts in terms of "keep my change", there will have to be an event that reverts the asset removal, so the asset usage table will be updated correctly.

There are some more, similar, scenarios, e.g. when the asset property is not removed but changed to a different asset in the base workspace. But the resulting consequences are mostly similar to the case above.

Consideration

While writing this down, I was wondering whether that initial claim

only ever have a single entry in the asset usage table if it is used in the same node property

really makes sense. We could also sync the node in the asset usage table whenever it is changed (i.e. on every event that would lead to the node being "materialized" if it wasn't yet). Sync as in: copy all usages for that node from the base workspace(s). With that, the 2nd step in the scenario above would have a different effect:

Node a is created in workspace live with a property x of asset1
- One entry in the asset usage table: workspace: live, node: A, property: x, asset: asset1
A different property of that node is changed in workspace user
- Add entry to the asset usage table: workspace: user, node: A, property: x, asset: asset1
The property x of node a is set to null in workspace live (i.e. the asset is removed from the prop)
- The live entry is removed from the asset usage table as well (the user entry remains)

This way the resulting state would be the same for the initial setup and when executed via hook.
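A rough sketch of that "sync on first change" idea, again as a plain-Python model with made-up names (`sync_from_base`, `on_property_changed`) and a hard-coded two-workspace chain. It only tracks the usage table and which nodes have been touched per workspace.

```python
BASE_WORKSPACE = {"live": None, "user": "live"}  # hypothetical workspace chain


def sync_from_base(usages, materialized, workspace, node_id):
    """On the first change of a node in a workspace, copy its usages from the base chain."""
    if (workspace, node_id) in materialized:
        return
    materialized.add((workspace, node_id))
    # Find the nearest base workspace that already holds its own copy of the node.
    base = BASE_WORKSPACE[workspace]
    while base is not None and (base, node_id) not in materialized:
        base = BASE_WORKSPACE[base]
    if base is None:
        return
    for (ws, node, prop, asset) in list(usages):
        if ws == base and node == node_id:
            usages.add((workspace, node, prop, asset))


def on_property_changed(usages, materialized, workspace, node_id, prop, old_asset, new_asset):
    """Hook handler: sync first, then apply the asset change (None means no asset)."""
    sync_from_base(usages, materialized, workspace, node_id)
    if old_asset is not None:
        usages.discard((workspace, node_id, prop, old_asset))
    if new_asset is not None:
        usages.add((workspace, node_id, prop, new_asset))


usages, materialized = set(), set()
on_property_changed(usages, materialized, "live", "a", "x", None, "asset1")
on_property_changed(usages, materialized, "user", "a", "y", None, None)  # non-asset change
on_property_changed(usages, materialized, "live", "a", "x", "asset1", None)
```

Under these assumptions the table ends up with a single entry for workspace user, which is also what a rebuild from the node state would produce for this scenario.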

[!NOTE]
I still think that we should count only one usage if the asset is the same in the child workspace, but we could filter out the duplicates at read time
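Filtering duplicates at read time could be as simple as dropping the workspace from the key when counting. A minimal sketch, assuming rows of the shape (workspace, node, property, asset); the function name is hypothetical:

```python
def distinct_usages(usages, asset_id):
    """Collapse rows that differ only by workspace into one usage per node property."""
    return {
        (node, prop)
        for (_workspace, node, prop, asset) in usages
        if asset == asset_id
    }
```

With this, the same asset in the same property of the same node counts once, regardless of how many workspaces carry a copy of the row.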

Copying all usages for each (initial) change might pose a performance issue, though.

One comment regarding

Heavy job for initial state building (traversing all nodes, in all dimensions and all workspaces)

We would only have to look into child workspaces with pending changes. And for those we could even find only the potentially affected nodes by iterating through the events of the respective content stream (but that is an optimization that we could always add later on).
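As a sketch of that narrowing step, assuming we can get the pending events per workspace as a list of dicts with a `node_id` key (that event shape and both function names are invented for this example):

```python
def workspaces_with_pending_changes(events_by_workspace):
    """Only workspaces whose content stream has events on top of the base need a look."""
    return [workspace for workspace, events in events_by_workspace.items() if events]


def potentially_affected_nodes(events):
    """Collect the ids of all nodes touched by the pending events of one content stream."""
    return {event["node_id"] for event in events if "node_id" in event}
```

Only the nodes returned by `potentially_affected_nodes` would then have to be compared against the usage table, instead of the whole tree of that workspace.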

Jul 06 '24 09:07 bwaidelich

Closed in favor of https://github.com/neos/neos-development-collection/pull/5258

Sep 23 '24 18:09 dlubitz