data icon indicating copy to clipboard operation
data copied to clipboard

Fix ContainsXxx exception false positive

Open PhilippGrashoff opened this issue 6 months ago • 4 comments

This PR starts as a draft to have a proper place for discussion. This PR will fix https://github.com/atk4/ui/issues/1860 on data level. I will add thoughts for discussion here.

PhilippGrashoff avatar May 24 '25 13:05 PhilippGrashoff

Hi, sorry for the late reply.

I totally agree with you @mvorisek. ContainsOne and ContainsMany models should only be accessed using ->ref() - as you would handle any "normal" reference.

To me, the most important question at the moment is if we can "catch" all direct accesses to the field value properly? Meaning any $parentModel->get('someContainsXxxField') and $parentModel->set('someContainsXxxField)? I guess that's what the normalize hook more or less does / should do.

Thinking about it, maybe the proper way to deal with this is have the UI ignore any containsXxx fields, as they are - in the sense of atk4\data - not a field at all. Maybe the problem is not in atk4\data after all.

I will dive deeper into this with a fresh head.

PhilippGrashoff avatar Jun 02 '25 18:06 PhilippGrashoff

To me, the most important question at the moment is if we can "catch" all direct accesses to the field value properly? Meaning any $parentModel->get('someContainsXxxField') and $parentModel->set('someContainsXxxField)? I guess that's what the normalize hook more or less does / should do.

depends on usecase, but yes, normalize hook is always called - https://github.com/atk4/data/blob/6.0.0/src/Field.php#L134

Thinking about it, maybe the proper way to deal with this is have the UI ignore any containsXxx fields, as they are - in the sense of atk4\data - not a field at all. Maybe the problem is not in atk4\data after all.

UI should ignore them already - https://github.com/atk4/data/blob/6.0.0/src/Reference/ContainsBase.php#L21

I will dive deeper into this with a fresh head.

👍 let's start with use/test cases

(and small nit, I prefer all discussion in threads, as the main issue discussion must be linear to be readable, ie. it is limited to one thread only)

mvorisek avatar Jun 02 '25 18:06 mvorisek

Hi, after reading quite a bit through the relevant code, I do not see a better place currently to address this than the current solution using the normalize hook. I do not think its perfect (but good enough), as

  • This code is run for each field. So if a model has 49 normal fields and one ContainsXxx field, the code is executed 50 times
  • In theory, we should directly try to "forbid" calling ->set() on this field (and ->get(), too). Using the normalize hook is only an indirect way to achieve this. However, I do not see a sensible way to do this currently - it probably would also cause atk4\data internal problems currently. Maybe when fields are refactored into separate classes (a separate class for each field type like Field\String, Field\Text, Field\Integer and so on), this could be addressed more appropriately.
  • regarding tests, the existing unit tests already cover ->set() for both ContainsOne and ContainsMany.

I will dig into why in atk4\ui the normalize hook is called when using a Model with a ContainsXxx field in a atk4\ui Table. Seems the problem is located there.

Regarding atk4\data, I'd update this PR to improve the comment so it's clearer that ->get() and ->set() should never be used. Then, when users run into this Exception, it's clearer where the problem does come from. WDYT?

And regarding your nit: With "threads" you mean conversations which can be resolved, like the one you added above?

PhilippGrashoff avatar Jun 03 '25 18:06 PhilippGrashoff

yes

mvorisek avatar Jun 03 '25 19:06 mvorisek

Hi, sorry for leaving this stale for so long - I know the reason now. I think there is a use case where ->set('some_contains_many_field', $containsManyData) is sensible: caching!

See this pseudo-code:

/* 
SomeExpensiveModel containsMany SomeSubModels. To improve performance, the values of SomeExpensiveModel entities are cached, e.g. in Redis. 
Lets say Entity id = 34 contain 10 SomeSubModel entities
*/
$someExpensiveEntity = (new SomeExpensiveModel($somePersistence, ['readOnly' => true]))->createEntity();
$dataOfEntity34 = $redisCache->getData(SomeExpensiveModel::class, 34);
$someExpensiveEntity->setMulti($dataOfEntity34); //here, ->set() is used the set the data of the 10 SomeSubModel entities contained in SomeExpensiveModel entity 34. The data of the SomeSubModel Entities is an array only, not 10 Models. The objects are only created if $someExpensiveEntity->ref('some_sub_model_field')->... is called.


/*
if entity 34 contains 10 SomeSubModels entities, doing this without ->set() would be very inefficient. Then, something like this would need to happen:
*/
$someExpensiveEntity = (new SomeExpensiveModel($somePersistence, ['readOnly' => true]))->createEntity();
$dataOfEntity34 = $redisCache->getData(SomeExpensiveModel::class, 34);
$dataWithoutContainsMany = //strip containsMany data from $dataOfEntity34;
$someExpensiveEntity->setMulti($dataWithoutContainsMany);
$containsManyData = $dataOfEntity34['some_sub_model_field'];
foreach($containsManyData as $subModelEntityData) {
    $subModelEntity = $someExpensiveEntity->ref('some_sub_model_field')->creatyEntity();
    $subModelEntity->set($subModelEntityData);
}

/*
This would create 10 additional objects, only to add the SubModel Data to the entity. This would definitely more than undo the performance improvement the caching should bring.
*/
``

WDYT about this point of view?

PhilippGrashoff avatar Jul 26 '25 07:07 PhilippGrashoff

@PhilippGrashoff getting this decided right it takes efford, at least I spent a few hours thinking on this before and yes, it is a problem with me when you reply to this topic every while.

I think I have already said everything above.

If you want to pursue this topic please present enough efford with something actionable and then please let's finish this topic on asap basis together.

Thank you!

mvorisek avatar Jul 26 '25 08:07 mvorisek

@mvorisek the main question regarding this my mind currently is thinking about: Is a special case like the caching example reason enough to alter the current logic (either making it configurable or even ditching it) or not? A thought that just came up: The special caching case mentioned above should also be handable by directly setting the cached data to Model::data array, bypassing Model::set(). Some logic of set() would then need to be added manually to this workaround however (typecasting mainly).

WDYT? Handle a special case like setting entity values from cache differently, or should these special cases be allowed to use ->set() on ContainsXXX fields?

If you like, we can have a Teams/Google Meet etc. call about this, I'd be available today.

PhilippGrashoff avatar Jul 26 '25 11:07 PhilippGrashoff

I have already mentioned this.

If you want to set the data as string without triggering any logic of the contained models, you must declare the field as a string and then you can set it as a string.

If set of the raw/field value should be allowed on inlined reference (contained models), changes (add/update/delete) must be detected and executed on the contained model. This is hard requirement in order to enforce model/data integrity.

There will be no DB roundtrip when performing these changes on the contained model as the data are in-memory and flushed only once the main entity is updated. Thus this will be very fast even with tens of thousands of cointaned entites.

To recap:

There is an internal representation of the contained entities. With json type, it is json array of db-typecasted data. In theory, the array can be serialized also using serialize instead of json_encode.

To the expected implementation:

When Model::set('field', 'serialized data') is called, these data are unserialized. Based on PK, the new data are decided either as new or updating existing contained data. Missing data are considered as removed and remove hooks are called (using contained Model::delete()). For new/updated data hooks are called as well (using contained Model::update()).

Does this make sense to you and would you like to code it?

(IIRC one typical in-memory update takes about 50 us, so 20k updates per second, I hope this is fast enough, if not, we should still have the basic impl. first like outlined above and then discuss how to improve the perfomance :) )

mvorisek avatar Jul 26 '25 11:07 mvorisek

Hi,

I am aware of the internal workings of containsMany, thx for explaining again. And I agree that when setting the serialized data is difficult to implement if business logic is to be maintained fully. So, in general, I think the current implementation forbidding a direct ->set() on a containsOne/containsMany field is correct. I will update this PR with some clearer code comments so others hitting this Exception know where it's coming from. The UI issue linked on top should be tackled in UI.

As for the special case of caching: I think if some the serialized containsMany data is set directly to a containsMany field, this must happen in a read-only setup - so business logic on create, update and delete actions never needs to happen anyway. I will hopefully soon implement this (using Redis, storing the serialized entity data there - but realistically it will take half a year at least). In my application there is lots of nearly-static master data which is used for calculations very often. The one Model used most even has a nested containsMany: TourType containsMany PriceModels containsMany DateRanges. From my current POV, setting cached data to an entity is best implemented completely bypassing ->set() (for the containsMany field) and setting the data directly to the Model::data array. Then, the current implementation does not need to be softened.

PhilippGrashoff avatar Jul 27 '25 10:07 PhilippGrashoff

The topic should be analysed for all usecases.

For example to support comparing such inlined data. Any field/column should support === operator.

I would say this is currently not easily possible but the same for non-inlined references.

With this said I would not fully reject the idea allowing Model::set() but the impl. will face challenges. I incline to support it, but if we hit some too hard or too hack impl., we might be better to keep it unsupported.

mvorisek avatar Jul 27 '25 11:07 mvorisek

Getters were fixed in https://github.com/atk4/data/pull/1269.

Since https://github.com/atk4/data/pull/1270 setting any value throws, incl. null.

mvorisek avatar Oct 11 '25 16:10 mvorisek