ckanext-validation

Unnecessary validation of unmodified resources in same dataset

Open ThrawnCA opened this issue 5 years ago • 3 comments

Overview

When a resource is modified, all resources in the dataset are validated, even though most of them are unmodified. On datasets with many resources, this can result in a substantial performance problem, especially if the resources are large.

It appears that the after_update function assumes that the presence of "resources" in the data dictionary means that the whole package is being updated at once. However, this is not necessarily the case. If a resource is updated via, e.g., the resource_update API, this code path is still triggered.


Please preserve this line to notify @amercader (lead of this repository)

ThrawnCA, Mar 04 '20 02:03

Testing indicates that the package_patch API and the resource_patch API both call the after_update function with the full package dictionary, as does editing a resource via the web interface.

ThrawnCA, Mar 04 '20 04:03

Ok, so resource_patch and resource_update first call after_update with the package dict, then with the resource dict. However, the first call generates validation jobs for every resource in the package, before the second takes place. There needs to be a way for after_update to detect that it was actually triggered by a resource call.
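To make the call order concrete, here is a minimal simulation. It is not the plugin's actual code: `after_update` is a deliberately naive stand-in, and `simulated_resource_update` and the `jobs` list are hypothetical names used only to mimic the sequence described above.

```python
# Simplified stand-in for the behaviour described above; not the real plugin code.
jobs = []  # ids of resources for which a validation job was "enqueued"


def after_update(data_dict):
    """Naive hook: treats any dict containing "resources" as a whole-package update."""
    if "resources" in data_dict:
        for resource in data_dict["resources"]:
            jobs.append(resource["id"])


def simulated_resource_update(package, resource_id):
    """Mimic the observed order: package dict first, then the resource dict."""
    after_update(package)
    resource = next(r for r in package["resources"] if r["id"] == resource_id)
    after_update(resource)


package = {"id": "pkg", "resources": [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]}
simulated_resource_update(package, "r2")
print(jobs)  # every resource got a job, although only r2 was modified
```

The package-level call cannot tell that it was triggered by a single-resource update, which is the detection problem described above.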

What if before_update didn't just populate resources_to_validate when a resource does need validation, but always populated it with either True or False, and then after_update checked whether or not resources_to_validate was empty? Empty -> this is a package call, validate everything. Not empty -> this is a resource-based call, only validate resources with True entries in resources_to_validate.
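A rough sketch of that idea, under several assumptions: the class name `FlagTrackingHooks`, the `jobs` list, and the URL comparison used to decide whether a resource "needs validation" are all hypothetical placeholders, and the hook signatures are simplified rather than taken from the plugin.

```python
class FlagTrackingHooks:
    """Hypothetical sketch: before_update always records True or False."""

    def __init__(self):
        self.resources_to_validate = {}  # resource id -> needs validation?
        self.jobs = []                   # stand-in for enqueued validation jobs

    def before_update(self, current_resource, updated_resource):
        # Assumption: a changed URL is what makes a resource need validation.
        changed = current_resource.get("url") != updated_resource.get("url")
        self.resources_to_validate[updated_resource["id"]] = changed

    def after_update(self, data_dict):
        if "resources" not in data_dict:
            return  # resource-level call; nothing to do in this sketch
        # Non-empty flags mean a resource API triggered this package-level call.
        resource_call = bool(self.resources_to_validate)
        for resource in data_dict["resources"]:
            flagged = self.resources_to_validate.pop(resource["id"], False)
            if flagged or not resource_call:
                self.jobs.append(resource["id"])


hooks = FlagTrackingHooks()
package = {"id": "pkg", "resources": [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]}
hooks.before_update({"id": "r2", "url": "a.csv"}, {"id": "r2", "url": "b.csv"})
hooks.after_update(package)
print(hooks.jobs)  # only the flagged resource is queued
```

One caveat with the "empty or not" test: the dict is shared across packages, so a stale entry from an unrelated update would change the outcome. Checking whether any of the package's own resource ids are present might be more robust.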

Alternatively, before_update could populate a new dict, e.g. self.packages_to_skip, indicating that a call originates from a resource API and that after_update should skip the package.
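This alternative could look roughly like the following. Again, this is only a sketch: `SkipTrackingHooks` and `jobs` are hypothetical, the hook signatures are simplified, and it assumes the resource dict carries a `package_id` key. For brevity, the resource-level call always queues the resource instead of checking whether it actually needs validation.

```python
class SkipTrackingHooks:
    """Hypothetical sketch: mark packages whose update came from a resource API."""

    def __init__(self):
        self.packages_to_skip = {}  # package id -> True while a resource call is in flight
        self.jobs = []              # stand-in for enqueued validation jobs

    def before_update(self, current_resource, updated_resource):
        # Assumption: the resource dict includes the id of its parent package.
        self.packages_to_skip[updated_resource["package_id"]] = True

    def after_update(self, data_dict):
        if "resources" in data_dict:
            # Package-level call: skip it once if a resource API triggered it.
            if self.packages_to_skip.pop(data_dict["id"], False):
                return
            for resource in data_dict["resources"]:
                self.jobs.append(resource["id"])
        else:
            # Resource-level call: handle just this resource.
            self.jobs.append(data_dict["id"])


hooks = SkipTrackingHooks()
resource = {"id": "r2", "package_id": "pkg", "url": "b.csv"}
package = {"id": "pkg", "resources": [{"id": "r1"}, resource, {"id": "r3"}]}
hooks.before_update({"id": "r2", "package_id": "pkg", "url": "a.csv"}, resource)
hooks.after_update(package)   # skipped: originated from a resource call
hooks.after_update(resource)  # queues only r2
print(hooks.jobs)
```

Keying on the package id keeps the marker scoped to one package, and popping it on use means a later genuine package update still validates everything.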

ThrawnCA, Mar 04 '20 05:03

This is the reason that 4321b79 broke the TravisCI build; as soon as the plugin implemented IPackageController, it started incorrectly generating multiple validation jobs.

ThrawnCA, Mar 04 '20 22:03