addons icon indicating copy to clipboard operation
addons copied to clipboard

[Task]: Store full add-on manifest in the database

Open diox opened this issue 1 year ago • 2 comments

Description

We're frequently asked to find add-ons that use a specific manifest property while filtering the results by some other criteria, like only listed versions from add-ons with more than X users.

We currently can only do that for permissions (through WebextPermission model/webext_permissions table), but it would be useful for all manifest properties, so let's store the full JSON manifest in a JSONField on another model/table for each Version.

A one-off task & process_addons command to trigger it should be included to backfill data (which will take a while...)

Acceptance Criteria

  ### Milestones/checkpoints
  - [ ] Manifest is available for all versions in a json field in a new model/table
  - [ ] Backfill task is working
  - [ ] Data is visible through the admin
  - [ ] Ticket is filed for SRE to expose the table in redash

Checks

  • [X] If the issue is ready to work on, I have removed the "needs:info" label.
  • [X] If I have identified that the work is specific to a repository, I have removed "repository:addons-server" or "repository:addons-frontend"

┆Issue is synchronized with this Jira Task

diox avatar Jul 25 '24 09:07 diox

I wonder if that would be performant enough, crawling through millions of json blobs. Could we denormalize the overall blob into specific table columns for each top level json property, perhaps?

wagnerand avatar Jul 25 '24 09:07 wagnerand

The performance is obviously going to be worse than actual (possibly indexed) columns but I think denormalizing here would be premature. Especially as this is meant for adhoc searches for which taking a bit of extra time is acceptable. Also, as mentioned in the description we already have the webext_permissions available for more specific searches as well.

diox avatar Jul 25 '24 11:07 diox

Couple recent examples of where it could have been useful:

  • Figuring out how many add-ons have an update url
  • Figuring out how many themes have images set

diox avatar Jul 31 '24 15:07 diox

SRE ticket is https://mozilla-hub.atlassian.net/browse/SVCSE-2158

I'll trigger the task on dev & stage tomorrow.

diox avatar Aug 06 '24 16:08 diox

Example using stage data:

>>> FileManifest.objects.count()
809622

>>> FileManifest.objects.filter(
  Q(manifest_data__browser_specific_settings__gecko__update_url__isnull=False) |
  Q(manifest_data__apps__gecko__update_url__isnull=False)).count()
30

>>> str(FileManifest.objects.only('file_id').filter(
  Q(manifest_data__browser_specific_settings__gecko__update_url__isnull=False) |
  Q(manifest_data__apps__gecko__update_url__isnull=False)).query)
'SELECT `files_filemanifest`.`file_id` FROM `files_filemanifest`
WHERE (
JSON_EXTRACT(`files_filemanifest`.`manifest_data`, $."browser_specific_settings"."gecko"."update_url") IS NOT NULL
OR JSON_EXTRACT(`files_filemanifest`.`manifest_data`, $."apps"."gecko"."update_url") IS NOT NULL
)'

JSON_CONTAINS_PATH variant (either are super fast on stage):

>>> FileManifest.objects.filter(
  Q(manifest_data__browser_specific_settings__gecko__has_key='update_url') |
  Q(manifest_data__apps__gecko__has_key='update_url')).count()
30

>>> str(FileManifest.objects.only('file_id').filter(
  Q(manifest_data__browser_specific_settings__gecko__has_key='update_url') |
  Q(manifest_data__apps__gecko__has_key='update_url')).query)
'SELECT `files_filemanifest`.`file_id` FROM `files_filemanifest`
WHERE (
JSON_CONTAINS_PATH(`files_filemanifest`.`manifest_data`, \'one\', $."browser_specific_settings"."gecko"."update_url")
OR JSON_CONTAINS_PATH(`files_filemanifest`.`manifest_data`, \'one\', $."apps"."gecko"."update_url")
)'

diox avatar Aug 09 '24 09:08 diox