openverse-api

Remove legacy `tags_list` field

Open obulat opened this issue 3 years ago • 3 comments

Problem

The media models use an unnecessary legacy tags_list field.

Description

When the machine-generated tags were added, the data model for tags changed from a list of ForeignKeys to a separate ImageTags table into an array of objects with properties such as provider and accuracy (only for machine-generated tags). The tags_list property was deprecated, but never removed from the database: https://github.com/cc-archive/cccatalog-api/pull/182

We should remove it.

Alternatives

We could also convert all tags to a simple string array, since we are not really using machine-generated tags. But I think we will want to use them in the future, and removing them now only to re-add them later is an unnecessary complication.

Additional context

Implementation

  • [ ] 🙋 I would be interested in implementing this feature.

obulat avatar May 19 '22 14:05 obulat

If we drop this from the Django models, will this need any change in the catalog or the ingestion server?

dhruvkb avatar May 19 '22 19:05 dhruvkb

The catalog will not need to change anything because the tags column in the catalog is not changing. But the ingestion server will probably need to be updated.

obulat avatar May 20 '22 05:05 obulat

Since the ingestion server makes no explicit reference to the columns and only copies the ones that overlap, removing the field from the API DB would be enough for the ingestion server to stop copying it. I think this can be marked as ready for work then.
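
To illustrate the "picks the overlapping ones" behaviour, here is a minimal Python sketch. It is not the ingestion server's actual code: the function name and the column lists are hypothetical. It also assumes that the upstream table might still carry tags_list, which this thread does not confirm.

```python
def overlapping_columns(upstream_cols, downstream_cols):
    """Return the columns present in both tables, keeping upstream order."""
    downstream = set(downstream_cols)
    return [col for col in upstream_cols if col in downstream]


# Hypothetical column lists, for illustration only.
upstream = ["identifier", "title", "tags", "tags_list"]
api_before = ["identifier", "title", "tags", "tags_list", "view_count"]
api_after = ["identifier", "title", "tags", "view_count"]  # tags_list dropped

copied_before = overlapping_columns(upstream, api_before)
copied_after = overlapping_columns(upstream, api_after)

print(copied_before)
print(copied_after)
```

With a selection rule like this, dropping the column from the API side is what takes it out of the copied set, so the copy step itself would need no edit.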

dhruvkb avatar May 20 '22 09:05 dhruvkb