
Best way to adjust or lighten metadata in the 'sync_media' table

Open woozu-shin opened this issue 1 year ago • 1 comments

Version

  • 0.13.3

Abstract

  • The more videos there are, the larger the sync_media table becomes.
  • The bloat comes from the metadata JSON, which is hundreds of KB per video.
  • A way to solve this is needed.

Detail

  • I am syncing a source with about 1500 videos.
  • Each sync_media record has a data size of approximately 500KB.
  • The cause is the bloated metadata, and a way to reduce it is needed.
  • The SQLite file reached 700MB, and CPU and memory usage were observed to be high because of the overhead of reading and dumping the huge metadata JSON.

AS-IS

tubesync_1_huge_db

TO-BE

  • Many parts are still unverified or mocked, so this is not yet ready to be submitted as a pull request.
  • I am attaching a sample of my local work. tubesync_2_redueced

Sample Source

  • models.py
# Assumed imports; `_` is conventionally gettext_lazy in Django models.
from django.db import models
from django.utils.translation import gettext_lazy as _

class Source(models.Model):
    ...
    # RAW keeps the full metadata; FEATHER strips the largest keys.
    LIGHTWEIGHT_METADATA_TYPE_RAW = 'RAW'
    LIGHTWEIGHT_METADATA_TYPE_FEATHER = 'FEATHER'
    LIGHTWEIGHT_METADATA_TYPES = (LIGHTWEIGHT_METADATA_TYPE_RAW, LIGHTWEIGHT_METADATA_TYPE_FEATHER)
    LIGHTWEIGHT_METADATA_TYPE_CHOICES = (
        (LIGHTWEIGHT_METADATA_TYPE_RAW, _("(LARGE) Save raw metadata")),
        (LIGHTWEIGHT_METADATA_TYPE_FEATHER, _("(TINY) If the metadata is large, tree-shake it even if it is in use")),
    )

    lightweight_metadata = models.CharField(
        _('lightweight metadata'),
        max_length=20,
        default=LIGHTWEIGHT_METADATA_TYPE_RAW,
        choices=LIGHTWEIGHT_METADATA_TYPE_CHOICES,
        help_text=_('Lightweight metadata')
    )
  • tasks.py
        if source.lightweight_metadata == Source.LIGHTWEIGHT_METADATA_TYPE_FEATHER:
            # Drop the largest keys; pop() with a default avoids a KeyError
            # when a key is absent from a given video's metadata.
            for key in ("formats", "thumbnails", "automatic_captions",
                        "requested_formats", "heatmap"):
                media.metadata.pop(key, None)

Sample View

  • Add/Edit source image

  • Media item view (one of the media details) image

woozu-shin, Feb 21 '24 07:02

If you delete formats and thumbnails from the metadata, thumbnails can't be downloaded and downloading media won't work, because the media format can't be evaluated. At the moment that evaluation happens on model save, to determine whether an item can be downloaded when there's a match for the requested format. While you may want to ignore the thumbnails, the formats (which get refreshed as and when the metadata is updated) are currently required.
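
One way to reconcile this with the sample patch is to strip only the keys that are not named as required here. This is a sketch under assumptions: the constant and function names are hypothetical, and whether `automatic_captions`, `requested_formats` and `heatmap` are actually safe to drop is not confirmed anywhere in this thread.

```python
import copy

# Keys the comment above describes as still needed for downloads.
REQUIRED_KEYS = frozenset({"formats", "thumbnails"})

# Candidate keys taken from the sample patch in this issue.
CANDIDATE_KEYS = (
    "formats",
    "thumbnails",
    "automatic_captions",
    "requested_formats",
    "heatmap",
)


def slim_metadata(metadata: dict) -> dict:
    """Return a shallow copy without the non-required candidate keys."""
    slimmed = copy.copy(metadata)
    for key in CANDIDATE_KEYS:
        if key not in REQUIRED_KEYS:
            # pop() with a default tolerates keys that are absent.
            slimmed.pop(key, None)
    return slimmed


# Hypothetical example record.
example = {
    "id": "abc123",
    "formats": [{"format_id": "18"}],
    "thumbnails": [{"url": "https://example.invalid/t.jpg"}],
    "automatic_captions": {"en": []},
    "heatmap": None,
}
slimmed_example = slim_metadata(example)
```

Because the input is copied rather than modified in place, the caller can compare the sizes before and after to judge whether the saving is worth it.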

Over the years I've had a good look at the large metadata myself. The most sensible option is probably to move it out of the database and store it as msgpack'd blobs on disk, in the config dir or similar. There isn't much you can truncate from the metadata without losing functionality. You can save 5-10%, but that never really seemed worth it.
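
A rough sketch of the per-item blob idea follows. It is not tubesync code: the function names, the `metadata/` subdirectory and the `media_key` naming are hypothetical, and `media_key` is assumed to be safe to use as a file name. To stay dependency-free it substitutes stdlib `json` plus `gzip` for msgpack; with the third-party `msgpack` package, `msgpack.packb` and `msgpack.unpackb` would take the place of the encode and decode steps.

```python
import gzip
import json
import tempfile
from pathlib import Path


def metadata_path(base_dir: Path, media_key: str) -> Path:
    """Location of the blob for one media item (hypothetical layout)."""
    return base_dir / "metadata" / f"{media_key}.json.gz"


def save_metadata(base_dir: Path, media_key: str, metadata: dict) -> Path:
    """Serialize metadata compactly, compress it, and write it to disk."""
    path = metadata_path(base_dir, media_key)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(metadata, separators=(",", ":")).encode("utf-8")
    path.write_bytes(gzip.compress(payload))
    return path


def load_metadata(base_dir: Path, media_key: str) -> dict:
    """Read, decompress, and parse the blob for one media item."""
    raw = metadata_path(base_dir, media_key).read_bytes()
    return json.loads(gzip.decompress(raw).decode("utf-8"))


# Round trip in a temporary directory standing in for the config dir.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    original = {"id": "abc123", "formats": [{"format_id": "18"}]}
    save_metadata(base, "abc123", original)
    restored = load_metadata(base, "abc123")
    round_trip_ok = restored == original
```

With this layout the database row would only need to hold the key, so the large JSON is read from disk only when a specific item's metadata is needed.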

meeb, Feb 21 '24 08:02

There isn't a sane way to reduce the metadata storage without significantly reducing generally used functionality (it'll need to be either in the SQLite database or stored per-media-item on disk). So while it is massive and likely needs some work, I'll close this as wont-fix for now. Thanks for the issue and analysis!

meeb, Aug 03 '24 10:08