Per segment chunks

Open zhiltsov-max opened this issue 6 months ago • 8 comments

Motivation and context

  • Changed chunk generation from per-task chunks to per-segment chunks
  • Fixed a memory leak in video reading on the server side (only in media_extractors, so there are several more left)
  • Fixed a potential hang in import worker or the server process on process shutdown
  • Disabled multithreading in video reading in endpoints (not in static chunk generation)
  • Refactored static chunk generation code (moved after job creation)
  • Refactored various server internal APIs for frame retrieval
  • Updated UI logic to access chunks
  • Added a new server configuration option CVAT_ALLOW_STATIC_CACHE (boolean) to enable or disable static cache support. The option is disabled by default, which changes the previous behavior
  • Added tests for the changes made
  • Added missing original chunk type field in job responses
  • Fixed invalid kvrocks cleanup in tests for Helm deployment

When this update is applied to the server, a data storage setting migration will run for existing tasks. Tasks using static chunks (task.data.storage_method == FILE_SYSTEM) will be switched to the dynamic cache (task.data.storage_method == CACHE). The remaining static chunk files should be removed manually; the affected tasks will be listed in the migration log file.
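
As a rough illustration of the migration described above, the sketch below switches tasks from static chunks to the dynamic cache and collects the ids that would go into the log. It is not the actual migration code: `TaskData`, `migrate_to_dynamic_cache`, and the string values of the storage methods are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Illustrative stand-ins, not CVAT's actual models or migration code.
FILE_SYSTEM = "file_system"
CACHE = "cache"


@dataclass
class TaskData:
    task_id: int
    storage_method: str


def migrate_to_dynamic_cache(tasks):
    """Switch static-chunk tasks to the dynamic cache.

    Returns the ids of the switched tasks so they can be written to a log;
    leftover chunk files for these tasks still need manual removal.
    """
    switched = []
    for task in tasks:
        if task.storage_method == FILE_SYSTEM:
            task.storage_method = CACHE
            switched.append(task.task_id)
    return switched


tasks = [TaskData(1, FILE_SYSTEM), TaskData(2, CACHE), TaskData(3, FILE_SYSTEM)]
switched_ids = migrate_to_dynamic_cache(tasks)
print("Tasks with leftover static chunk files:", switched_ids)
```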

After this update, you'll have an option to enable or disable static cache use during task creation. In particular, this makes it possible to prohibit new tasks from using the static cache. With static cache disabled, any tasks using the static cache will use the dynamic cache instead on data access.
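
The fallback behavior can be pictured with a minimal sketch. It assumes the server resolves an effective storage method from the requested one and a boolean flag; the `StorageMethod` enum and `effective_storage_method` function are hypothetical names used only for illustration, and how the server actually reads CVAT_ALLOW_STATIC_CACHE is not shown here.

```python
from enum import Enum


class StorageMethod(Enum):
    # Hypothetical enum mirroring the two storage methods named in this PR.
    FILE_SYSTEM = "file_system"  # static chunks on disk
    CACHE = "cache"              # dynamic cache


def effective_storage_method(requested, allow_static_cache):
    """Return the storage method to use for a task.

    When the static cache is not allowed, a request for static chunks
    falls back to the dynamic cache; other requests pass through unchanged.
    """
    if requested is StorageMethod.FILE_SYSTEM and not allow_static_cache:
        return StorageMethod.CACHE
    return requested


# With the option disabled (the new default), static requests are overridden.
print(effective_storage_method(StorageMethod.FILE_SYSTEM, allow_static_cache=False))
```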

User-observable changes:

  • Job chunk ids now start from 0 for each job instead of using parent task ids
  • The use_cache = false or storage_method = filesystem parameters in task creation can be ignored by the server
  • Task chunk access may be slower for some chunks (particularly, for tasks with overlap configured, for chunks on segment boundaries, and for tasks previously using static chunks)
  • The last chunk in a job will contain only the frames from the current job, even if there are more frames in the task
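
To make the new chunk numbering concrete, here is a small sketch of per-segment chunk arithmetic. It assumes a job covers a contiguous, inclusive range of task frames with a frame step of 1; `Segment`, `job_chunk_id`, and `job_chunk_frames` are hypothetical helpers, not server or cvat-core APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical description of a job's frame range within its task.
    start_frame: int  # first task frame in the job (inclusive)
    stop_frame: int   # last task frame in the job (inclusive)


def job_chunk_id(segment, chunk_size, task_frame):
    """Chunk id of a frame, counted from 0 at the start of the job."""
    if not segment.start_frame <= task_frame <= segment.stop_frame:
        raise ValueError("frame is outside of the job segment")
    return (task_frame - segment.start_frame) // chunk_size


def job_chunk_frames(segment, chunk_size, chunk_id):
    """Task frames in a job chunk; the last chunk is cut at the job end."""
    first = segment.start_frame + chunk_id * chunk_size
    if chunk_id < 0 or first > segment.stop_frame:
        raise IndexError("no such chunk in this job")
    last = min(first + chunk_size - 1, segment.stop_frame)
    return range(first, last + 1)


segment = Segment(start_frame=10, stop_frame=24)
print(job_chunk_id(segment, chunk_size=4, task_frame=10))
print(list(job_chunk_frames(segment, chunk_size=4, chunk_id=3)))
```

Under these assumptions the job's first frame always falls into chunk 0, and the final chunk stops at the job's last frame even when the task continues past it.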

How has this been tested?

Checklist

  • [ ] I submit my changes into the develop branch
  • [ ] I have created a changelog fragment
  • [ ] I have updated the documentation accordingly
  • [ ] I have added tests to cover my changes
  • [ ] I have linked related issues (see GitHub docs)
  • [ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

  • [ ] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features

    • Introduced a new server setting to disable media chunks on the local filesystem.
    • Enhanced frame prefetching with a startFrame parameter for improved chunk calculations.
    • Added a new property, data_original_chunk_type, for enhanced job differentiation in the metadata.
  • Bug Fixes

    • Resolved memory management issues to prevent leaks during video processing.
    • Corrected naming inconsistencies related to the prefetchAnalyzer.
  • Documentation

    • Included configuration for code formatting tools to ensure consistent code quality across the project.
  • Refactor

    • Restructured classes and methods for improved clarity and maintainability, particularly in media handling and task processing.
  • Chores

    • Updated formatting scripts to include additional directories for automated code formatting.

zhiltsov-max, Aug 07 '24 12:08