
[Fix #4682] Implement Log retention policies

Open Zahed-Riyaz opened this issue 5 months ago • 2 comments

The AWS retention management system implemented for #4682 provides automated log retention policies for CloudWatch logs and submission artifact cleanup. It consists of:

Core Components

Backend Models

Challenge Model Fields

  • retention_policy_consent: Boolean flag indicating host consent
  • retention_policy_consent_date: When consent was provided
  • retention_policy_consent_by: User who provided consent
  • retention_policy_notes: Optional notes about retention policy
  • log_retention_days_override: Admin override for retention period

Submission Model Fields

  • retention_eligible_date: When submission becomes eligible for deletion
  • is_artifact_deleted: Flag indicating if artifacts were deleted
  • artifact_deletion_date: Timestamp of deletion
  • retention_policy_applied: Description of applied policy
  • retention_override_reason: Reason for any overrides

API Endpoints

Retention Consent Management

  • POST /challenges/{challenge_pk}/retention-consent/ - Provide consent
  • GET /challenges/{challenge_pk}/retention-consent-status/ - Get consent status
  • POST /challenges/{challenge_pk}/update-retention-consent/ - Update consent
  • GET /challenges/{challenge_pk}/retention-info/ - Get comprehensive retention info
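
As a rough illustration of how a client might address these four endpoints, the sketch below builds (but does not send) requests with Python's standard library. The base URL, the `Token` authorization header, the `notes` payload key, and the helper name `build_retention_request` are assumptions made for the example; only the endpoint paths and HTTP methods come from this PR.

```python
import json
import urllib.request

# Hypothetical API root; replace with the deployment's actual base URL.
BASE_URL = "https://example.org/api"

# Endpoint paths and methods as listed in this PR description.
RETENTION_ENDPOINTS = {
    "provide_consent": ("POST", "/challenges/{challenge_pk}/retention-consent/"),
    "consent_status": ("GET", "/challenges/{challenge_pk}/retention-consent-status/"),
    "update_consent": ("POST", "/challenges/{challenge_pk}/update-retention-consent/"),
    "retention_info": ("GET", "/challenges/{challenge_pk}/retention-info/"),
}


def build_retention_request(name, challenge_pk, token, payload=None):
    """Build an unsent urllib Request for one of the retention endpoints."""
    method, path = RETENTION_ENDPOINTS[name]
    url = BASE_URL + path.format(challenge_pk=challenge_pk)
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {
        # Assumed auth scheme; the PR does not state the header format.
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=data, headers=headers, method=method)


request = build_retention_request(
    "provide_consent", 123, "dummy-token", {"notes": "Approved by host team"}
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without a server.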

Frontend Implementation

Challenge Controller (challengeCtrl.js)

  • fetchRetentionConsentStatus(): Loads current consent status
  • toggleRetentionConsent(): Shows confirmation dialog and handles consent toggle
  • actuallyToggleRetentionConsent(): Makes API call to update consent

UI Components (as shown in video)

  • Toggle switch for consent management
  • Status display showing consent state
  • Confirmation dialogs for consent actions
  • Loading states and error handling

https://github.com/user-attachments/assets/51045526-df8e-450a-95bf-ca726bd9b049

Celery Tasks

Scheduled Tasks (Celery Beat)

from celery.schedules import crontab

# Periodic retention tasks run by Celery Beat.
# In Celery's crontab, day_of_week=0 is Sunday and day_of_week=1 is Monday.
CELERY_BEAT_SCHEDULE = {
    "cleanup-expired-submission-artifacts": {
        "task": "challenges.aws_utils.cleanup_expired_submission_artifacts",
        "schedule": crontab(hour=2, minute=0, day_of_month=1),  # Monthly on 1st at 2 AM UTC
    },
    "weekly-retention-notifications-and-consent-log": {
        "task": "challenges.aws_utils.weekly_retention_notifications_and_consent_log",
        "schedule": crontab(hour=10, minute=0, day_of_week=1),  # Weekly on Mondays at 10 AM UTC
    },
    "update-submission-retention-dates": {
        "task": "challenges.aws_utils.update_submission_retention_dates",
        "schedule": crontab(hour=1, minute=0, day_of_week=0),  # Weekly on Sundays at 1 AM UTC
    },
}

cleanup_expired_submission_artifacts()

  • Runs monthly on the 1st at 2 AM UTC
  • Finds submissions with retention_eligible_date <= now()
  • Deletes submission files from storage
  • Updates is_artifact_deleted flag
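
The selection step described above can be sketched without Django. `SubmissionRecord`, `select_expired`, and `mark_deleted` are hypothetical stand-ins, not the PR's code; skipping submissions that have no eligibility date or are already cleaned is an assumption, and the actual file deletion from storage is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SubmissionRecord:
    """Stand-in for the Submission model; only retention fields are modeled."""
    pk: int
    retention_eligible_date: Optional[datetime]
    is_artifact_deleted: bool = False
    artifact_deletion_date: Optional[datetime] = None


def select_expired(submissions, now):
    """Return submissions whose artifacts are due for deletion at `now`."""
    return [
        s for s in submissions
        if s.retention_eligible_date is not None   # assumed: no date means not eligible
        and s.retention_eligible_date <= now
        and not s.is_artifact_deleted              # assumed: skip already-cleaned rows
    ]


def mark_deleted(submission, now):
    """Record the deletion; removing the files from storage is out of scope here."""
    submission.is_artifact_deleted = True
    submission.artifact_deletion_date = now


now = datetime(2025, 7, 1, 2, 0, tzinfo=timezone.utc)
submissions = [
    SubmissionRecord(1, now - timedelta(days=1)),
    SubmissionRecord(2, now + timedelta(days=10)),
    SubmissionRecord(3, None),
    SubmissionRecord(4, now - timedelta(days=5), is_artifact_deleted=True),
]
expired = select_expired(submissions, now)
for submission in expired:
    mark_deleted(submission, now)
print([s.pk for s in expired])
```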

weekly_retention_notifications_and_consent_log()

  • Runs weekly on Mondays at 10 AM UTC
  • Sends warning emails for submissions expiring in 14 days
  • Logs recent consent changes for audit purposes
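
One way to express the warning check is shown below. It reads "expiring in 14 days" as "expiring within the next 14 days", which is an interpretation rather than something the PR states; `needs_warning` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

WARNING_WINDOW_DAYS = 14  # warning lead time from the PR description


def needs_warning(retention_eligible_date, now, window_days=WARNING_WINDOW_DAYS):
    """True if the date lies in (now, now + window]: not yet expired, but expiring soon."""
    if retention_eligible_date is None:
        return False
    return now < retention_eligible_date <= now + timedelta(days=window_days)


now = datetime(2025, 7, 7, 10, 0, tzinfo=timezone.utc)
print(needs_warning(now + timedelta(days=3), now))
print(needs_warning(now + timedelta(days=30), now))
```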

update_submission_retention_dates()

  • Runs weekly on Sundays at 1 AM UTC
  • Updates retention dates for submissions based on current challenge settings
  • Handles changes in challenge phase end dates

AWS Integration

CloudWatch Log Retention

  • set_cloudwatch_log_retention(): Sets CloudWatch log retention policy
  • Requires host consent before applying retention policies
  • Default: 30 days after challenge end date
  • Admin can override with log_retention_days_override
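
CloudWatch Logs only accepts a fixed set of retention periods, so a computed number of days has to be mapped onto one of them. The sketch below shows one possible mapping (rounding up). The helper names, the "days until challenge end plus 30" formula, and the override semantics are assumptions for illustration; the PR's `set_cloudwatch_log_retention()` may compute the value differently.

```python
import bisect
from datetime import datetime, timezone

# Retention periods (in days) accepted by CloudWatch Logs.
VALID_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]
DEFAULT_DAYS_AFTER_END = 30  # default from the PR description


def round_up_to_valid(days):
    """Smallest accepted value >= days, capped at the largest accepted value."""
    index = bisect.bisect_left(VALID_RETENTION_DAYS, days)
    return VALID_RETENTION_DAYS[min(index, len(VALID_RETENTION_DAYS) - 1)]


def retention_days(challenge_end, now, override_days=None):
    """Hypothetical helper: choose a retention period for a challenge's log group."""
    if override_days is not None:
        # Assumed: log_retention_days_override replaces the computed value.
        return round_up_to_valid(override_days)
    days_until_end = max((challenge_end - now).days, 0)
    return round_up_to_valid(days_until_end + DEFAULT_DAYS_AFTER_END)


now = datetime(2025, 7, 1, tzinfo=timezone.utc)
end = datetime(2025, 7, 21, tzinfo=timezone.utc)
print(retention_days(end, now))                     # 20 + 30 = 50, rounded up to 60
print(retention_days(end, now, override_days=100))  # rounded up to 120
```

The resulting value is what would be passed as `retentionInDays` to the CloudWatch Logs `put_retention_policy` call in boto3.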

Automatic Triggers

  • Challenge approval: Updates log retention
  • Worker restart: Updates log retention
  • Task definition registration: Updates log retention

Signals and Automation

Django Signals

  • update_submission_retention_on_phase_change: Updates retention dates when phase changes
  • set_submission_retention_on_create: Sets initial retention date for new submissions

Retention Calculation

  • Based on challenge phase end date
  • Only applies to non-public phases
  • Requires host consent
  • Default: 30 days after phase end
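
The four rules above can be combined into a single function. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, and returning `None` to mean "never eligible for deletion" is a choice made for the example.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30  # default from the PR description


def retention_eligible_date(phase_end, phase_is_public, host_consented,
                            retention_days=DEFAULT_RETENTION_DAYS):
    """Return the date a submission becomes eligible for deletion, or None."""
    if not host_consented:
        return None  # no consent means no deletion
    if phase_is_public:
        return None  # retention only applies to non-public phases
    if phase_end is None:
        return None  # assumed: an open-ended phase has no eligibility date
    return phase_end + timedelta(days=retention_days)


phase_end = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(retention_eligible_date(phase_end, phase_is_public=False, host_consented=True))
print(retention_eligible_date(phase_end, phase_is_public=False, host_consented=False))
```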

User Consent Flow

  1. Host Access: Only challenge hosts can provide consent
  2. Consent Dialog: Frontend shows confirmation dialog explaining implications
  3. API Call: Consent is recorded via API with optional notes
  4. Automatic Application: Once consent is given, retention policies are automatically applied
  5. Withdrawal: Hosts can withdraw consent at any time
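
The flow can be modeled in a few lines of plain Python. `ChallengeConsent` and `set_consent` are hypothetical stand-ins for the Challenge model and the API view; in particular, overwriting the consent date and user on withdrawal is an assumption, since the PR does not say how withdrawal is recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set, Tuple


@dataclass
class ChallengeConsent:
    """Stand-in for the consent-related fields on the Challenge model."""
    host_usernames: Set[str]
    retention_policy_consent: bool = False
    retention_policy_consent_date: Optional[datetime] = None
    retention_policy_consent_by: Optional[str] = None
    retention_policy_notes: str = ""
    audit_log: List[Tuple[datetime, str, bool]] = field(default_factory=list)

    def set_consent(self, username, consent, notes="", now=None):
        """Grant or withdraw consent; only challenge hosts are allowed."""
        if username not in self.host_usernames:
            raise PermissionError("only challenge hosts may change retention consent")
        now = now or datetime.now(timezone.utc)
        self.retention_policy_consent = consent
        # Assumed: date and user track the most recent change, including withdrawal.
        self.retention_policy_consent_date = now
        self.retention_policy_consent_by = username
        self.retention_policy_notes = notes
        # Every change is appended with its timestamp for auditing.
        self.audit_log.append((now, username, consent))


challenge = ChallengeConsent(host_usernames={"john_doe"})
challenge.set_consent("john_doe", True, notes="Approved by host team")
challenge.set_consent("john_doe", False)  # withdrawal is allowed at any time
print(challenge.retention_policy_consent, len(challenge.audit_log))
```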

Data Safety

  • No Consent = No Deletion: Without consent, data is retained indefinitely
  • Warning Notifications: Hosts receive 14-day advance warnings
  • Audit Trail: All consent changes are logged with timestamps
  • Admin Override: Admins can set custom retention periods

manage_retention.py Script

Overview

A command-line utility for managing retention policies and performing cleanup operations.

Usage

docker-compose exec django python scripts/manage_retention.py <command> [options]

Commands

cleanup [--dry-run]

Purpose: Clean up expired submission artifacts

Options:

  • --dry-run: Show what would be cleaned without actually deleting

Example:

# Perform actual cleanup
docker-compose exec django python scripts/manage_retention.py cleanup

# Preview what would be cleaned
docker-compose exec django python scripts/manage_retention.py cleanup --dry-run

Functionality:

  • Triggers the cleanup_expired_submission_artifacts Celery task
  • Returns task ID for monitoring

status [--challenge-id <id>]

Purpose: Show retention status for challenges

Options:

  • --challenge-id <id>: Show status for specific challenge

Example:

# Show overall system status
docker-compose exec django python scripts/manage_retention.py status

# Show status for specific challenge
docker-compose exec django python scripts/manage_retention.py status --challenge-id 123

Output:

  • Overall: Number of challenges with consent, total number of submissions, and number of submissions eligible for cleanup
  • Specific challenge: Consent status, consent details, submission counts

set-retention <challenge_id> [--days <days>]

Purpose: Set CloudWatch log retention for a challenge

Parameters:

  • challenge_id: ID of the challenge
  • --days <days>: Optional custom retention period

Example:

# Set default retention (30 days)
docker-compose exec django python scripts/manage_retention.py set-retention 123

# Set custom retention (60 days)
docker-compose exec django python scripts/manage_retention.py set-retention 123 --days 60

Functionality:

  • Requires host consent before applying
  • Sets CloudWatch log retention policy
  • Returns success/error status

consent <challenge_id> <username>

Purpose: Record retention consent for a challenge

Parameters:

  • challenge_id: ID of the challenge
  • username: Username of the person providing consent

Example:

docker-compose exec django python scripts/manage_retention.py consent 123 john_doe

Functionality:

  • Records consent in the database
  • Updates challenge model with consent details
  • Enables retention policies for the challenge
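
For orientation, the four commands documented above could be wired up with `argparse` roughly as follows. This is a sketch of the command-line surface only, assuming integer challenge IDs; it is not the script's actual source, and the handlers that would trigger Celery tasks or database updates are omitted.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented manage_retention.py commands."""
    parser = argparse.ArgumentParser(
        prog="manage_retention.py",
        description="Manage retention policies and cleanup operations.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    cleanup = subparsers.add_parser("cleanup", help="Clean up expired submission artifacts")
    cleanup.add_argument("--dry-run", action="store_true",
                         help="Show what would be cleaned without deleting")

    status = subparsers.add_parser("status", help="Show retention status")
    status.add_argument("--challenge-id", type=int, default=None,
                        help="Show status for a specific challenge")

    set_retention = subparsers.add_parser("set-retention",
                                          help="Set CloudWatch log retention")
    set_retention.add_argument("challenge_id", type=int)
    set_retention.add_argument("--days", type=int, default=None,
                               help="Optional custom retention period")

    consent = subparsers.add_parser("consent", help="Record retention consent")
    consent.add_argument("challenge_id", type=int)
    consent.add_argument("username")

    return parser


args = build_parser().parse_args(["set-retention", "123", "--days", "60"])
print(args.command, args.challenge_id, args.days)
```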

Zahed-Riyaz, Jun 30 '25 22:06

Codecov Report

:x: Patch coverage is 48.07692% with 27 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 76.53%. Comparing base (96968d6) to head (2b1baa1).
:warning: Report is 1233 commits behind head on master.

Files with missing lines | Patch % | Lines
frontend/src/js/controllers/challengeCtrl.js | 48.07% | 27 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4712      +/-   ##
==========================================
+ Coverage   72.93%   76.53%   +3.59%     
==========================================
  Files          83       21      -62     
  Lines        5368     3660    -1708     
==========================================
- Hits         3915     2801    -1114     
+ Misses       1453      859     -594     
Files with missing lines | Coverage Δ
frontend/src/js/controllers/challengeCtrl.js | 60.88% <48.07%> (-12.82%) :arrow_down:

... and 74 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 78eeeb2...2b1baa1.


codecov[bot], Jul 12 '25 10:07

@RishabhJain2018 this PR is ready for review

Zahed-Riyaz, Jul 22 '25 18:07