parse-server

feature: add GDPR compliance

Open dblythy opened this issue 2 months ago • 11 comments

Pull Request

Issue

Closes: #5378

Approach

Tasks

  • [ ] Add tests
  • [ ] Add changes to documentation (guides, repository pages, code comments)
  • [ ] Add security check
  • [ ] Add new Parse Error codes to Parse JS SDK

Summary by CodeRabbit

Release Notes

  • New Features
    • Added GDPR-compliant audit logging system that automatically tracks user authentication, data operations (create, read, update, delete), ACL modifications, schema changes, and push notifications with daily log rotation, sensitive data masking, IP address tracking, and configurable filtering options.

dblythy avatar Oct 01 '25 11:10 dblythy

🚀 Thanks for opening this pull request!

📝 Walkthrough

This PR introduces comprehensive GDPR-compliant audit logging for Parse Server. It adds infrastructure to log user authentication, data access, data modifications, ACL changes, schema modifications, and push notifications. The implementation includes a Winston-based daily-rotating file adapter, configurable event filtering, TypeScript-based event types, and integration points across multiple routers to capture relevant operations.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Documentation & Configuration: GDPR_COMPLIANCE_GUIDE.md, src/Options/Definitions.js, src/Options/index.js, src/Options/docs.js, resources/buildConfigDefinitions.js | Introduces GDPR compliance guide with audit logging patterns, code examples, and organizational guidance. Adds new auditLog configuration option with nested AuditLogOptions, AuditLogFilterOptions, and WinstonFileAuditLogAdapterOptions types for controlling adapter behavior, event filtering, and log retention. |
| Core Audit Logging Infrastructure: src/Adapters/Logger/AuditLogger.js, src/Adapters/Logger/AuditLogAdapter.js | Implements base audit logger using Winston with daily log rotation and AuditLogAdapter class that routes audit events through the logger with methods for each event type (login, data access/modify, schema/ACL changes). |
| TypeScript Adapter & Filter Interfaces: src/Adapters/AuditLog/AuditLogAdapterInterface.ts, src/Adapters/AuditLog/AuditLogFilter.ts, src/Adapters/AuditLog/index.ts | Defines strongly-typed audit event interfaces (UserLoginEvent, DataViewEvent, DataCreate/Update/DeleteEvent, ACLModifyEvent, SchemaModifyEvent, PushSendEvent) and AuditLogAdapterInterface abstract contract. Implements AuditLogFilter with multi-stage filtering pipeline for event types, class names, master-key exclusion, and roles. |
| Winston File Adapter Implementation: src/Adapters/AuditLog/WinstonFileAuditLogAdapter.ts | Concrete implementation of audit log adapter using Winston with daily-rotating file transport, folder creation, and masking of sensitive fields in logged events. |
| Audit Log Controller: src/Controllers/AuditLogController.ts, src/Controllers/index.js | Introduces AuditLogController extending AdaptableController with public methods for each event type, IP extraction, user context derivation, sensitive data masking, and configurable filtering. Exports getAuditLogController factory function. |
| Router & Query Integration: src/RestQuery.js, src/RestWrite.js, src/Routers/UsersRouter.js, src/Routers/PushRouter.js, src/Routers/SchemasRouter.js, src/rest.js | Integrates audit logging into data access (RestQuery), write operations (RestCreate/Update/Delete), user authentication (login/loginAs), push sending, schema modifications, and data deletion; includes error handling to prevent audit failures from affecting main operations. |
| Comprehensive Test Suite: spec/AuditLogAdapter.spec.js, spec/AuditLogController.spec.js, spec/AuditLogSchemas.spec.js, spec/AuditLogging.e2e.spec.js, spec/Auth.spec.js, spec/ParseObject.spec.js, spec/ParseQuery.spec.js, spec/AuditLogFilter.spec.js | End-to-end and unit tests validating adapter initialization, controller behavior, event logging for schema/auth/CRUD/query operations, log file management, masking, filtering, and concurrent operation handling. |

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Client
    participant Parse as Parse Server<br/>(Router)
    participant Audit as AuditLogController
    participant Filter as AuditLogFilter
    participant Adapter as WinstonFileAuditLogAdapter
    participant Logger as Winston Logger
    participant FS as File System

    User->>Parse: Login / Data Access / Write
    Parse->>Parse: Process Operation
    Parse->>Audit: log[EventType](params)
    Audit->>Audit: Build AuditEvent
    Audit->>Filter: shouldLog(event)
    Filter->>Filter: Check eventType, class,<br/>masterKey, roles, custom
    Filter-->>Audit: true/false
    alt shouldLog() returns true
        Audit->>Adapter: log[EventType](event)
        Adapter->>Adapter: maskSensitiveData(event)
        Adapter->>Logger: logger.info('audit_event',<br/>auditEntry)
        Logger->>FS: Write to daily rotation file<br/>(parse-server-audit-YYYY-MM-DD.log)
        FS-->>Logger: ✓ Written
        Logger-->>Adapter: ✓ Complete
        Adapter-->>Audit: Promise resolved
        Audit-->>Parse: (fire-and-forget)
    end
    Parse-->>User: Return Response
    Note over Parse: Audit logging errors<br/>do not affect main flow
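
The flow in the diagram can be sketched in a few lines of plain JavaScript. This is a minimal illustration under stated assumptions, not the code from the PR: the class bodies, the `logEvent` method, the `onError` callback, the `MemoryAdapter`/`FailingAdapter` helpers, and the `'USER_LOGIN'` event name are all hypothetical. It shows the one property the diagram's note insists on: the router returns its response even when the audit adapter fails.

```javascript
// Hypothetical stand-ins for the participants in the diagram above.
class AuditLogFilter {
  constructor({ events } = {}) {
    this.events = events; // optional whitelist of event types
  }
  shouldLog(event) {
    return !this.events || this.events.includes(event.eventType);
  }
}

class MemoryAdapter {
  constructor() {
    this.entries = [];
  }
  async log(event) {
    this.entries.push(event);
  }
}

class FailingAdapter {
  async log() {
    throw new Error('disk full');
  }
}

class AuditLogController {
  constructor(adapter, filter, onError = () => {}) {
    this.adapter = adapter;
    this.filter = filter;
    this.onError = onError;
  }

  // Resolves to true if the event was written, false if it was filtered out
  // or the filter/adapter failed. Failures go to onError, not to the caller.
  async logEvent(eventType, params) {
    try {
      const event = { eventType, timestamp: new Date().toISOString(), ...params };
      if (!this.filter.shouldLog(event)) {
        return false;
      }
      await this.adapter.log(event);
      return true;
    } catch (err) {
      this.onError(err);
      return false;
    }
  }
}

// Router-side usage: the audit promise is deliberately not awaited.
function handleLogin(controller, username) {
  const response = { username };
  controller.logEvent('USER_LOGIN', { username, success: true });
  return response;
}
```

Because `logEvent` catches its own errors, leaving the promise un-awaited does not produce an unhandled rejection. The trade-off is that a failing adapter is only visible through `onError`, which is why the review notes below ask that fire-and-forget promises not hide errors in tests.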

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Complexity factors:

    • Large scope: ~30 files modified/created across multiple subsystems
    • Heterogeneous changes: New adapter infrastructure, controller logic, integration points across 6+ routers, TypeScript interfaces, filtering pipeline, masking logic
    • High logic density: Multi-stage filtering (event type, class, master-key, roles, custom filter), sensitive data masking, error propagation in fire-and-forget patterns
  • Areas requiring extra attention:

    • AuditLogFilter filtering logic (src/Adapters/AuditLog/AuditLogFilter.ts): Verify correct precedence and behavior across all filter combinations (event type whitelist, include/exclude class filtering, master-key exclusion, role whitelisting/blacklisting, custom filter fail-open semantics)
    • Masking implementation (across AuditLogController and WinstonFileAuditLogAdapter): Confirm sensitive fields (sessionToken, authData, passwords) are consistently masked across all event types
    • Router integration points (UsersRouter.js, RestWrite.js, PushRouter.js, SchemasRouter.js, RestQuery.js): Verify audit logging does not introduce race conditions, exception handling is robust, and fire-and-forget promises do not mask errors in tests
    • Type safety (src/Adapters/AuditLog/*): Ensure TypeScript event interfaces align across adapter, controller, and router usage
    • Test coverage: Validate that e2e tests adequately cover concurrent operations, file rotation edge cases, and filter precedence scenarios
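
To make the filter-precedence concern concrete, here is a rough sketch of a multi-stage filter in plain JavaScript. The option names (`events`, `includeClasses`, `excludeClasses`, `excludeMasterKey`, `includeRoles`, `excludeRoles`, `filter`), the event fields, and the stage order are assumptions made for illustration; the actual `AuditLogFilter.ts` may use different names and a different precedence.

```javascript
// Hypothetical multi-stage filter; every option is optional.
function createAuditLogFilter(options = {}) {
  const {
    events,           // whitelist of event types
    includeClasses,   // whitelist of class names
    excludeClasses,   // blacklist of class names
    excludeMasterKey, // drop events triggered with the master key
    includeRoles,     // log only if the user has one of these roles
    excludeRoles,     // never log if the user has one of these roles
    filter,           // custom predicate, evaluated last
  } = options;

  return function shouldLog(event) {
    // Stage 1: event type whitelist.
    if (events && !events.includes(event.eventType)) return false;

    // Stage 2: class filtering; exclusion wins over inclusion.
    if (event.className) {
      if (excludeClasses && excludeClasses.includes(event.className)) return false;
      if (includeClasses && !includeClasses.includes(event.className)) return false;
    }

    // Stage 3: master-key exclusion.
    if (excludeMasterKey && event.isMaster) return false;

    // Stage 4: roles; exclusion wins over inclusion.
    const roles = event.roles || [];
    if (excludeRoles && roles.some(role => excludeRoles.includes(role))) return false;
    if (includeRoles && !roles.some(role => includeRoles.includes(role))) return false;

    // Stage 5: custom filter with fail-open semantics. If it throws,
    // the event is still logged rather than silently dropped.
    if (typeof filter === 'function') {
      try {
        return filter(event) !== false;
      } catch (err) {
        return true;
      }
    }
    return true;
  };
}
```

Failing open in the last stage favors completeness of the audit trail over strict filtering: a buggy custom filter produces extra log entries instead of missing ones.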

Suggested reviewers

  • @mtrezza

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The PR description includes the required template structure with linked issue #5378 explicitly referenced, but the 'Approach' section explaining the implementation details is empty. | Fill in the 'Approach' section with a brief summary of how GDPR compliance is implemented (e.g., audit logging infrastructure, adapter pattern, event types covered). |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feature: add GDPR compliance' clearly summarizes the main change: introducing GDPR audit logging infrastructure. |
| Linked Issues check | ✅ Passed | The PR substantially implements all requirements from #5378: logging user login, data access/views, data manipulation (CRUD), schema changes, ACL modifications, and push events via a configurable audit logging adapter with separate file storage. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope of GDPR compliance logging: new audit adapters, controllers, filters, type definitions, and integration into existing routers (RestQuery, RestWrite, UsersRouter, etc.) for event capture, plus comprehensive tests and documentation. |
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

[!WARNING] There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.39.7)
spec/ParseQuery.spec.js

coderabbitai[bot] avatar Oct 01 '25 11:10 coderabbitai[bot]

:white_check_mark: Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| :white_check_mark: | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |

parseplatformorg avatar Oct 01 '25 11:10 parseplatformorg

@dblythy, this looks great and something we will use extensively rather than our current home-brewed audit log. My only concern is that the logs are stored on the server, which means they will be lost on each deploy for Docker users. Is there an external storage (S3) approach that would be easy to implement?

rehabguruadm avatar Oct 26 '25 08:10 rehabguruadm

I wonder whether storing in log files serves the purpose. To be efficiently searchable in regular audits, it would be better stored in a DB. That also makes it easy to set up triggers, automated reports, etc. So maybe there should be an auditLogger adapter interface that allows plugging in adapters to store on S3, in a DB, or in any other third-party system.

mtrezza avatar Oct 26 '25 12:10 mtrezza

> I wonder whether storing in log files serves the purpose. To be efficiently searchable in regular audits, it would be better stored in a DB. That also makes it easy to set up triggers, automated reports, etc. So maybe there should be an auditLogger adapter interface that allows plugging in adapters to store on S3, in a DB, or in any other third-party system.

This sounds great to me. From experience running audit logs for write and delete events, they get huge really quickly; ours is currently 10x the size of our largest collection. It would be great to have a Redis-style expiry policy for audit logs so that entries older than 30 days are deleted automatically; however, that would need a background job in the current Parse setup.

rehabguruadm avatar Oct 26 '25 15:10 rehabguruadm

I think you can add a logger adapter:

https://community.parseplatform.org/t/logging-using-external-services-cloudwatch-datadog-etc/1036/3

dblythy avatar Nov 01 '25 07:11 dblythy

I believe the GDPR auditable information should be logged separately from other server logs; is this going into distinct files? Logging as JSON or CSV is fine, I think; it can then be ingested and parsed, for example with an AWS Firehose pipeline.

mtrezza avatar Nov 02 '25 10:11 mtrezza

@coderabbitai explain the architecture of the audit logging this PR adds. Does it allow saving the GDPR logs into distinct log files? Are the feature and its logging output suitable for the industry standard of GDPR auditable logging?

mtrezza avatar Nov 02 '25 10:11 mtrezza

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the actual implementation files to understand the architecture better

echo "=== AuditLogger.js ==="
cat src/Adapters/Logger/AuditLogger.js

echo -e "\n\n=== AuditLogAdapter.js ==="
cat src/Adapters/Logger/AuditLogAdapter.js

echo -e "\n\n=== AuditLogController.js ==="
cat src/Controllers/AuditLogController.js

echo -e "\n\n=== Options/Definitions.js - auditLog section ==="
grep -A 30 "auditLog" src/Options/Definitions.js | head -40

Length of output: 24771


@mtrezza, I'll explain the audit logging architecture added in this PR:

Architecture Overview

The implementation uses a three-layer architecture:

1. AuditLogger Layer (src/Adapters/Logger/AuditLogger.js)

  • Low-level Winston-based logger with daily rotation using winston-daily-rotate-file
  • Creates distinct audit log files with pattern: parse-server-audit-%DATE%.log (e.g., parse-server-audit-2025-11-02.log)
  • Configuration:
    • auditLogFolder: Directory for audit logs (separate from application logs)
    • datePattern: Rotation pattern (default: YYYY-MM-DD)
    • maxSize: Max file size before rotation (default: 20m)
    • maxFiles: Retention period (default: 14d)
  • JSON-formatted output with timestamps
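
As a rough illustration of what this layer produces, the sketch below derives the daily file name from the `parse-server-audit-%DATE%.log` pattern and formats one JSON line with a timestamp. It uses only the Node.js standard library rather than Winston, and the helper names `auditLogFilePath` and `formatAuditLine` are hypothetical. It also assumes the date is taken in UTC, which may differ from how the rotation transport in the PR is configured.

```javascript
const path = require('path');

// Hypothetical helper: resolve the audit file for a given day, following the
// parse-server-audit-%DATE%.log pattern with the default YYYY-MM-DD date format.
function auditLogFilePath(auditLogFolder, date = new Date()) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return path.join(auditLogFolder, `parse-server-audit-${day}.log`);
}

// Hypothetical helper: one audit entry per line, JSON-encoded, with a timestamp.
function formatAuditLine(entry, now = new Date()) {
  return JSON.stringify({ timestamp: now.toISOString(), ...entry }) + '\n';
}

const when = new Date('2025-11-02T10:00:00Z');
console.log(auditLogFilePath('./audit-logs', when));
console.log(formatAuditLine({ eventType: 'USER_LOGIN', success: true }, when));
```

One entry per line keeps the files easy to ingest with line-oriented tools, which matches the JSON-ingestion use case raised earlier in the thread.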

2. AuditLogAdapter Layer (src/Adapters/Logger/AuditLogAdapter.js)

  • Extends LoggerAdapter and provides domain-specific logging methods:
    • logUserLogin() - Login attempts (success/failure)
    • logDataView() - Query/read operations
    • logDataCreate(), logDataUpdate(), logDataDelete() - Write operations
    • logACLModify() - ACL changes
    • logSchemaModify() - Schema modifications
    • logPushSend() - Push notifications
  • Routes all calls through the audit logger

3. AuditLogController Layer (src/Controllers/AuditLogController.js)

  • Orchestrates logging with context extraction:
    • IP address resolution (X-Forwarded-For, X-Real-IP, fallback to req.ip)
    • User context extraction (userId, sessionToken)
    • Sensitive data masking (password, sessionToken, authData, _hashed_password)
  • Integrated into: RestQuery, RestWrite, UsersRouter, SchemasRouter, PushRouter
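
The context-extraction steps above might look roughly like the following. This is a sketch, not the controller's actual code: the function names `extractIp` and `maskSensitiveData`, the `'***masked***'` placeholder, and the request shape (`req.headers`, `req.ip`) are assumptions. The list of sensitive fields is the one given above.

```javascript
const SENSITIVE_FIELDS = ['password', 'sessionToken', 'authData', '_hashed_password'];
const MASK = '***masked***'; // hypothetical placeholder value

// Resolve the client IP: X-Forwarded-For first, then X-Real-IP, then req.ip.
// Node.js exposes incoming header names in lowercase.
function extractIp(req) {
  const headers = req.headers || {};
  const forwarded = headers['x-forwarded-for'];
  if (forwarded) {
    // The header may carry a comma-separated proxy chain; the first entry
    // is the address reported for the original client.
    return String(forwarded).split(',')[0].trim();
  }
  if (headers['x-real-ip']) {
    return String(headers['x-real-ip']).trim();
  }
  return req.ip;
}

function isPlainObject(value) {
  return Object.prototype.toString.call(value) === '[object Object]';
}

// Return a deep copy with sensitive fields replaced; the input is not mutated.
function maskSensitiveData(value) {
  if (Array.isArray(value)) {
    return value.map(maskSensitiveData);
  }
  if (isPlainObject(value)) {
    const masked = {};
    for (const [key, inner] of Object.entries(value)) {
      masked[key] = SENSITIVE_FIELDS.includes(key) ? MASK : maskSensitiveData(inner);
    }
    return masked;
  }
  return value; // primitives, dates, etc. pass through unchanged
}
```

Note that forwarding headers are client-controlled unless a trusted proxy overwrites them, so the logged IP is only as reliable as the proxy setup in front of the server.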

Distinct Log Files

Yes, GDPR audit logs are stored in completely separate files:

  • Distinct directory: configured via auditLogFolder
  • Distinct filename pattern: parse-server-audit-*.log (not mixed with application logs)
  • Separate Winston logger instance dedicated to audit events
  • JSON format for easy parsing and ingestion

Industry Standard Suitability

Strengths for GDPR Compliance:

  1. Comprehensive event coverage - All required events logged (login, data access, modifications, ACL/schema changes, push)
  2. Audit trail essentials - userId, IP address, timestamps, success/failure, error messages
  3. Data protection - Sensitive fields masked automatically
  4. Structured format - JSON output for parsing/ingestion (AWS Firehose, log aggregators)
  5. Rotation & retention - Configurable lifecycle management
  6. Immutability - Append-only log files

⚠️ Limitations (Addressing Thread Concerns):

As you and @rehabguruadm correctly identified:

  1. File-based only - No pluggable adapter interface for alternative storage backends

    • Lost on Docker container restarts/redeployments
    • Cannot store to S3, databases, or external SIEM systems
    • Difficult to implement tamper-proof archiving
  2. Searchability - Log files are hard to query at scale

    • No built-in search/filter capabilities
    • Would need external log aggregation tools
    • Cannot easily run compliance reports or automated triggers
  3. No adapter interface - Unlike other Parse Server components, audit logging lacks pluggable backends

    • Your suggestion for an auditLogger adapter interface would be more architecturally sound
    • Would allow: S3 storage, database storage, CloudWatch, Datadog, etc.

Recommendation:

The current implementation provides a solid foundation for GDPR audit logging and meets basic compliance requirements for file-based deployments. However, for production-grade GDPR compliance—especially in containerized or enterprise environments—this PR would benefit from:

  1. Adapter interface pattern (similar to LoggerAdapter, CacheAdapter, etc.)
  2. Support for multiple backends: filesystem (default), S3, MongoDB/PostgreSQL, external SIEM
  3. Retention policies implemented at the adapter level

The current architecture could be refactored to use an AuditLogStorageAdapter interface, making the existing file-based implementation one of several available options.
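
A minimal sketch of what such an interface could look like, assuming a single `write(entry)` method. Only the name `AuditLogStorageAdapter` comes from the suggestion above; the method, the two example implementations, and the constructor option are hypothetical. An S3 or database backend would implement the same method.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical contract: a storage backend only has to persist one entry.
class AuditLogStorageAdapter {
  async write(entry) {
    throw new Error('write() must be implemented by the storage adapter');
  }
}

// File-based backend, appending one JSON line per entry to a daily file.
class FileAuditLogStorageAdapter extends AuditLogStorageAdapter {
  constructor({ auditLogFolder }) {
    super();
    this.auditLogFolder = auditLogFolder;
  }

  async write(entry) {
    await fs.promises.mkdir(this.auditLogFolder, { recursive: true });
    const day = new Date().toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
    const file = path.join(this.auditLogFolder, `parse-server-audit-${day}.log`);
    await fs.promises.appendFile(file, JSON.stringify(entry) + '\n');
  }
}

// In-memory backend, useful in tests or as a template for other backends.
class MemoryAuditLogStorageAdapter extends AuditLogStorageAdapter {
  constructor() {
    super();
    this.entries = [];
  }

  async write(entry) {
    this.entries.push(entry);
  }
}
```

With this split, retention and immutability become properties of the backend: a file adapter relies on rotation settings, while an object-store or database adapter can use whatever lifecycle rules that system provides.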

coderabbitai[bot] avatar Nov 02 '25 10:11 coderabbitai[bot]

> The current architecture could be refactored to use an AuditLogStorageAdapter interface, making the existing file-based implementation one of several available options.

That makes sense to me. For example, storing on S3 would allow storing the logs immutably through S3 bucket settings. This gives a higher level of data confidence than storing log files on the instance and waiting for a job to ingest them and store them somewhere else, where they only then become immutable.

I think we also need some filter settings to define what should be logged, to reduce unnecessary logging and tailor to purpose and jurisdiction.

mtrezza avatar Nov 03 '25 12:11 mtrezza