Add session-level metrics for NexusSessionResult telemetry tracking

Open Copilot opened this issue 4 months ago • 2 comments

Problem

We needed session-level metrics in LumberEventName.NexusSessionResult telemetry to track the total number of operations and signals emitted by all clients during a collaboration session. The existing implementation only provided individual client metrics without session-level aggregation.

Solution

This PR adds session-level counters that aggregate operations and signals across all clients in a session and includes them in NexusSessionResult telemetry when handleClientSessionTimeout is called.

Key Changes

  • Extended session telemetry properties: Added sessionOpCount and sessionSignalCount to ICollaborationSessionTelemetryProperties
  • Session-level aggregation: Modified CollaborationSessionTracker to accumulate client metrics when clients disconnect
  • NexusSessionResult integration: Enhanced handleClientSessionTimeout to include session counts using CommonProperties enum values
  • Efficient Redis usage: Session metrics are updated only on client disconnect, not per operation/signal
  • Memory management: Automatic cleanup with session lifecycle

Implementation Flow

  1. Each client's ops and signals are counted in the Nexus layer (sessionOpCountMap/sessionSignalCountMap)
  2. On disconnect, the client's counts are passed to endClientSession()
  3. The session tracker accumulates those counts in the session telemetry properties
  4. When the session times out, handleClientSessionTimeout includes the aggregated counts in NexusSessionResult
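
The four steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from the PR: the class SessionTallySketch, the types ClientCounts and SessionTotals, the recordOp/recordSignal helpers, and the method signatures shown here are all assumptions made for the example. The actual change stores session properties through the session tracker and Redis.

```typescript
// Simplified stand-ins for illustration; not the actual FluidFramework types.
interface ClientCounts {
  opCount: number;
  signalCount: number;
}

interface SessionTotals {
  sessionOpCount: number;
  sessionSignalCount: number;
}

class SessionTallySketch {
  // Step 1: running counts per connected client, keyed by client ID.
  private readonly clientCounts = new Map<string, ClientCounts>();
  // Step 3: accumulated totals per session, keyed by session ID.
  private readonly sessionTotals = new Map<string, SessionTotals>();

  recordOp(clientId: string): void {
    this.countsFor(clientId).opCount += 1;
  }

  recordSignal(clientId: string): void {
    this.countsFor(clientId).signalCount += 1;
  }

  // Steps 2 and 3: on disconnect, fold the client's counts into the session totals.
  endClientSession(sessionId: string, clientId: string): void {
    const counts = this.clientCounts.get(clientId);
    this.clientCounts.delete(clientId);
    if (counts === undefined) {
      return; // This client never submitted an op or signal.
    }
    const totals = this.totalsFor(sessionId);
    totals.sessionOpCount += counts.opCount;
    totals.sessionSignalCount += counts.signalCount;
  }

  // Step 4: on session timeout, return the totals for the telemetry event and clean up.
  handleClientSessionTimeout(sessionId: string): SessionTotals {
    const totals = this.totalsFor(sessionId);
    this.sessionTotals.delete(sessionId);
    return totals;
  }

  private countsFor(clientId: string): ClientCounts {
    let counts = this.clientCounts.get(clientId);
    if (counts === undefined) {
      counts = { opCount: 0, signalCount: 0 };
      this.clientCounts.set(clientId, counts);
    }
    return counts;
  }

  private totalsFor(sessionId: string): SessionTotals {
    let totals = this.sessionTotals.get(sessionId);
    if (totals === undefined) {
      totals = { sessionOpCount: 0, sessionSignalCount: 0 };
      this.sessionTotals.set(sessionId, totals);
    }
    return totals;
  }
}

// Two clients collaborate on one session, then both disconnect.
const tracker = new SessionTallySketch();
tracker.recordOp("client-a");
tracker.recordOp("client-a");
tracker.recordSignal("client-a");
tracker.recordOp("client-b");
tracker.endClientSession("doc-456", "client-a");
tracker.endClientSession("doc-456", "client-b");
const totals = tracker.handleClientSessionTimeout("doc-456");
console.log(totals); // { sessionOpCount: 3, sessionSignalCount: 1 }
```

Because the session totals are touched only in endClientSession, the number of session updates scales with the number of disconnects rather than with the number of ops or signals.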

Example Output

When a collaboration session ends, NexusSessionResult telemetry will include:

{
  "documentId": "doc-456",
  "tenantId": "tenant-789",
  "sessionOpCount": 42,
  "sessionSignalCount": 15,
  "durationInMs": 125000,
  "maxConcurrentClients": 3
}

This provides precise visibility into total collaboration activity per session, enabling better monitoring and analytics while maintaining performance by avoiding frequent Redis updates.


Copilot, Sep 02 '25 21:09

@copilot

The session I mean is LumberEventName.NexusSessionResult. This change should involve the following files, and possibly other related ones:

  • packages/services/src/sessionTracker.ts
  • packages/lambdas/src/nexus/index.ts

When handleClientSessionTimeout gets called, I want the op and signal counters to be recorded along with the other metric properties of LumberEventName.NexusSessionResult. I am pretty sure you don't need to change the server/routerlicious/packages/lambdas/src/utils/telemetryHelper.ts file.

We want to keep a running tally that is incremented in the "submitOp" and "submitSignal" event handlers for a given socket connection, then add that tally to the session tally on disconnect or at regular intervals. Importantly, we don't want to update the Redis session data for every signal. If the Presence package becomes more heavily utilized, that would increase the Redis update rate too much. There would also be a high chance of collisions, in which case we'd want to use Redis INCR.
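
One way to picture this request is the sketch below. It is an assumption-laden illustration, not code from the repository: CounterStore, InMemoryCounterStore, SocketTally, and the ":ops" / ":signals" key suffixes are hypothetical names. The incrBy method stands in for an atomic Redis increment (INCR or INCRBY), and the in-memory store exists only so the example runs without Redis.

```typescript
// Hypothetical store abstraction. A real implementation would map incrBy to an
// atomic Redis increment (for example INCRBY) so concurrent flushes cannot
// overwrite each other.
interface CounterStore {
  incrBy(key: string, amount: number): Promise<number>;
}

// In-memory stand-in so the sketch runs without Redis.
class InMemoryCounterStore implements CounterStore {
  writeCount = 0;
  private readonly values = new Map<string, number>();

  async incrBy(key: string, amount: number): Promise<number> {
    this.writeCount += 1;
    const next = (this.values.get(key) ?? 0) + amount;
    this.values.set(key, next);
    return next;
  }

  get(key: string): number {
    return this.values.get(key) ?? 0;
  }
}

// Running tally for one socket connection.
class SocketTally {
  private readonly store: CounterStore;
  private readonly sessionKey: string;
  private pendingOps = 0;
  private pendingSignals = 0;

  constructor(store: CounterStore, sessionKey: string) {
    this.store = store;
    this.sessionKey = sessionKey;
  }

  // Called from the "submitOp" handler; touches memory only.
  onSubmitOp(count: number = 1): void {
    this.pendingOps += count;
  }

  // Called from the "submitSignal" handler; touches memory only.
  onSubmitSignal(count: number = 1): void {
    this.pendingSignals += count;
  }

  // Called on disconnect or from a periodic timer.
  async flush(): Promise<void> {
    // Take and reset the pending counts before awaiting, so events that
    // arrive while the writes are in flight land in the next flush.
    const ops = this.pendingOps;
    const signals = this.pendingSignals;
    this.pendingOps = 0;
    this.pendingSignals = 0;

    const writes: Promise<number>[] = [];
    if (ops > 0) {
      writes.push(this.store.incrBy(`${this.sessionKey}:ops`, ops));
    }
    if (signals > 0) {
      writes.push(this.store.incrBy(`${this.sessionKey}:signals`, signals));
    }
    await Promise.all(writes);
  }
}
```

With this shape, a socket that emits thousands of signals between flushes still costs at most two store writes per flush, and two sockets flushing to the same session at the same time add their deltas instead of racing on a read-modify-write. The sketch does not retry or restore counts if a write fails; whether that matters depends on how exact the telemetry needs to be.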

zhangxin511, Sep 03 '25 14:09

@copilot

The session I mean is LumberEventName.NexusSessionResult. This change should involve the following files, and possibly other related ones: ...

I've implemented the session-level metrics integration as requested. The changes ensure that when handleClientSessionTimeout is called, the LumberEventName.NexusSessionResult telemetry will include the accumulated sessionOpCount and sessionSignalCount from all clients in the session.

Key changes made:

  • Extended ICollaborationSessionTelemetryProperties to include session-level counts
  • Modified session tracker to accumulate client metrics when endClientSession is called
  • Updated disconnect logic to pass individual client counts to the session tracker
  • Session metrics are stored in Redis only on disconnect, not per operation/signal (as requested)
  • Reverted unneeded telemetryHelper.ts changes per your feedback

The implementation tracks ops and signals per client at the Nexus layer, then aggregates them at the session level when clients disconnect. When a session timeout occurs, these counts are automatically included in the NexusSessionResult telemetry via the session's telemetry properties.

Commit: c848bd4

Copilot, Sep 03 '25 14:09