duplicate session recording uploads result in confusing behavior

Open fspmarshall opened this issue 5 months ago • 0 comments

Under certain circumstances it is possible for multiple separate upload attempts to happen for a given session recording. A recent example of this is https://github.com/gravitational/teleport/pull/45877, and while that was a bug (or at least, sub-optimal behavior), there are also legitimate cases where multiple attempts may occur, such as when a node was shut down mid-upload and then started up again after the initial upload attempt timed out and was completed.

Our recording backends (typically s3 or an s3-compatible service) take a "winner takes all" approach, with the first completed upload for a given session being treated as the official upload. The intent is to make session recordings effectively immutable by ensuring that they can't be superseded by subsequent writes. Our s3 recording backend handles this by always selecting the oldest key version. The file-based backend (not recommended for production deployments) does this by explicitly rejecting all subsequent uploads.

In practice, this behavior can end up suppressing information that ought to be displayed to the user. For example, in the case of a partial (or empty in the case of https://github.com/gravitational/teleport/pull/45877) upload being completed first, expected events will appear to be missing. The s3 audit backend doesn't actually prevent the subsequent complete upload from happening, so the full event history can still be pulled manually from s3, but tsh play and the web UI will only display the initial partial recording, which is problematic.

Since duplicate uploads are possible and are likely to remain possible, it would be preferable to have a means of representing duplicate session recordings within the recording API and interfaces. Practically speaking, there are probably plenty of ways this could be achieved (e.g. appending a random suffix to each upload), but a reasonable user interface may be tricky to design. One option might be to default to the current UX but return an error requesting disambiguation when a duplicate is found. Ex:

$ tsh play 997dfe2f-0b3f-46da-bbb2-dc739e822cbd
Multiple recordings found for session "997dfe2f-0b3f-46da-bbb2-dc739e822cbd".  Please select a specific upload with the '--upload-id' option:

UploadID     Created              Size
------------ -------------------- -----
2117ce43f33f 2024-06-01T16:00:34Z 5mb
70fb9582a739 2024-06-01T16:04:02Z 4.2kb

$ tsh play 997dfe2f-0b3f-46da-bbb2-dc739e822cbd --upload-id=70fb9582a739

fspmarshall, Aug 27 '24 16:08