--log-format AWSS3: add support for "optional" '%h' ('%r', and possibly others) in some records
We have a "trailing delete" policy which expires older versions of keys. The log contains entries like
SENSORED dandiarchive [01/Mar/2025:11:30:56 +0000] - AmazonS3 SENSORED S3.EXPIRE.OBJECT dandisets/001350/draft/dandiset.yaml "-" - - - 1865 - - "-" "-" Y7ufX4ilpyCWWkwfu.KaaRtIEBXLX.XZ jlKklPxsSNqxTDFjCNPWYnNiBvD6ud/L6CROWZJyR5oeezkP/weNaHKYicoAToCtab6OzCO2p8k= - - - - - - -
SENSORED dandiarchive [01/Mar/2025:11:30:56 +0000] - AmazonS3 SENSORED S3.EXPIRE.OBJECT dandisets/001350/draft/dandiset.yaml "-" - - - 1865 - - "-" "-" Yn9IACqj91TzFo3JZ.njOgVZveshWU00 mR0vFWaVQKr2oGtdtQuwmjEoZWJjJli4VouqvqBBHS8XVgV1CWBdGAdTX9vcYw5c3QAwQCWD4m8= - - - - - - -
which, as you can see, lack various parts of the record.
Edit 1: in another file, we saw other types of deficient entries (which triggered filing #2818), where the need was to make '%r' optional. Apparently there were LOTS of such log entries, likely from bad actors. They look like:
SENSORED dandiarchive [09/Feb/2025:23:01:58 +0000] 3.148.XXX.YYY arn:aws:sts::SENSORED:assumed-role/ecstask/60dc52cf2fff40afa38cfd795a6b11e3 6Z9TG9F992TQRS43 REST.COPY.OBJECT_GET zarr/8fe9f51a-db58-45f6-bba8-8b01eeb4f08e/0/0/0/14/13/343 - 200 - - 936 - - - - - UKqVQ5m25NkwzFxkvINtgoHrdD1iElZUOEb9ncRDHayAlAEdr+NrGE4MF31rLuyvgkZPAY1TAdg= SigV4 TLS_AES_128_GCM_SHA256 AuthHeader sensored.s3.us-east-2.amazonaws.com TLSv1.3 - -
Thanks for sharing that. So, essentially, you're suggesting making some or maybe all fields optional in certain cases? The reason I ask is that we rely on the date for many behind-the-scenes processes, and we also need the host for unique visitor tracking. I'm curious how we should handle cases like this. Thoughts?
- Yes, I am thinking of making them "optional".
- Ultimately, I think some kind of "Other" or "Unparseable" value could/should be permitted for them, so that the report at least shows the count(s) of such records. Some of them, I am afraid, might indicate "bad actor" operations (no agent string provided, etc.), so they are worth knowing about and should not be entirely ignored.
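To make the "Unparseable" idea concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not GoAccess code (GoAccess is written in C), and every name in it (`parse`, `FIELDS`, `OPTIONAL`, `PLACEHOLDER`) is hypothetical. It assumes the leading fields appear in the order seen in the records above, and it uses shortened copies of two of those records. A `-` in an optional field is mapped to a placeholder value, so the record is still counted instead of being rejected.

```python
import re
from collections import Counter

# A token is a [bracketed time], a "quoted string", or a run of non-space characters.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

# Leading fields, in the order they appear in the records above (assumed layout).
FIELDS = ("owner", "bucket", "time", "host", "requester",
          "request_id", "operation", "key", "request")

# Hypothetical policy: these fields may be "-" without invalidating the record.
OPTIONAL = {"host", "request"}
PLACEHOLDER = "Unparseable"

def parse(line):
    """Split a log line into named fields, substituting a placeholder for "-"."""
    tokens = [next(g for g in m.groups() if g is not None)
              for m in TOKEN.finditer(line)]
    record = dict(zip(FIELDS, tokens))
    for name in OPTIONAL:
        if record.get(name) == "-":
            record[name] = PLACEHOLDER
    return record

# Shortened copies of two records from this thread (trailing fields dropped).
EXPIRE = ('SENSORED dandiarchive [01/Mar/2025:11:30:56 +0000] - AmazonS3 SENSORED '
          'S3.EXPIRE.OBJECT dandisets/001350/draft/dandiset.yaml "-" - - - 1865')
COPY = ('SENSORED dandiarchive [09/Feb/2025:23:01:58 +0000] 3.148.XXX.YYY '
        'arn:aws:sts::SENSORED:assumed-role/ecstask/60dc52cf2fff40afa38cfd795a6b11e3 '
        '6Z9TG9F992TQRS43 REST.COPY.OBJECT_GET '
        'zarr/8fe9f51a-db58-45f6-bba8-8b01eeb4f08e/0/0/0/14/13/343 - 200')

# Records without a host still contribute to the totals under the placeholder.
hosts = Counter(parse(line)["host"] for line in (EXPIRE, COPY))
print(hosts)
```

With this approach the date is still parsed for every record, and the host panel would show one extra "Unparseable" bucket whose count tells us how many such records there were.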