Proposal - Reserved properties: distinct_id, timestamp
Is your feature request related to a problem?
We have several inconsistencies in the system around special event properties. I'd like to clarify and document the state, and propose a solution.
Problem 1
When displaying each event, we wrongly show that it comes with an event property $timestamp:

It's actually not a property, but an attribute of an event, and without the dollar:

These are at the same level: event, distinct_id, timestamp, properties
You cannot search for the $timestamp property in the interface, because it doesn't exist:

This is causing problems for users, who think it's a real property, and causing us to rush solutions to bandage this.
- Delay sending an event with a Timestamp set (we tried time, timestamp, $time and $timestamp fields)
- Event time is set to event reception time and not Timestamp field
Problem 2
We show a property $time, which is vaguely described as a client-side timestamp of an event. To find it, click this button below the event:

And here it is:

It's a true property:

However it's not set on all events, even if you only use the JS integration:

It's definitely not set if you use any of the other integrations.
Problem 3
We have another top-level event attribute, distinct_id that's not exposed in the interface. Here's a longer issue about it: https://github.com/PostHog/posthog/issues/7810
In short, in the app we can search for a distinct_id property on a user:

... but this is not shown as a user property in the person properties list, and when I search for "distinct_id is set" vs "distinct_id is not set", I get a roughly comparable set of results, so I can't trust it at all.
Describe the solution you'd like
We have event properties and person properties, but that's not all. We also have what I'm now calling "event attributes", which are the higher-level fields that flow through Kafka to ingestion:
export interface PluginEvent {
    distinct_id: string
    ip: string | null
    site_url: string
    team_id: number
    now: string
    event: string
    sent_at?: string
    properties?: Properties
    timestamp?: string
    offset?: number
    $set?: Properties
    $set_once?: Properties
    kafka_offset?: string
    uuid?: string
}
I would like to directly filter by the distinct_id and timestamp attributes on an event, without going through the properties JSON blob.
Having thought through the alternatives, my suggestion is we officially designate distinct_id and timestamp as reserved properties that reflect the true event attributes.
This means:
- If you make a request posthog.capture("bla", { timestamp: $value, distinct_id: $otherValue, ...otherProps }), you will override the event attributes with whatever you passed in. This is exactly how the plugin server's posthog.capture already works.
- We always include timestamp and distinct_id in the taxonomic filter's list of event properties. Sort order to be determined.
- If you filter by an event property called timestamp or distinct_id, we actually filter by the attribute instead.
- Clean up all the inconsistencies in the interface, clearly document these properties, and go through all integrations to make sure we're sending data the same way across the board.
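To make the first point concrete, here is a minimal TypeScript sketch of how reserved properties could be lifted onto the event at capture time. It is not the plugin server's actual code: CapturedEvent, RESERVED_KEYS and applyReservedProperties are hypothetical names, and dropping the reserved keys from the properties blob is an assumption this proposal does not spell out.

```typescript
// Hypothetical, trimmed-down event shape; the real PluginEvent has more fields.
type Properties = Record<string, unknown>

interface CapturedEvent {
    event: string
    distinct_id: string
    timestamp?: string
    properties: Properties
}

// Assumption: only these two keys are reserved, as proposed above.
const RESERVED_KEYS = ['distinct_id', 'timestamp'] as const

// Returns a copy of the event where reserved keys found in `properties`
// override the matching top-level attributes.
function applyReservedProperties(input: CapturedEvent): CapturedEvent {
    const properties: Properties = { ...input.properties }
    const result: CapturedEvent = { ...input, properties }
    for (const key of RESERVED_KEYS) {
        const value = properties[key]
        if (typeof value === 'string' && value.length > 0) {
            result[key] = value // the passed-in value overrides the attribute
        }
        // Assumption: the reserved key is not kept in the JSON blob.
        delete properties[key]
    }
    return result
}
```

With this sketch, capturing "bla" with { distinct_id: "other", plan: "free" } would yield an event whose distinct_id attribute is "other" and whose properties contain only plan.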
Describe alternatives you've considered
- Instead of showing these reserved properties in the taxonomic filter's event properties list, the idea of a custom "event metadata" pill was floated, but after a discussion in the team, this felt like delegating complexity down to the user instead of solving it ourselves. Why would there be two lists? From the user's perspective, they're all properties/attributes/data/choose-your-word of the event.
- The idea to reserve $timestamp and $distinct_id (with the $) doesn't make sense, since we don't actually use this notation anywhere. Why would we reserve a property $distinct_id, to have it write to an attribute distinct_id, so we could query it again with $distinct_id? When and where would we explain the discrepancy to (power) users? We'd have more work with e.g. upgrading the plugin server for no real reason. Why would only two $dollarVariable-s be reserved and the rest not? Too many questions.
Additional context
This is still a proposal, so please let me know if it's the right path, or if you'd suggest some alternative way to look at it.
Tagging a few folks at random: @Twixes @pauldambra @EDsCODE @macobo @timgl @paolodamico
Thanks for writing up the discussion!
I just found another one while trying to query the property created_at in the PropertyFilter:
SELECT uuid,
       event,
       timestamp,
       team_id,
       distinct_id,
       elements_chain,
       created_at,
       trim(BOTH '"' FROM JSONExtractRaw(properties, 'created_at')),
       parseDateTimeBestEffortOrNull(trim(BOTH '"' FROM JSONExtractRaw(properties, 'created_at'))),
       parseDateTimeBestEffortOrNull(substring(trim(BOTH '"' FROM JSONExtractRaw(properties, 'created_at')), 1, 10))
FROM events
WHERE team_id = 2
  AND timestamp > '2022-01-25 09:58:33.150000'
  AND timestamp < '2022-01-25 10:03:02.613100'
ORDER BY toDate(timestamp) DESC, timestamp DESC
LIMIT 101
This shows that while the Property Filter lets me select created_at as if it were a property, it's not set on (all?) events but is an event attribute.
So, even though I can check that the property is present in the UI, the query returns an empty set.
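The mismatch can be sketched as a small helper that decides whether a filter key should read a table column or the properties JSON. This is only an illustration: EVENT_COLUMNS lists just the columns visible in the query above, and selectExpression is a hypothetical name, not a function in the PostHog codebase.

```typescript
// Assumption: the set of real columns, taken from the SELECT list above.
const EVENT_COLUMNS = new Set([
    'uuid',
    'event',
    'timestamp',
    'team_id',
    'distinct_id',
    'elements_chain',
    'created_at',
])

// Builds the ClickHouse expression for a filter key: a bare column name for
// event attributes, a JSON extraction for everything else.
function selectExpression(key: string): string {
    if (EVENT_COLUMNS.has(key)) {
        return key
    }
    // Keep the sketch safe: only allow simple keys inside the SQL string.
    if (!/^[A-Za-z0-9_$]+$/.test(key)) {
        throw new Error(`Unsupported property key: ${key}`)
    }
    return `trim(BOTH '"' FROM JSONExtractRaw(properties, '${key}'))`
}
```

Under these assumptions, created_at resolves to the column, so the filter would no longer look for a JSON key that is never set.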
I like this, thanks for taking the time!
Regarding implementation, I outlined one possible solution on how to do this in queries here. The other half of this is how to handle it on the data taxonomy side, which currently 100% relies on the plugin server/rows in db.
@macobo this already seems to be a factory for property filtering https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/property.py#L344
I was considering extending it to catch these reserved words
Or catching it here https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/property.py#L164 to limit blast radius (if necessary)
Assuming this should cover the event column too?
Then these two could be collapsed
@pauldambra We could move towards that, but it'll require some thought and design.
The current separation of "object" + "properties" actually brings clarity, which would be lost if we merge the event type into the filters. I'd then like the insights pages to work in a similar way, or at least be reasonably sure we could make it work well. However this is the best I could mock:

with a great looking filter:

Which I think isn't better than having two fields.
In this Slack thread, a user asks for the ability to filter/query by created at on a Person.
Similarly, this isn't possible because created at is a column on the table and not a key in the person's properties.
This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.
This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.
This issue is no longer relevant, because we've deprecated both distinct_id as a property and timestamp as a property. Instead, the top-level event fields should be used, since they are the source of truth for distinct ID and timestamp. Currently, this can be done using a HogQL expression.
For example, while a HogQL expression filter on the $browser event property looks like this: properties.$browser = 'foo', a filter on the distinct ID looks like this: distinct_id = 'bar'.
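As a rough illustration of that distinction, the TypeScript sketch below builds both kinds of expression string. TOP_LEVEL_FIELDS, quote and hogqlEquals are hypothetical helpers, and the backslash escaping of string literals is an assumption rather than documented HogQL behaviour. Keys containing spaces or other special characters are not handled.

```typescript
// Assumption: fields addressed directly, without the `properties.` prefix.
const TOP_LEVEL_FIELDS = new Set(['distinct_id', 'timestamp'])

// Assumption: backslashes and single quotes are backslash-escaped in literals.
function quote(value: string): string {
    return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`
}

// Builds an equality filter, routing top-level fields around `properties`.
function hogqlEquals(key: string, value: string): string {
    const field = TOP_LEVEL_FIELDS.has(key) ? key : `properties.${key}`
    return `${field} = ${quote(value)}`
}
```

Here hogqlEquals('$browser', 'foo') produces the properties-based form and hogqlEquals('distinct_id', 'bar') produces the top-level form quoted above.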