verified-sources
Slack `ts` and `thread_ts` inconsistent types
dlt version
0.4.7
Source name
slack
Describe the problem
Values returned by `get_messages()` and `get_thread_replies()` don't have the same data types for the fields `ts` and `thread_ts`. They are returned as `timestamp` by the former and as `string` by the latter.
This is problematic when trying to join tables of messages and replies based on their `thread_ts` (thread id), which is a very common operation.
This is because `get_messages()` passes `datetime_fields=MSG_DATETIME_FIELDS` whereas `get_thread_replies()` doesn't.
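The mismatch can be reproduced in isolation with a minimal sketch. It is not the source's actual code: `convert_datetime_fields` is a hypothetical stand-in for whatever conversion the source applies, and the contents of `MSG_DATETIME_FIELDS` shown here are assumed.

```python
from datetime import datetime, timezone

# Assumed contents; the real constant in the source may list more fields.
MSG_DATETIME_FIELDS = ["ts", "thread_ts"]


def convert_datetime_fields(record, datetime_fields=None):
    """Hypothetical helper: parse the listed fields into UTC datetimes."""
    converted = dict(record)
    for field in datetime_fields or []:
        if field in converted:
            converted[field] = datetime.fromtimestamp(
                float(converted[field]), tz=timezone.utc
            )
    return converted


raw_message = {"ts": "1710000000.000200", "thread_ts": "1710000000.000200"}
raw_reply = {"ts": "1710000050.000300", "thread_ts": "1710000000.000200"}

# Messages path: the datetime field list is passed, so values become datetimes.
message = convert_datetime_fields(raw_message, MSG_DATETIME_FIELDS)
# Replies path: no field list is passed, so values stay strings.
reply = convert_datetime_fields(raw_reply)

types = (type(message["thread_ts"]).__name__, type(reply["thread_ts"]).__name__)
# Comparing a datetime to a str is simply False, so a join on thread_ts finds nothing.
join_matches = message["thread_ts"] == reply["thread_ts"]
print(types, join_matches)
```

Both records carry the same raw `thread_ts`, yet the join key no longer matches once only one side is converted.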
Expected behavior
- `ts` and `thread_ts` should both receive the same type from `MSG_DATETIME_FIELDS`.
- More importantly, according to Slack specs, `ts` and `thread_ts` are not timestamps and `string` is actually the proper type. (see ref)
There are a few additional fields that describe the author (such as user or bot_id), but there's also an additional ts field. The ts value is essentially the ID of the message, guaranteed unique within the context of a channel or conversation.
They look like UNIX/epoch timestamps, hence ts, with specified milliseconds. But they're actually message IDs, even if they're partially composed in seconds-since-the-epoch.
Given that `ts` and `thread_ts` do not exactly represent timestamps but rather are unique ids that can be sorted chronologically, I suggest removing them from the default values of `MSG_DATETIME_FIELDS`.
This would be a breaking change for the `message` tables, but not for the `replies` tables, so it would be the right time to introduce the change to the defaults if accepted.
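If the ids stay strings, a real timestamp can still be derived alongside them. Here is a rough sketch of that idea, assuming the `seconds.microseconds` layout described in the Slack quote; the `sent_at` column name and both helpers are hypothetical and not part of the source.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_to_datetime(ts: str) -> datetime:
    """Parse a Slack-style 'seconds.fraction' id without going through float."""
    seconds, _, fraction = ts.partition(".")
    # Pad or truncate the fractional part to microsecond precision.
    micros = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return EPOCH + timedelta(seconds=int(seconds), microseconds=micros)


def enrich(record: dict) -> dict:
    """Keep ts/thread_ts untouched and add a derived datetime column."""
    out = dict(record)
    out["sent_at"] = ts_to_datetime(record["ts"])  # hypothetical column name
    return out


row = enrich({"ts": "1710000000.000200", "thread_ts": "1710000000.000200"})
print(row["ts"], row["sent_at"].isoformat())
```

The id columns remain usable as join keys, while the derived column covers date filtering and display.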
Steps to reproduce
dlt init slack
How you are using the source?
I run this source for fun.
Operating system
Linux
Runtime environment
Local
Python version
3.10.9
dlt destination
duckdb
Additional information
As a workaround, I manually change the type of `ts` and `thread_ts` of messages from `timestamp` to `string`.
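One possible way to do that conversion in Python is sketched below. `datetime_to_slack_ts` is a hypothetical helper, and it only recovers the original id if the earlier timestamp conversion kept full microsecond precision, which I have not verified.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_slack_ts(value: datetime) -> str:
    """Format a datetime as a Slack-style 'seconds.microseconds' string."""
    if value.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC.
        value = value.replace(tzinfo=timezone.utc)
    delta = value - EPOCH
    # Integer arithmetic on timedeltas avoids float rounding.
    seconds = delta // timedelta(seconds=1)
    micros = (delta % timedelta(seconds=1)) // timedelta(microseconds=1)
    return f"{seconds}.{micros:06d}"


converted = EPOCH + timedelta(seconds=1710000000, microseconds=200)
print(datetime_to_slack_ts(converted))
```

Applied to the `ts` and `thread_ts` columns of the messages table, this gives values that can be compared with the string values in the replies table.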