Tombstone messages: (ksqlDB tutorial) How to find distinct values in a stream of events
For the tutorial at https://kafka-tutorials.confluent.io/finding-distinct-events/ksql.html: if we don't make use of the TIMESTAMP column, I see tombstone messages. What is the reason for this?
The CLICKS stream is modified to not include the TIMESTAMP column, and is created as below:
CREATE STREAM CLICKS (IP_ADDRESS STRING, URL STRING)
    WITH (KAFKA_TOPIC = 'CLICKS',
          FORMAT = 'JSON',
          PARTITIONS = 1);
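For reference, the lesson's version of the stream (as I read it on the tutorial page; the TIMESTAMP_FORMAT string is my transcription from there) declares a TIMESTAMP column and registers it as the message timestamp, which is the only part removed above:

```sql
-- Lesson's version: TIMESTAMP column drives the event time of each record
CREATE STREAM CLICKS (IP_ADDRESS STRING, URL STRING, TIMESTAMP STRING)
    WITH (KAFKA_TOPIC = 'CLICKS',
          FORMAT = 'JSON',
          TIMESTAMP = 'TIMESTAMP',
          TIMESTAMP_FORMAT = 'yyyy-MM-dd''T''HH:mm:ssXXX',
          PARTITIONS = 1);
```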
We then insert the values
INSERT INTO CLICKS (IP_ADDRESS, URL) VALUES ('10.0.0.1', 'https://docs.confluent.io/current/tutorials/examples/kubernetes/gke-base/docs/index.html');
INSERT INTO CLICKS (IP_ADDRESS, URL) VALUES ('10.0.0.12', 'https://www.confluent.io/hub/confluentinc/kafka-connect-datagen');
INSERT INTO CLICKS (IP_ADDRESS, URL) VALUES ('10.0.0.13', 'https://www.confluent.io/hub/confluentinc/kafka-connect-datagen');
INSERT INTO CLICKS (IP_ADDRESS, URL) VALUES ('10.0.0.1', 'https://docs.confluent.io/current/tutorials/examples/kubernetes/gke-base/docs/index.html');
INSERT INTO CLICKS (IP_ADDRESS, URL) VALUES ('10.0.0.12', 'https://www.confluent.io/hub/confluentinc/kafka-connect-datagen');
INSERT INTO CLICKS (IP_ADDRESS, URL) VALUES ('10.0.0.13', 'https://www.confluent.io/hub/confluentinc/kafka-connect-datagen');
We can then query the stream; with HAVING COUNT(IP_ADDRESS) = 1, only clicks seen exactly once are emitted:
ksql> SELECT
   IP_ADDRESS,
   URL
FROM CLICKS 
GROUP BY IP_ADDRESS, URL
HAVING COUNT(IP_ADDRESS) = 1
EMIT CHANGES;
+----------------------------------------------------------------+----------------------------------------------------------------+
|IP_ADDRESS                                                      |URL                                                             |
+----------------------------------------------------------------+----------------------------------------------------------------+
|10.0.0.1                                                        |https://docs.confluent.io/current/tutorials/examples/kubernetes/|
|                                                                |gke-base/docs/index.html                                        |
|10.0.0.12                                                       |https://www.confluent.io/hub/confluentinc/kafka-connect-datagen |
|10.0.0.13                                                       |https://www.confluent.io/hub/confluentinc/kafka-connect-datagen |
|10.0.0.1                                                        |https://docs.confluent.io/current/tutorials/examples/kubernetes/|
|                                                                |gke-base/docs/index.html                                        |
|10.0.0.12                                                       |https://www.confluent.io/hub/confluentinc/kafka-connect-datagen |
|10.0.0.13                                                       |https://www.confluent.io/hub/confluentinc/kafka-connect-datagen |
We can also create the table as per the lesson and query the table to see the tombstone messages.
CREATE TABLE DETECTED_CLICKS AS
    SELECT
        IP_ADDRESS AS KEY1,
        URL AS KEY2,
        AS_VALUE(IP_ADDRESS) AS IP_ADDRESS,
        AS_VALUE(URL) AS URL
    FROM CLICKS 
    GROUP BY IP_ADDRESS, URL
    HAVING COUNT(IP_ADDRESS) = 1;
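For comparison, the lesson's version of this statement (again, as I read it on the tutorial page; the tumbling-window clause is my transcription from there) also windows the aggregation, which is the other part dropped above:

```sql
-- Lesson's version: same aggregation, but scoped to a tumbling window
CREATE TABLE DETECTED_CLICKS AS
    SELECT
        IP_ADDRESS AS KEY1,
        URL AS KEY2,
        AS_VALUE(IP_ADDRESS) AS IP_ADDRESS,
        AS_VALUE(URL) AS URL
    FROM CLICKS WINDOW TUMBLING (SIZE 2 MINUTES, RETENTION 1000 DAYS)
    GROUP BY IP_ADDRESS, URL
    HAVING COUNT(IP_ADDRESS) = 1;
```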
SELECT * FROM DETECTED_CLICKS EMIT CHANGES;
+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
|KEY1                           |KEY2                           |IP_ADDRESS                     |URL                            |
+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
|10.0.0.1                       |https://docs.confluent.io/curre|10.0.0.1                       |https://docs.confluent.io/curre|
|                               |nt/tutorials/examples/kubernete|                               |nt/tutorials/examples/kubernete|
|                               |s/gke-base/docs/index.html     |                               |s/gke-base/docs/index.html     |
|10.0.0.12                      |https://www.confluent.io/hub/co|10.0.0.12                      |https://www.confluent.io/hub/co|
|                               |nfluentinc/kafka-connect-datage|                               |nfluentinc/kafka-connect-datage|
|                               |n                              |                               |n                              |
|10.0.0.13                      |https://www.confluent.io/hub/co|10.0.0.13                      |https://www.confluent.io/hub/co|
|                               |nfluentinc/kafka-connect-datage|                               |nfluentinc/kafka-connect-datage|
|                               |n                              |                               |n                              |
|10.0.0.1                       |https://docs.confluent.io/curre|<TOMBSTONE>                    |<TOMBSTONE>                    |
|                               |nt/tutorials/examples/kubernete|                               |                               |
|                               |s/gke-base/docs/index.html     |                               |                               |
|10.0.0.12                      |https://www.confluent.io/hub/co|<TOMBSTONE>                    |<TOMBSTONE>                    |
|                               |nfluentinc/kafka-connect-datage|                               |                               |
|                               |n                              |                               |                               |
|10.0.0.13                      |https://www.confluent.io/hub/co|<TOMBSTONE>                    |<TOMBSTONE>                    |
|                               |nfluentinc/kafka-connect-datage|                               |                               |
|                               |n                              |                               |                               |
What is the magic behind the TIMESTAMP field that leads to the completely different behavior shown in the lesson's explanation?
Much appreciated.