Kamu EventTime column can not have nulls exception
I am facting the issue with EventTime column, even though I have followed the date format as mentioned in the yaml file. please help
yaml file start
kind: DatasetSnapshot
version: 1
content:
name: Hiding Name
kind: root
metadata:
- kind: setPollingSource
fetch:
kind: url
url: Hiding URL
read:
kind: csv
separator: ","
header: true
nullValue: ""
preprocess:
kind: sql
engine: spark
query: >
SELECT CAST(id as BIGINT) as id,
CAST(UNIX_TIMESTAMP(date, "yyyy-MM-dd") as TIMESTAMP) as date,
username as username,name as name, tweet as tweet, language as language, mentions as mentions,
urls as urls, photos as photos, replies_count as replies_count,
retweets_count as retweets_count,
likes_count as likes,hashtags as hashtags,link as link, retweet as retweet,
quote_url as quote_url,
video as video, thumbnail as thumbnail, reply_to as reply_to
FROM input
merge:
kind: ledger
primaryKey:
- id
- kind: setVocab
eventTimeColumn: date
yaml file end
Hi @suresh852456, thanks for providing all the information and a data sample.
I reformated the YAML as follows:
kind: DatasetSnapshot
version: 1
content:
name: test
kind: root
metadata:
- kind: setPollingSource
fetch:
kind: url
url: file:///home/smikhtoniuk/Downloads/2/sample.csv
read:
kind: csv
separator: ","
header: true
nullValue: ""
preprocess:
kind: sql
engine: spark
query: >
SELECT
CAST(id as BIGINT) as id,
CAST(UNIX_TIMESTAMP(date, "yyyy-MM-dd") as TIMESTAMP) as date,
username as username,
name as name,
tweet as tweet,
language as language,
mentions as mentions,
urls as urls,
photos as photos,
replies_count as replies_count,
retweets_count as retweets_count,
likes_count as likes,
hashtags as hashtags,
link as link,
retweet as retweet,
quote_url as quote_url,
video as video,
thumbnail as thumbnail,
reply_to as reply_to
FROM input
merge:
kind: ledger
primaryKey:
- id
- kind: setVocab
eventTimeColumn: date
Ran:
kamu init
kamu add test.yaml
kamu pull test
And everything worked just fine using the sample data you provided:
$ kamu tail test -n 3
┌────────┬─────────────────────────┬─────────────────────┬────┬────────────┬────────────┬──────────────┬──────────┬──────────┬──────┬────────┬───────────────┬────────────────┬───────┬──────────┬────────────────────────────┬─────────┬───────────┬───────┬───────────┬─────────────────────────────────────────────────────────────────┐
│ offset │ system_time │ date │ id │ username │ name │ tweet │ language │ mentions │ urls │ photos │ replies_count │ retweets_count │ likes │ hashtags │ link │ retweet │ quote_url │ video │ thumbnail │ reply_to │
├────────┼─────────────────────────┼─────────────────────┼────┼────────────┼────────────┼──────────────┼──────────┼──────────┼──────┼────────┼───────────────┼────────────────┼───────┼──────────┼────────────────────────────┼─────────┼───────────┼───────┼───────────┼─────────────────────────────────────────────────────────────────┤
│ 99 │ 2022-07-29 19:32:25.404 │ 2021-07-09 00:00:00 │ │ sampleuser │ samplename │ sample tweet │ en │ [] │ [] │ [] │ 1 │ 1 │ 0 │ [] │ https://www.samplelink.com │ FALSE │ │ 0 │ │ [{'screen_name': 'sample', 'name': 'smpl', 'id': '2320301582'}] │
│ 98 │ 2022-07-29 19:32:25.404 │ 2021-06-28 00:00:00 │ │ sampleuser │ samplename │ sample tweet │ en │ [] │ [] │ [] │ 2 │ 0 │ 0 │ [] │ https://www.samplelink.com │ FALSE │ │ 0 │ │ [{'screen_name': 'sample', 'name': 'smpl', 'id': '2320301582'}] │
│ 97 │ 2022-07-29 19:32:25.404 │ 2021-06-24 00:00:00 │ │ sampleuser │ samplename │ sample tweet │ en │ [] │ [] │ [] │ 1 │ 0 │ 1 │ [] │ https://www.samplelink.com │ FALSE │ │ 0 │ │ [{'screen_name': 'sample', 'name': 'smpl', 'id': '2320301582'}] │
└────────┴─────────────────────────┴─────────────────────┴────┴────────────┴────────────┴──────────────┴──────────┴──────────┴──────┴────────┴───────────────┴────────────────┴───────┴──────────┴────────────────────────────┴─────────┴───────────┴───────┴───────────┴─────────────────────────────────────────────────────────────────┘
I'm using latest version:
$ kamu --version
kamu 0.96.0
...but I don't think your problem is version-related.
Could it be that when you redacted your data to create a sample this made the problem go away?
In your screenshot where the error shows sample row: you can see:
- first two values are
offsetandsystem_timecolumns that are added automatically bykamu - next two columns are supposed to be your
idanddate, but BOTH of them are NULL - judging by the
offset == 0- this should be the very first record in your input CSV which has this issue.
To debug this:
- Please try ingesting your own sample data and see if you get the issue. If yes - try updating to latest
kamuversion. - If no - try ingesting a CSV with just the first row (otherwise unredacted) and see if you can isolate the problem.
Let me know what you find!