github-explorer icon indicating copy to clipboard operation
github-explorer copied to clipboard

Store actor.id and repo.id for events

Open tisonkun opened this issue 3 years ago • 5 comments

Hi @alexey-milovidov,

Is it possible we store actor.id and repo.id as documented on GitHub Event docs?

So far, we store actor.login and repo.name only, which is convenient to use but fails to handle repo/user rename case.

The process can be we extend those column first and fill with data identified by event id.

tisonkun avatar Jul 25 '22 06:07 tisonkun

Yes, I see that source data has these fields, so it's possible to recalculate the dataset and include these fields.

alexey-milovidov avatar Aug 08 '22 03:08 alexey-milovidov

@alexey-milovidov I use Aliyun's ClickHouse service to implement this feature for my usage. Finally, I add actor.id, repo.id and id (event id). Also, the insertion can fail and in order to make the process idempotent, I created the table with the ReplacingMergeTree engine.

My create table statement is:

CREATE TABLE default.github_events
(
    `file_time` DateTime,
    `event_id` UInt64,
    `actor_id` UInt64,
    `repo_id` UInt64,
    `event_type` Enum8('CommitCommentEvent' = 1, 'CreateEvent' = 2, 'DeleteEvent' = 3, 'ForkEvent' = 4, 'GollumEvent' = 5, 'IssueCommentEvent' = 6, 'IssuesEvent' = 7, 'MemberEvent' = 8, 'PublicEvent' = 9, 'PullRequestEvent' = 10, 'PullRequestReviewCommentEvent' = 11, 'PushEvent' = 12, 'ReleaseEvent' = 13, 'SponsorshipEvent' = 14, 'WatchEvent' = 15, 'GistEvent' = 16, 'FollowEvent' = 17, 'DownloadEvent' = 18, 'PullRequestReviewEvent' = 19, 'ForkApplyEvent' = 20, 'Event' = 21, 'TeamAddEvent' = 22),
    `actor_login` LowCardinality(String),
    `repo_name` LowCardinality(String),
    `created_at` DateTime,
    `updated_at` DateTime,
    `action` Enum8('none' = 0, 'created' = 1, 'added' = 2, 'edited' = 3, 'deleted' = 4, 'opened' = 5, 'closed' = 6, 'reopened' = 7, 'assigned' = 8, 'unassigned' = 9, 'labeled' = 10, 'unlabeled' = 11, 'review_requested' = 12, 'review_request_removed' = 13, 'synchronize' = 14, 'started' = 15, 'published' = 16, 'update' = 17, 'create' = 18, 'fork' = 19, 'merged' = 20),
    `comment_id` UInt64,
    `body` String,
    `path` String,
    `position` Int32,
    `line` Int32,
    `ref` LowCardinality(String),
    `ref_type` Enum8('none' = 0, 'branch' = 1, 'tag' = 2, 'repository' = 3, 'unknown' = 4),
    `creator_user_login` LowCardinality(String),
    `number` UInt32,
    `title` String,
    `labels` Array(LowCardinality(String)),
    `state` Enum8('none' = 0, 'open' = 1, 'closed' = 2),
    `locked` UInt8,
    `assignee` LowCardinality(String),
    `assignees` Array(LowCardinality(String)),
    `comments` UInt32,
    `author_association` Enum8('NONE' = 0, 'CONTRIBUTOR' = 1, 'OWNER' = 2, 'COLLABORATOR' = 3, 'MEMBER' = 4, 'MANNEQUIN' = 5),
    `closed_at` DateTime,
    `merged_at` DateTime,
    `merge_commit_sha` String,
    `requested_reviewers` Array(LowCardinality(String)),
    `requested_teams` Array(LowCardinality(String)),
    `head_ref` LowCardinality(String),
    `head_sha` String,
    `base_ref` LowCardinality(String),
    `base_sha` String,
    `merged` UInt8,
    `mergeable` UInt8,
    `rebaseable` UInt8,
    `mergeable_state` Enum8('unknown' = 0, 'dirty' = 1, 'clean' = 2, 'unstable' = 3, 'draft' = 4),
    `merged_by` LowCardinality(String),
    `review_comments` UInt32,
    `maintainer_can_modify` UInt8,
    `commits` UInt32,
    `additions` UInt32,
    `deletions` UInt32,
    `changed_files` UInt32,
    `diff_hunk` String,
    `original_position` UInt32,
    `commit_id` String,
    `original_commit_id` String,
    `push_size` UInt32,
    `push_distinct_size` UInt32,
    `member_login` LowCardinality(String),
    `release_tag_name` String,
    `release_name` String,
    `review_state` Enum8('none' = 0, 'approved' = 1, 'changes_requested' = 2, 'commented' = 3, 'dismissed' = 4, 'pending' = 5)
)
ENGINE = ReplacingMergeTree
ORDER BY (event_type, repo_name, created_at, event_id)
SETTINGS index_granularity = 8192

BTW, I ever tried to find a ClickHouse.com provided service but it seems not yet available to developers :P

tisonkun avatar Aug 08 '22 03:08 tisonkun

@tisonkun Thank you! ClickHouse Cloud is currently available for early access by selected customers. It will be available in public beta in October.

alexey-milovidov avatar Aug 08 '22 05:08 alexey-milovidov

It will be available in public beta in October.

I can't wait to give it a try 🤣 Other vendors always have a version lag to the upstream and it's painful to workaround features already provided in the nightly version.

tisonkun avatar Aug 16 '22 04:08 tisonkun

@alexey-milovidov FYI I shared the related code on https://github.com/korandoru/github-adventurer and wrote a Chinese blog in https://www.tisonkun.org/2022/08/16/github-adventurer-with-clickhouse/.

I submitted an early access request with the email address [email protected]. If only ClickHouse.com provides support for the latest version, I'm willing to migrate to it and write down my experience >_<

tisonkun avatar Aug 18 '22 13:08 tisonkun