incubator-devlake

feat: graphql POC in github

Open likyh opened this issue 3 years ago • 2 comments

Summary

This is a proof of concept (POC) for collecting GitHub data through the GraphQL API.

However, some problems are still unresolved, and we need to discuss them:

  1. ~Compared with the RESTful GitHub plugin, it collects only part of the data: a. collected: repo/issue/pr_commits/pr_review/account; b. not collected: comment/event (no appropriate API), milestone (too easy), pr_comment (forgotten, but easy).~
  2. It is a serial processor, so it sends GraphQL queries one at a time.
  3. ~The url and input in raw_table are not implemented yet. This needs to be fixed in the third-party lib.~
  4. An issue request takes about 5-10 seconds and 3 rate limit points ~and a PR request takes about 5-15 seconds and 205 rate limit points~. If users are collected separately, it costs just 5+1 points.
  5. ~Collecting the devlake repo took 3 minutes, 28 requests, and about 2000 rate limit points.~
  6. PR commits and similar nested lists are collected only up to 100 items (the first page).
  7. No retry yet.
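For items 6 and 7, here is a minimal sketch of how cursor pagination and retry could fit together. It is not the plugin's code. It assumes a hypothetical `fetch_page(cursor)` callable that runs one GraphQL query and returns a connection object with `nodes` and `pageInfo { hasNextPage endCursor }`, which is the shape GitHub's GraphQL connections use. The names `with_retry` and `iter_nodes` are made up for illustration.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical callable: takes a cursor (None for the first page) and returns
# {"nodes": [...], "pageInfo": {"hasNextPage": bool, "endCursor": str or None}}.
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def with_retry(fetch_page: FetchPage, cursor: Optional[str],
               max_attempts: int = 3, base_delay: float = 1.0) -> Dict[str, Any]:
    """Call fetch_page, retrying with exponential backoff on any exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(cursor)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def iter_nodes(fetch_page: FetchPage, max_attempts: int = 3,
               base_delay: float = 1.0) -> Iterator[Any]:
    """Yield every node of a connection, following endCursor past the first page."""
    cursor: Optional[str] = None
    while True:
        page = with_retry(fetch_page, cursor, max_attempts, base_delay)
        yield from page["nodes"]
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]
```

Since the processor is serial anyway, a generator like this keeps the one-query-at-a-time behaviour while no longer stopping at the first 100 items.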

Does this close any open issues?

related #1433

likyh avatar Jul 27 '22 12:07 likyh

Comparison of the different approaches:

| Way | Restful | Graphql rough version (now) | Graphql+Restful |
| --- | --- | --- | --- |
| Theoretical request number | Num(PR) * 2 + Num(account) * 2 + Num(PR+issue+comment+issueEvent+reviewComment+milestone) / 100 + 1 | Num(PR+issue) / 100 + 2<br>Not included: comment/issueEvent/milestone/reviewComment | Num(PR+issue+account+comment+issueEvent+milestone) / 100 + 2 |
| Example: ants | Requests: 55(PR) * 2 + 218(account) * 2 + [55(PR) + 180(issue) + 650(comment) + 1400(issueEvent) + 104(reviewComment) + 1(milestone)] / 100 + 1 ≈ 560<br>Time: about 560s for 1 token<br>RateLimit: 560/5000 requests | Requests: [55(PR) + 180(issue)] / 100 + 2 ≈ 5<br>Time: about 10s<br>RateLimit: about 208/5000 points | Requests: [55(PR) + 180(issue) + 218(account) + 650(comment) + 1400(issueEvent) + 1(milestone)] / 100 + 2 ≈ 30<br>Time: about 30s<br>RateLimit: 13/5000 points + about 23/5000 requests |
| Example: devlake | Requests: 1315(PR) * 2 + about 100(account) * 2 + [1315(PR) + 1317(issue) + 2950(comment) + 19800(issueEvent) + 1350(reviewComment) + 9(milestone)] / 100 + 1 ≈ 3100<br>Time: about 10min for 6 tokens<br>RateLimit: 3100/5000 requests | Requests: [1315(PR) + 1317(issue)] / 100 + 2 ≈ 30<br>Time: about 3min<br>RateLimit: about 2500/5000 points | Requests: [1315(PR) + 1317(issue) + about 100(account) + 2950(comment) + 19800(issueEvent) + 9(milestone)] / 100 + 2 ≈ 260<br>Time: about 5min for 1 token<br>RateLimit: 115/5000 points + about 229/5000 requests |
| Example: clickhouse | Requests: 27099(PR) * 2 + about 3000(account) * 2 + [27099(PR) + 12372(issue) + 40000(comment) + 40000(issueEvent) + 33200(reviewComment) + ...] / 100 + 1 ≈ 61000<br>Time: about 4h for 5 tokens<br>RateLimit: -/5000 requests<br>(Note: comment/event can only collect the top 40000 rows) | - | Requests: [27099(PR) + 12372(issue) + about 3000(account) + 40000(comment) + 40000(issueEvent) + ...] / 100 + 2 ≈ 1200<br>Time: about 15min for 1 token<br>RateLimit: 1800/5000 points + about 800/5000 requests |
| Example: TiDB | Requests: O(25000(PR) * 2) ≈ 50000<br>Time: about 3h for 5 tokens | | Requests: O([25000(PR) + 12000(issue) + ... + 40000(comment) + 40000(issueEvent) + ...] / 100) ≈ 1200<br>Time: about 15min for 1 token<br>RateLimit: 1800/5000 points + about 800/5000 requests |
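To make the "theoretical request number" formulas easier to check, here is a small sketch that evaluates them for the ants numbers. The function names are hypothetical, and rounding each paged count up to whole pages of 100 is my assumption; the figures quoted in the comparison are rough estimates, so the results only land in the same ballpark.

```python
import math

PAGE_SIZE = 100  # rows per page, as in the "/ 100" terms of the formulas


def pages(rows: int) -> int:
    """Number of pages needed for `rows` items (assumption: round up)."""
    return math.ceil(rows / PAGE_SIZE)


def restful_requests(pr, account, issue, comment, issue_event,
                     review_comment, milestone):
    """Num(PR)*2 + Num(account)*2 + Num(paged rows)/100 + 1."""
    paged = pr + issue + comment + issue_event + review_comment + milestone
    return pr * 2 + account * 2 + pages(paged) + 1


def graphql_requests(pr, issue):
    """Num(PR+issue)/100 + 2 (comments, events, milestones not included)."""
    return pages(pr + issue) + 2


def hybrid_requests(pr, issue, account, comment, issue_event, milestone):
    """Num(PR+issue+account+comment+issueEvent+milestone)/100 + 2."""
    return pages(pr + issue + account + comment + issue_event + milestone) + 2


# Row counts for the ants example, taken from the comparison above.
ants = dict(pr=55, issue=180, account=218, comment=650,
            issue_event=1400, review_comment=104, milestone=1)

print(restful_requests(**ants))                              # 571
print(graphql_requests(ants["pr"], ants["issue"]))           # 5
print(hybrid_requests(ants["pr"], ants["issue"], ants["account"],
                      ants["comment"], ants["issue_event"],
                      ants["milestone"]))                    # 28
```

The RESTful count is dominated by the `Num(PR) * 2` and `Num(account) * 2` terms, which is why the other two approaches need far fewer requests on the same repo.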

likyh avatar Jul 28 '22 04:07 likyh

New update:

  1. Collect events via REST.
  2. Request users in a separate query to avoid spending too many rate limit points.
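One possible shape for the second point, as a sketch only: gather the distinct author logins from the PR nodes already fetched, then look them up in one separate query using GraphQL aliases on the `user(login: ...)` field. The helper names and the selected fields (`login name email`) are assumptions for illustration and may not match what the plugin actually does.

```python
import json
from typing import Iterable, List


def unique_logins(pull_requests: Iterable[dict]) -> List[str]:
    """Collect distinct author logins, keeping first-seen order."""
    seen = {}
    for pr in pull_requests:
        author = pr.get("author") or {}  # author can be null for deleted users
        login = author.get("login")
        if login:
            seen.setdefault(login, None)
    return list(seen)


def build_user_query(logins: List[str]) -> str:
    """Build one query that looks up each login under its own alias (u0, u1, ...)."""
    if not logins:
        raise ValueError("at least one login is required")
    fields = "login name email"  # assumed selection, adjust as needed
    parts = [
        # json.dumps produces a quoted, escaped string literal for the login
        f"  u{i}: user(login: {json.dumps(login)}) {{ {fields} }}"
        for i, login in enumerate(logins)
    ]
    return "query {\n" + "\n".join(parts) + "\n}"


prs = [
    {"author": {"login": "alice"}},
    {"author": None},
    {"author": {"login": "bob"}},
    {"author": {"login": "alice"}},
]
print(build_user_query(unique_logins(prs)))
```

With this split, the PR query only needs `author { login }`, and each user is fetched once no matter how many PRs they opened.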

likyh avatar Aug 08 '22 15:08 likyh