incubator-devlake
feat: graphql POC in github
Summary
This is a proof of concept (POC) of using GraphQL in the GitHub plugin.
However, some problems are still unresolved, and we need to discuss them:
- ~Compared with the RESTful GitHub plugin, it can only collect part of the data: a. collected: repo/issue/pr_commits/pr_review/account; b. not collected: comment/event (no appropriate API), milestone (too easy), pr_comment (forgotten, but easy)~
- It is a serial processor, so it sends GraphQL queries one at a time.
- ~The `url` and `input` fields in raw_table are not implemented yet. This needs to be fixed in the third-party lib.~
- An issue request takes about 5-10 seconds and 3 rate limit points ~and a PR request takes about 5-15 seconds and 205 rate limit points~. If users are collected separately, it costs only 5+1 points.
- ~Collecting the devlake repo took 3 minutes, 28 requests, and about 2000 rate limit points~.
- PR commits and similar nested lists only collect the first 100 items (the first page).
- There is no retry yet.
Does this close any open issues?
related #1433
Comparison of the different approaches:
| Approach | RESTful | GraphQL rough version (now) | GraphQL + RESTful |
|---|---|---|---|
| Theoretical request count | Num(PR) * 2 + Num(account) * 2 + Num(PR+issue+comment+issueEvent+reviewComment+milestone) / 100 + 1 | Num(PR+issue) / 100 + 2<br>Not included: comment/issueEvent/milestone/reviewComment | Num(PR+issue+account+comment+issueEvent+milestone) / 100 + 2 |
| Example: ants | Requests: 55(PR) * 2 + 218(account) * 2 + [55(PR) + 180(issue) + 650(comment) + 1400(issueEvent) + 104(reviewComment) + 1(milestone)] / 100 + 1 ≈ 560<br>Time: about 560s with 1 token<br>Rate limit: 560/5000 requests | Requests: [55(PR) + 180(issue)] / 100 + 2 ≈ 5<br>Time: about 10s<br>Rate limit: about 208/5000 points | Requests: [55(PR) + 180(issue) + 218(account) + 650(comment) + 1400(issueEvent) + 1(milestone)] / 100 + 2 ≈ 30<br>Time: about 30s<br>Rate limit: 13/5000 points + about 23/5000 requests |
| Example: devlake | Requests: 1315(PR) * 2 + about 100(account) * 2 + [1315(PR) + 1317(issue) + 2950(comment) + 19800(issueEvent) + 1350(reviewComment) + 9(milestone)] / 100 + 1 ≈ 3100<br>Time: about 10min with 6 tokens<br>Rate limit: 3100/5000 requests | Requests: [1315(PR) + 1317(issue)] / 100 + 2 ≈ 30<br>Time: about 3min<br>Rate limit: about 2500/5000 points | Requests: [1315(PR) + 1317(issue) + about 100(account) + 2950(comment) + 19800(issueEvent) + 9(milestone)] / 100 + 2 ≈ 260<br>Time: about 5min with 1 token<br>Rate limit: 115/5000 points + about 229/5000 requests |
| Example: clickhouse | Requests: 27099(PR) * 2 + about 3000(account) * 2 + [27099(PR) + 12372(issue) + 40000(comment) + 40000(issueEvent) + 33200(reviewComment) + ...] / 100 + 1 ≈ 61000<br>Time: about 4h with 5 tokens<br>Rate limit: -/5000 requests (note: comments/events can only be collected for the top 40000 rows) | - | Requests: [27099(PR) + 12372(issue) + about 3000(account) + 40000(comment) + 40000(issueEvent) + ...] / 100 + 2 ≈ 1200<br>Time: about 15min with 1 token<br>Rate limit: 1800/5000 points + about 800/5000 requests |
| Example: TiDB | Requests: O(25000(PR) * 2) ≈ 50000<br>Time: about 3h with 5 tokens | | Requests: O([25000(PR) + 12000(issue) + ... + 40000(comment) + 40000(issueEvent) + ...] / 100) ≈ 1200<br>Time: about 15min with 1 token<br>Rate limit: 1800/5000 points + about 800/5000 requests |
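The request-count formulas in the table can be checked with a small Go sketch. `Counts` and the function names are hypothetical; like the table, it sums item counts before dividing by 100 (rounding up here), which only approximates paging each entity type separately.

```go
package main

import "fmt"

// Counts holds per-repository entity counts (hypothetical struct for illustration).
type Counts struct {
	PR, Issue, Account, Comment, IssueEvent, ReviewComment, Milestone int
}

// pages approximates list requests as ceil(n/100): REST list endpoints and
// GraphQL connections both return at most 100 items per page.
func pages(n int) int {
	return (n + 99) / 100
}

// RestfulRequests: two requests per PR, two per account, list pages, plus one.
func RestfulRequests(c Counts) int {
	listed := c.PR + c.Issue + c.Comment + c.IssueEvent + c.ReviewComment + c.Milestone
	return c.PR*2 + c.Account*2 + pages(listed) + 1
}

// GraphqlRoughRequests: only PRs and issues are paged, plus two.
func GraphqlRoughRequests(c Counts) int {
	return pages(c.PR+c.Issue) + 2
}

// GraphqlRestfulRequests: review comments are not counted, as in the table.
func GraphqlRestfulRequests(c Counts) int {
	return pages(c.PR+c.Issue+c.Account+c.Comment+c.IssueEvent+c.Milestone) + 2
}

func main() {
	devlake := Counts{PR: 1315, Issue: 1317, Account: 100, Comment: 2950,
		IssueEvent: 19800, ReviewComment: 1350, Milestone: 9}
	fmt.Println(RestfulRequests(devlake), GraphqlRoughRequests(devlake), GraphqlRestfulRequests(devlake))
}
```

With the devlake counts from the table it prints 3099, 29 and 257 requests, in line with the ≈3100, ≈30 and ≈260 estimates. The gap comes almost entirely from the per-PR and per-account terms of the RESTful formula.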
New updates:
- Collect events via REST.
- Request users in a separate query to avoid consuming too many rate limit points.