incubator-devlake

Allow scraping metrics to be throttled via UI

Open benjessop12 opened this issue 3 years ago • 11 comments

Description

👋 While attempting to scrape metrics from a large GitLab project (hundreds of thousands of pipelines, tens of thousands of merge requests), the triggered pipeline that scrapes the metrics via the GitLab API increased load to the point that users noticed degraded performance in the UI.

The logs were showing ~19-20 of these calls per second: GET https://<gitlab_endpoint>/api/v4/projects/<project_id>/merge_requests/<merged_request_id>/notes?system=false&per_page=1&page=0

Proposed solution

Note: I checked for any existing option related to throttling but couldn't find one, even in the GitLab documentation.

It would be nice to be able to rate limit the number of API calls made per unit of time, or to set a float value to "wait" between API calls, in an effort to reduce the load. If this could be configured via the config-ui, that would be fantastic.

Has the Feature been Requested Before?

I couldn't find one when searching for similar keywords. Feel free to close if this request is a duplicate.

Describe alternatives you've considered

An alternative feature would be to configure the number of pipelines, merge requests, etc. that are queried (this is scoped to GitLab as written, but could apply to other integrated services). For example, if I could configure the scrape to cover only the last 10_000 merge requests and 25_000 pipelines, it would reduce the time the scraping runs for and would provide more recent data for querying.

benjessop12 avatar Dec 02 '21 22:12 benjessop12

This is a great request. We should do this feature! Do you want to make a pull request? If not, I can get it into our company pipeline in the coming weeks!

joncodo avatar Dec 03 '21 14:12 joncodo

This issue mentions two ways to implement rate limiting:

  1. Directly limit the number of API calls per unit of time (e.g. one minute). This way we don't need to change much code: we only need to pass one more param from config-ui, then use it to calculate the interval passed to time.sleep between calls. Alternatively, just send a param that specifies the interval between two API calls directly.
  2. Set the number of most recent merge requests to be queried: I checked both the GitLab and GitHub APIs. They don't have params to cap the count, but they do have updated_after/created_after (GitLab) and since (GitHub), which limit the returned entries by time.
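A minimal sketch of the first option, written in Python for illustration only (it is not DevLake code). The names `interval_from_rate` and `Throttle` are hypothetical; the idea is simply to turn a "calls per period" budget into a pause and sleep before each call when the previous one was too recent.

```python
import time


def interval_from_rate(calls_per_period: int, period_seconds: float) -> float:
    """Return the pause, in seconds, between two calls for a given budget."""
    if calls_per_period <= 0:
        raise ValueError("calls_per_period must be positive")
    return period_seconds / calls_per_period


class Throttle:
    """Hypothetical helper: spaces calls at least `interval` seconds apart."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock      # injectable so the helper can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Never schedule from a time earlier than the slot just used.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.interval
```

A collector would create one `Throttle(interval_from_rate(10000, 3600))` per connection and call `wait()` before each API request; the 10000-per-hour figure is only an example value.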

warren830 avatar Jan 18 '22 14:01 warren830

The API links and param descriptions are below:

  1. GitHub issue API link: https://docs.github.com/en/github-ae@latest/rest/reference/commits
  • since: Only show notifications updated after the given time. This is a timestamp in ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ
  2. GitLab merge request API link: https://docs.gitlab.com/ee/api/merge_requests.html
  • created_after: Return merge requests created on or after the given time. Expected in ISO 8601 format (2019-03-15T08:00:00Z)
  • updated_after: Return merge requests updated on or after the given time. Expected in ISO 8601 format (2019-03-15T08:00:00Z)
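As a rough illustration of how these time-based params could stand in for "only the most recent N entries", the Python sketch below builds a query param from a cutoff date. The function names and the "last N days" window are assumptions made for this example; only the param names (since, created_after, updated_after) and the ISO 8601 format come from the descriptions above.

```python
from datetime import datetime, timedelta, timezone


def iso8601_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def recent_window_params(param_name: str, days: int, now: datetime) -> dict:
    """Build a query param restricting results to the last `days` days.

    `param_name` would be e.g. "updated_after" or "created_after" for
    GitLab, or "since" for GitHub.
    """
    cutoff = now - timedelta(days=days)
    return {param_name: iso8601_utc(cutoff)}
```

The resulting dict can be merged into the query string of the list request, so older entries are never fetched at all.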

warren830 avatar Jan 18 '22 14:01 warren830

@yumengwang03 Please take a look at this. In the backend, this setting is named API_REQUESTS_PER_HOUR and acts as a global default. We can also support a higher-priority value at the connection level, so please add this to your connection page/dialog. We should also consider creating a settings panel for global settings, including:

API_TIMEOUT=10s
API_RETRY=3
API_REQUESTS_PER_HOUR=10000
# Debug Info Warn Error
LOGGING_LEVEL=
DB_LOGGING_LEVEL=
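
The precedence described above (a connection-level value overriding the global default) could be resolved roughly as in the Python sketch below. The function name is hypothetical, and treating a missing or non-positive connection value as "not set" is an assumption of this example, not something the backend is confirmed to do.

```python
from typing import Optional


def effective_requests_per_hour(global_default: int,
                                connection_value: Optional[int] = None) -> int:
    """Pick the rate limit to apply for one connection.

    Assumption: None or a non-positive connection value means
    "not configured", so the global API_REQUESTS_PER_HOUR applies.
    """
    if connection_value is not None and connection_value > 0:
        return connection_value
    return global_default
```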

klesh avatar May 12 '22 06:05 klesh

Plan of attack: Add a setting to the connection page @e2corporation

klesh avatar Jun 13 '22 09:06 klesh

There should be a design first.

yumengwang03 avatar Jun 20 '22 14:06 yumengwang03

@yumengwang03 FYI, we figured that the best place for this setting is the connection.

klesh avatar Jun 20 '22 14:06 klesh

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Aug 02 '22 00:08 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar Aug 09 '22 00:08 github-actions[bot]

Hi @e2corporation, can we add this setting to the connection editing page in this iteration?

klesh avatar Aug 09 '22 01:08 klesh

API swagger issue: https://github.com/apache/incubator-devlake/issues/2449 @e2corporation Can you add a numeric input for rateLimitPerHour to the connection edit page?

warren830 avatar Aug 09 '22 03:08 warren830