
When the client executes tasks, it occasionally reports: method(send_task_result) call fails on error: 'org_sid'

Open anyuan95 opened this issue 2 years ago • 11 comments

After creating and running a large batch of tasks through the API, there were no errors at first, but after a few hours of execution every client node started emitting the error below and kept retrying in a loop that blocks subsequent tasks. As a result, all client nodes now show as offline.

2022-04-08 02:17:44,163-ERROR-util.logutil: task_params is wrong.
Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_client/util/tooldisplay.py", line 28, in get_tool_display_name
    tool_params = task_request['task_params']['checktool']
KeyError: 'checktool'
2022-04-08 02:17:44,170-INFO-util.logutil: Python 3.7.0
2022-04-08 02:17:44,265-INFO-util.logutil: Task_8978 (codecount) starts ...
2022-04-08 02:18:04,407-ERROR-util.wrapper: method(get_task) call fails on error: HTTP Error 504: Gateway Time-out
2022-04-08 02:18:04,407-INFO-util.wrapper: retrying method(get_task) after 5 seconds
2022-04-08 02:18:09,470-INFO-util.wrapper: method(get_task) call succeed after 1 retries
2022-04-08 02:18:19,480-INFO-util.logutil: task codecount with id 8978 is done
2022-04-08 02:18:19,480-INFO-util.logutil: uploading task(codecount) result(code:219) to server
2022-04-08 02:18:19,481-ERROR-util.wrapper: method(update_task_progress) call fails on error: 'org_sid'
2022-04-08 02:18:19,481-ERROR-util.reporter: Update_task_progress fail. upload msg:上传结果
Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_client/util/reporter.py", line 91, in update_task_progress
    info.percent)
  File "util/wrapper.py", line 20, in util.wrapper._MethodWrapper.__call__
  File "util/wrapper.py", line 40, in util.wrapper.SyncWrapper._call_method
  File "util/wrapper.py", line 42, in util.wrapper.SyncWrapper._call_method
  File "util/wrapper.py", line 51, in util.wrapper._RetryMethod.__call__
  File "util/wrapper.py", line 98, in util.wrapper.Retry.__retry_call
  File "/docker/opt/asmallcompany/codeanalysis_client/util/api/dogserver.py", line 37, in __retry_on_error
    raise error
  File "util/wrapper.py", line 92, in util.wrapper.Retry.__retry_call
  File "/docker/opt/asmallcompany/codeanalysis_client/util/api/dogapi.py", line 68, in update_task_progress
    "jobs/%s/tasks/%s/progresses/" % (task_params["org_sid"],
KeyError: 'org_sid'
2022-04-08 02:18:19,482-ERROR-util.logutil: Fail to send result to file server! Error: project_id is empty(None), can't upload result to file server.
2022-04-08 02:18:19,482-ERROR-util.wrapper: method(send_task_result) call fails on error: 'org_sid'
2022-04-08 02:18:19,482-INFO-util.wrapper: retrying method(send_task_result) after 5 seconds
2022-04-08 02:18:24,487-ERROR-util.wrapper: method(send_task_result) call fails on error: 'org_sid'
2022-04-08 02:18:24,488-INFO-util.wrapper: retrying method(send_task_result) after 5 seconds
2022-04-08 02:18:29,493-ERROR-util.wrapper: method(send_task_result) call fails on error: 'org_sid'
2022-04-08 02:18:29,493-INFO-util.wrapper: retrying method(send_task_result) after 5 seconds

Could you tell me what causes this problem? Is it something in my environment?

CodeAnalysis version: 20220402.1. Client host environment: CentOS 7 + Python 3.7.0.
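
The two KeyErrors in the log above (`'checktool'` and `'org_sid'`) both come from reading required fields out of `task_params` without first checking that they exist, so the retry loop keeps re-raising an error that can never succeed. A minimal sketch of validating the parameters up front instead (the helper names here are hypothetical, not part of the CodeAnalysis client):

```python
# Hypothetical guard for required task parameters; the field names
# ('org_sid', 'checktool') are taken from the tracebacks above.
REQUIRED_FIELDS = ("org_sid", "checktool")

def validate_task_params(task_params: dict) -> list:
    """Return the list of required fields missing from task_params."""
    return [f for f in REQUIRED_FIELDS if f not in task_params]

def build_progress_path(task_params: dict) -> str:
    """Fail fast with a clear message instead of raising KeyError in a retry loop."""
    missing = validate_task_params(task_params)
    if missing:
        raise ValueError("task_params is missing fields: %s" % ", ".join(missing))
    return "jobs/%s/tasks/progresses/" % task_params["org_sid"]
```

A check like this would surface "missing field" immediately rather than leaving nodes stuck retrying.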

anyuan95 avatar Apr 08 '22 03:04 anyuan95

It looks like the task parameters may have been fetched incompletely. Is there any error on the file-service upload side?

Lingghh avatar Apr 08 '22 06:04 Lingghh

I can't find the logs from that run any more. I'll run it again tonight and check tomorrow morning whether the error can be reproduced.

A side question: with the current version, how should I deploy to scan as many repositories concurrently as possible? I run a single web node + multiple server nodes + multiple client nodes + single-node MinIO + single-node Redis + single-node MySQL, and create scan tasks in batches through the API, but execution doesn't seem any faster, and requests frequently time out. Is the slowdown caused by multiple servers hitting the database and MinIO at the same time and blocking each other, or simply by low database/MinIO performance on my side?

P.S. I didn't select any extra scan tools, only the three basic ones (cyclomatic complexity + duplicate code + line counting).

anyuan95 avatar Apr 08 '22 11:04 anyuan95

Also, one more question: my main_log/codedog_error.log contains the following errors:

-2022-04-08 19:38:33,008-ERROR-apps.job.models.base: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/6106/job8711/f6a2eeb6b72a11ecab030a580a24789e/task_params_20212.json (Caused by ResponseError('too many 500 error responses'))
Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/job/models/base.py", line 331, in task_params
    content = file_server.get_file(self.params_path)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/retrylib.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/fileserver.py", line 100, in get_file
    rsp = self._http_client.get(file_url, headers=self._headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 39, in get
    result = self.session("GET", path, params=params, headers=headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 36, in session
    return self._http_client.request(method, path, fields=params, headers=headers, body=data)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 75, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 866, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/6106/job8711/f6a2eeb6b72a11ecab030a580a24789e/task_params_20212.json (Caused by ResponseError('too many 500 error responses'))
-2022-04-08 19:49:30,692-ERROR-util.retrylib: method <get_file> exception: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/384/job8794/83013684b72f11ecab030a580a24789e/task_params_20407.json (Caused by ResponseError('too many 500 error responses'))
Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/retrylib.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/fileserver.py", line 100, in get_file
    rsp = self._http_client.get(file_url, headers=self._headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 39, in get
    result = self.session("GET", path, params=params, headers=headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 36, in session
    return self._http_client.request(method, path, fields=params, headers=headers, body=data)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 75, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 866, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/384/job8794/83013684b72f11ecab030a580a24789e/task_params_20407.json (Caused by ResponseError('too many 500 error responses'))
-2022-04-08 19:49:30,693-ERROR-apps.job.models.base: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/384/job8794/83013684b72f11ecab030a580a24789e/task_params_20407.json (Caused by ResponseError('too many 500 error responses'))
Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/job/models/base.py", line 331, in task_params
    content = file_server.get_file(self.params_path)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/retrylib.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/fileserver.py", line 100, in get_file
    rsp = self._http_client.get(file_url, headers=self._headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 39, in get
    result = self.session("GET", path, params=params, headers=headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 36, in session
    return self._http_client.request(method, path, fields=params, headers=headers, body=data)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 75, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 866, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/384/job8794/83013684b72f11ecab030a580a24789e/task_params_20407.json (Caused by ResponseError('too many 500 error responses'))

However, tail -f file_log/*.log shows no error output at all. Why is that? Is it because I'm using MinIO? Where should I look for the file-service logs?

anyuan95 avatar Apr 08 '22 11:04 anyuan95

You can check the nginx logs to confirm the cause. The nginx error log is at /var/log/nginx/nginx_codedog_error.log, and the file-server route's error log is at /var/log/nginx/nginx_file_error.log. If the paths have been changed or you can't find the files, look under /var/log/nginx/.

Lingghh avatar Apr 09 '22 07:04 Lingghh

I can't find the logs from that run any more. I'll run it again tonight and check tomorrow morning whether the error can be reproduced.

A side question: with the current version, how should I deploy to scan as many repositories concurrently as possible? I run a single web node + multiple server nodes + multiple client nodes + single-node MinIO + single-node Redis + single-node MySQL, and create scan tasks in batches through the API, but execution doesn't seem any faster, and requests frequently time out. Is the slowdown caused by multiple servers hitting the database and MinIO at the same time and blocking each other, or simply by low database/MinIO performance on my side?

P.S. I didn't select any extra scan tools, only the three basic ones (cyclomatic complexity + duplicate code + line counting).

Do you mean dispatching scan tasks for a large number of repositories at the same time? What scale do you expect? Under normal operation, as long as there are enough clients, scans can run concurrently and finish faster.

Regarding "create scan tasks in batches through the API, but execution doesn't seem any faster": does "execution speed" here mean the API response time when starting the batch, or the speed of some other stage?

Lingghh avatar Apr 09 '22 07:04 Lingghh

You can check the nginx logs to confirm the cause. The nginx error log is at /var/log/nginx/nginx_codedog_error.log, and the file-server route's error log is at /var/log/nginx/nginx_file_error.log. If the paths have been changed or you can't find the files, look under /var/log/nginx/.

I checked main_log/codedog_error.log; it keeps printing the following two kinds of errors:

-2022-04-09 00:03:29,777-ERROR-apps.codeproj.apis.v3: [116]文件服务器异常
Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/job/models/base.py", line 174, in context
    context_url = file_server.put_file(json.dumps(context), context_path, file_server.TypeEnum.TEMPORARY)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/retrylib.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/fileserver.py", line 90, in put_file
    rsp = self._http_client.put(file_url, data=data, headers=headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 47, in put
    result = self.session("PUT", path, params=params, data=data, json_data=json_data, headers=headers)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/util/httpclient.py", line 36, in session
    return self._http_client.request(method, path, fields=params, headers=headers, body=data)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 79, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    **response_kw
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 866, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/1503/job9500/0b770cfab75311ec88030a580a15f4c8/job_context.json (Caused by ResponseError('too many 500 error responses'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/codeproj/apis/v3.py", line 971, in post
    project, creator=UserManager.get_username(request.user), scan_data=slz.validated_data)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/codeproj/core/base.py", line 125, in create_server_scan
    job.context = job_context
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/job/models/base.py", line 178, in context
    raise CDErrorBase(errcode.E_SERVER_FILE_SERVICE_ERROR, "文件服务器异常")
util.exceptions.CDErrorBase: [116]文件服务器异常
-2022-04-09 00:08:53,199-ERROR-apps.codeproj.core.projmgr: create repo exception: (1062, "Duplicate entry 'http://git.asmallcompanyoa.com/wmbsc/wm-my-award-ORG_1_TEAM_1' for key 'codeproj_baserepository_scm_url_url_key_57a3deb9_uniq'")
Traceback (most recent call last):
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 725, in _read_packet
    packet.raise_for_error()
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry 'http://git.asmallcompanyoa.com/wmbsc/wm-my-award-ORG_1_TEAM_1' for key 'codeproj_baserepository_scm_url_url_key_57a3deb9_uniq'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/codeproj/core/projmgr.py", line 131, in v3_create_repo
    scm_type=scm_type, scm_url=scm_url, user=user, url_key=url_key, project_team=pt)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/base/basemodel.py", line 151, in create
    return super(MTQuerySet, self).create(*args, **kwargs)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/query.py", line 447, in create
    obj.save(force_insert=True, using=self.db)
  File "/docker/opt/asmallcompany/codeanalysis_server/projects/main/apps/base/basemodel.py", line 263, in save
    return super(CDBaseModel, self).save(*args, **kwargs)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/base.py", line 754, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/base.py", line 792, in save_base
    force_update, using, update_fields,
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/base.py", line 895, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/base.py", line 935, in _do_insert
    using=using, raw=raw,
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/query.py", line 1254, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql
    cursor.execute(sql, params)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 98, in execute
    return super().execute(sql, params)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/connections.py", line 725, in _read_packet
    packet.raise_for_error()
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/home/asmallcompany/.local/lib/python3.7/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
django.db.utils.IntegrityError: (1062, "Duplicate entry 'http://git.asmallcompanyoa.com/wmbsc/wm-my-award-ORG_1_TEAM_1' for key 'codeproj_baserepository_scm_url_url_key_57a3deb9_uniq'")
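
The `(1062, "Duplicate entry ...")` error above is a unique-constraint violation on the repository URL: a create request that is retried after a timeout, or two creates racing past the existence check, both try to insert the same `scm_url`. A common pattern is to attempt the insert and fall back to a lookup when the constraint fires; a minimal sqlite3 sketch of that idea (not the project's actual Django code, and the table layout here is illustrative only):

```python
import sqlite3

def get_or_create_repo(conn: sqlite3.Connection, scm_url: str) -> int:
    """Insert the repo if absent; on a unique-constraint clash, fetch the existing row."""
    try:
        cur = conn.execute("INSERT INTO repo (scm_url) VALUES (?)", (scm_url,))
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another request created the same URL first; reuse its id.
        row = conn.execute("SELECT id FROM repo WHERE scm_url = ?", (scm_url,)).fetchone()
        return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repo (id INTEGER PRIMARY KEY, scm_url TEXT UNIQUE)")
a = get_or_create_repo(conn, "http://example.com/repo.git")
b = get_or_create_repo(conn, "http://example.com/repo.git")  # second call reuses the row
```

With this shape, a duplicate create degrades into a harmless lookup instead of a 500.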

Following your advice, I checked the nginx log files. Today only nginx_codedog_error.log keeps printing the following:

==> nginx_codedog_error.log <==
2022/04/09 21:05:30 [error] 1144#0: *286929 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: 0.0.0.0, request: "GET /files/public_server_temp/jobdata/projects/1670/job15729/e8be3c04b7ea11ec83330a580a307db7/task_params_28536.json HTTP/1.1", subrequest: "/urlauth/", upstream: "http://127.0.0.1:8001/api/authen/urlauth/", host: "127.0.0.1:8000"
2022/04/09 21:05:30 [error] 1144#0: *286929 auth request unexpected status: 504 while sending to client, client: 127.0.0.1, server: 0.0.0.0, request: "GET /files/public_server_temp/jobdata/projects/1670/job15729/e8be3c04b7ea11ec83330a580a307db7/task_params_28536.json HTTP/1.1", host: "127.0.0.1:8000"

The logs show many 504 errors, mostly auth request timeouts when requesting /files. Could these timeouts be caused by the main service being overloaded?

In addition, I frequently hit timeouts when accessing the main/* APIs through the API or the web page, even when going directly to ip:8001/main/*. Another odd thing: when I call the main service directly via the API (http://10.30.138.196:8001/api/v2/nodes/), the request hangs with no response for tens of seconds, but if I cancel it immediately and resend, it returns right away. I don't understand what causes this.

anyuan95 avatar Apr 09 '22 13:04 anyuan95

看起来可能是任务参数获取不完整,文件服务上传这一块有存在什么异常么?

This problem should be solved now; it was apparently caused by insufficient disk space. Checking the monitoring, the client hosts were provisioned with only 20 GB of disk, and several df.bytes.free.percent alerts fired during the run. After I resized the disks to 50 GB, the error did not reappear during last night's run.
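
Since the root cause turned out to be a full disk, a preflight free-space check on the client host can surface this before tasks start failing with confusing KeyErrors. A minimal standard-library sketch (the 10% threshold is an arbitrary example, not a CodeAnalysis default):

```python
import shutil

def has_free_space(path: str = "/", min_free_ratio: float = 0.10) -> bool:
    """Return True if at least min_free_ratio of the disk at `path` is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio
```

A client could run this check before accepting a task and refuse work (with a clear log line) when the disk is nearly full.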

anyuan95 avatar Apr 09 '22 13:04 anyuan95

The logs show many 504 errors, mostly auth request timeouts when requesting /files. Could these timeouts be caused by the main service being overloaded?


That could be the reason.

Lingghh avatar Apr 10 '22 13:04 Lingghh

In addition, I frequently hit timeouts when accessing the main/* APIs through the API or the web page, even when going directly to ip:8001/main/*.


When you access via the API or the page and hit the timeouts, is a scan running at the same time? Are there any errors in main_log/codedog_error.log?

Could you also share your deployment machines' specs and the concurrency level? We'll try to reproduce it on our side.

Lingghh avatar Apr 10 '22 13:04 Lingghh

In addition, I frequently hit timeouts when accessing the main/* APIs through the API or the web page, even when going directly to ip:8001/main/*.

When you access via the API or the page and hit the timeouts, is a scan running at the same time? Are there any errors in main_log/codedog_error.log?

Could you also share your deployment machines' specs and the concurrency level? We'll try to reproduce it on our side.

"Is a scan running at the same time?" -- Yes, every time the timeouts occur a scan is running.

"Are there any errors in main_log/codedog_error.log?" -- Yes. Running grep ERROR codedog_error.log, there are only the following two kinds of error logs:

-2022-04-10 10:03:04,685-ERROR-apps.codeproj.core.projmgr: create repo exception: (1062, "Duplicate entry 'http://git.asmallcompany.com/xxx/xxxxx-ORG_1_TEA' for key 'codeproj_baserepository_scm_url_url_key_57a3deb9_uniq'")
-2022-04-10 10:03:38,785-ERROR-util.retrylib: method <put_file> exception: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /files/public_server_temp/jobdata/projects/896/job20458/11388984b87011ec90090a580a170f4e/job_context.json (Caused by ResponseError('too many 500 error responses'))

Concurrency: using Java (OkHttpClient) in a single thread, I iterate over about 13,000 git repositories and call the code below for each one (all repositories are created under the same org and the same team, and share one scan scheme template):

public Long doScan(String orgSid, String teamName, String gitUrl, String branch) {
    // Reuse the repo if it already exists, otherwise create it.
    Long repoId = checkGitRepoExists(orgSid, teamName, gitUrl);
    if (null == repoId) {
        repoId = createGitRepo(orgSid, teamName, gitUrl);
    }
    // Same check-then-create for the branch project.
    Long projectId = checkBranchProject(orgSid, teamName, repoId, branch);
    if (null == projectId) {
        projectId = createBranchProject(orgSid, teamName, repoId, branch);
    }
    // Kick off the scan and return its id.
    return createScan(orgSid, teamName, repoId, projectId, true);
}

Deployment machine specs:

  • server: CentOS 7.6.1810 / 8 cores / 16 GB RAM / 100 GB disk * 5
  • client: CentOS 7.6.1810 / 4 cores / 8 GB RAM / 50 GB disk * 20
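
Submitting ~13,000 repositories from a single thread means each scan request waits on the previous one's round trip. If the server side has capacity, a bounded worker pool raises throughput while keeping concurrency capped; a Python sketch of the idea (`do_scan` here is a hypothetical stand-in for the HTTP calls inside the Java `doScan` above):

```python
from concurrent.futures import ThreadPoolExecutor

def do_scan(git_url: str) -> str:
    # Hypothetical stand-in for checkGitRepoExists / createGitRepo /
    # createBranchProject / createScan HTTP calls.
    return "scan:%s" % git_url

def submit_all(git_urls, max_workers: int = 8):
    """Run scans with at most max_workers concurrent requests, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(do_scan, git_urls))

results = submit_all(["repo-%d" % i for i in range(20)], max_workers=4)
```

Keeping `max_workers` small is important here: raising it only helps until the main service or database becomes the bottleneck, which the 504s above suggest is already happening.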

anyuan95 avatar Apr 11 '22 03:04 anyuan95

OK, thanks for the feedback. We'll analyze it on our side.

Lingghh avatar Apr 11 '22 04:04 Lingghh