studies on git index not working

Open marcofranssen opened this issue 4 years ago • 10 comments

I have the following setup, as explained here:

https://github.com/chaoss/grimoirelab/blob/master/default-grimoirelab-settings/setup.cfg

[git]
# Names for raw and enriched indexes
raw_index = git_grimoirelab-raw
enriched_index = git_grimoirelab
latest-items = true
studies = [enrich_demography:git, enrich_areas_of_code:git, enrich_onion:git]

[enrich_demography:git]

[enrich_areas_of_code:git]
in_index = git_grimoirelab-raw
out_index = git_aoc_grimoirelab_enriched

[enrich_onion:git]
in_index = git_grimoirelab
out_index = git_onion_grimoirelab_enriched
contribs_field = hash

Unfortunately, I don't see the out_index being created for either the onion or the areas of code study. The demography study is also not visible in the Kibiter dashboards.

I'm running the following docker images.

  • bitergia/mordred:grimoirelab-0.5.52
  • bitergia/kibiter:community-v6.8.6-3
  • docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.6

marcofranssen avatar Jan 15 '21 12:01 marcofranssen

Hi @marcofranssen, thanks for showing your interest in GrimoireLab. I see a small mistake in the configs. I think it might be one of the reasons for the issue.

  [enrich_onion:git]
- in_index = git_grimoirelab
+ in_index = git
  out_index = git_onion_grimoirelab_enriched
  contribs_field = hash

Let us know if this solves the issue. :slightly_smiling_face:

vchrombie avatar Jan 18 '21 12:01 vchrombie

@vchrombie I don't get it. In the examples shown at the link I posted in my previous comment, the in_index of [enrich_onion:git] points to the out_index of the [git] section.

I'll try to change it and see if that at least populates the onion part. But I don't see the logic in the examples on how to resolve this for the enrich areas of code section.

marcofranssen avatar Jan 18 '21 13:01 marcofranssen

Hi @marcofranssen

@vchrombie I don't get it. In the examples shown at the link I posted in my previous comment, the in_index of [enrich_onion:git] points to the out_index of the [git] section.

Sorry but I checked it too, link to the line -> https://github.com/chaoss/grimoirelab/blob/master/default-grimoirelab-settings/setup.cfg#L152

The in_index of [enrich_onion:git] points to git. This index name comes from the aliases.json file.

I'll try to change it and see if that at least populates the onion part. But I don't see the logic in the examples on how to resolve this for the enrich areas of code section.

The areas of code configuration looks fine to me. I don't understand the reason why the out_index is not generated. I will have a closer look. Thanks.

vchrombie avatar Jan 18 '21 13:01 vchrombie

@vchrombie Yup, I also just realized that git is the alias for my index git_grimoirelab. Both studies simply don't seem to be executed, as no indexes are created.
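
To make "no indexes created" concrete, here is a minimal sketch that reads the studies listed in a backend section and reports which out_index names are absent from a set of existing index names. The function name missing_study_indexes and the hard-coded set of existing indexes are hypothetical; in practice the set would come from listing the indexes in Elasticsearch. It only parses the config text and is not how SirMordred itself resolves studies.

```python
import configparser

# Trimmed copy of the setup.cfg sections discussed in this thread.
SETUP_CFG = """
[git]
raw_index = git_grimoirelab-raw
enriched_index = git_grimoirelab
studies = [enrich_demography:git, enrich_areas_of_code:git, enrich_onion:git]

[enrich_demography:git]

[enrich_areas_of_code:git]
in_index = git_grimoirelab-raw
out_index = git_aoc_grimoirelab_enriched

[enrich_onion:git]
in_index = git
out_index = git_onion_grimoirelab_enriched
contribs_field = hash
"""


def missing_study_indexes(cfg_text, backend, existing_indexes):
    """Return {study: out_index} for studies whose out_index does not exist yet.

    Hypothetical helper: it only inspects the config text and a set of names.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)

    # The studies value looks like "[a:git, b:git]"; strip brackets, split on commas.
    raw = parser.get(backend, "studies", fallback="[]")
    studies = [s.strip() for s in raw.strip("[]").split(",") if s.strip()]

    missing = {}
    for study in studies:
        if not parser.has_section(study):
            continue  # study listed but not configured
        out_index = parser.get(study, "out_index", fallback=None)
        if out_index and out_index not in existing_indexes:
            missing[study] = out_index
    return missing


# Assumed state: only the raw and enriched git indexes exist.
existing = {"git_grimoirelab-raw", "git_grimoirelab"}
for study, index in missing_study_indexes(SETUP_CFG, "git", existing).items():
    print(f"{study}: expected index {index} is missing")
```

Studies without an out_index setting, such as enrich_demography:git here, are not reported by this sketch, so their results have to be checked some other way.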

marcofranssen avatar Jan 18 '21 14:01 marcofranssen

I am having a similar issue. Constructing a setup.cfg from the example is extremely difficult, and the documentation snippets at https://github.com/chaoss/grimoirelab-sirmordred#setupcfg do not produce a working configuration either. I have declared github:pull, git, and github:repo as described in the extended setup.cfg documentation, and all my charts are empty. Even the data sources appear empty, although ES ran and I watched the debug logs iterate over PRs, etc.

RBI-AaronKulick avatar Jan 21 '21 22:01 RBI-AaronKulick

Hi @marcofranssen,

I assume your git raw and enriched indexes were generated correctly.

Could you check your all.log to see whether the studies have run and whether there are any errors? Keep in mind that the studies start only after the enrichment process has finished.
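
As a rough sketch of that check, the snippet below filters log lines whose message mentions one of the configured study names. It assumes the "timestamp - logger - LEVEL - message" layout that SirMordred log lines in this thread follow; the function name study_lines is hypothetical, and the sample lines are abbreviated for illustration.

```python
import re

# Assumed layout: "<date> <time> - <logger> - <LEVEL> - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) - (?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def study_lines(lines, study_names):
    """Return (level, message) pairs for log lines that mention a study name."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip traceback lines and anything not in the assumed layout
        msg = match.group("msg")
        if any(name in msg for name in study_names):
            hits.append((match.group("level"), msg))
    return hits


# Abbreviated sample lines; with a real file, pass open("all.log") instead.
sample = [
    "2021-01-26 10:04:27,722 - sirmordred.task_manager - INFO - [Global tasks] sleeping for 100 seconds",
    "2021-01-26 11:11:05,769 - grimoire_elk.elk - ERROR - [git] Problem executing study enrich_areas_of_code:git",
    "Traceback (most recent call last):",
]
studies = ["enrich_demography", "enrich_areas_of_code", "enrich_onion"]
for level, msg in study_lines(sample, studies):
    print(level, msg)
```

Matching on the study names from setup.cfg avoids relying on the exact wording of the log messages, which may differ between versions.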

I hope it helps you.

Best, Quan

zhquan avatar Jan 22 '21 14:01 zhquan

Hi @RBI-AaronKulick,

Constructing a setup.cfg from the example is extremely difficult, and the documentation snippets at https://github.com/chaoss/grimoirelab-sirmordred#setupcfg do not produce a working configuration either.

Sorry, we will improve the documentation.

I have declared github:pull, git, and github:repo as described in the extended setup.cfg documentation, and all my charts are empty. Even the data sources appear empty, although ES ran and I watched the debug logs iterate over PRs, etc.

Please check your all.log for errors. Could you also share your setup.cfg and projects.json?

Best, Quan

zhquan avatar Jan 22 '21 14:01 zhquan

@zhquan I noticed that mordred was stuck in the following loop for as far back as I could scroll in the logs.

2021-01-26 10:04:27,722 - sirmordred.task_manager - INFO - [Global tasks] sleeping for 100 seconds
2021-01-26 10:06:08,810 - sirmordred.task_projects - INFO - Reading projects data from  /home/bitergia/conf/projects.json
2021-01-26 10:06:09,831 - sirmordred.task_identities - INFO - [sortinghat] No changes in file /home/bitergia/conf/organizations.json, organizations won't be loaded
2021-01-26 10:06:09,831 - sirmordred.task_identities - INFO - Loading GrimoireLab identities in SortingHat
2021-01-26 10:06:10,047 - sirmordred.task_identities - INFO - [sortinghat] No changes in file /tmp/tmpu2r64qpf, identities won't be loaded
2021-01-26 10:06:10,048 - sirmordred.task_identities - INFO - [sortinghat] End of loading identities from file /tmp/tmpu2r64qpf
2021-01-26 10:06:28,367 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm email-name
2021-01-26 10:06:29,432 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm email
2021-01-26 10:06:30,483 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm github
2021-01-26 10:06:31,359 - sirmordred.task_identities - INFO - [sortinghat] Executing affiliate
2021-01-26 10:06:45,168 - sirmordred.task_identities - INFO - [sortinghat] Executing autoprofile for sources: ['git', 'github']
2021-01-26 10:06:57,050 - sirmordred.task_identities - INFO - [sortinghat] Autogender not configured. Skipping.
2021-01-26 10:06:57,050 - sirmordred.task_manager - INFO - [Global tasks] sleeping for 100 seconds
2021-01-26 10:08:38,132 - sirmordred.task_projects - INFO - Reading projects data from  /home/bitergia/conf/projects.json
2021-01-26 10:08:39,149 - sirmordred.task_identities - INFO - [sortinghat] No changes in file /home/bitergia/conf/organizations.json, organizations won't be loaded
2021-01-26 10:08:39,149 - sirmordred.task_identities - INFO - Loading GrimoireLab identities in SortingHat
2021-01-26 10:08:39,363 - sirmordred.task_identities - INFO - [sortinghat] No changes in file /tmp/tmp2fi540uh, identities won't be loaded
2021-01-26 10:08:39,363 - sirmordred.task_identities - INFO - [sortinghat] End of loading identities from file /tmp/tmp2fi540uh
2021-01-26 10:08:57,782 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm email-name
2021-01-26 10:08:58,857 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm email
2021-01-26 10:08:59,908 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm github
2021-01-26 10:09:00,808 - sirmordred.task_identities - INFO - [sortinghat] Executing affiliate
2021-01-26 10:09:14,909 - sirmordred.task_identities - INFO - [sortinghat] Executing autoprofile for sources: ['git', 'github']
2021-01-26 10:09:26,468 - sirmordred.task_identities - INFO - [sortinghat] Autogender not configured. Skipping.
2021-01-26 10:09:26,468 - sirmordred.task_manager - INFO - [Global tasks] sleeping for 100 seconds
2021-01-26 10:11:07,568 - sirmordred.task_projects - INFO - Reading projects data from  /home/bitergia/conf/projects.json
2021-01-26 10:11:08,585 - sirmordred.task_identities - INFO - [sortinghat] No changes in file /home/bitergia/conf/organizations.json, organizations won't be loaded
2021-01-26 10:11:08,585 - sirmordred.task_identities - INFO - Loading GrimoireLab identities in SortingHat
2021-01-26 10:11:08,804 - sirmordred.task_identities - INFO - [sortinghat] No changes in file /tmp/tmpu78jd8n6, identities won't be loaded
2021-01-26 10:11:08,804 - sirmordred.task_identities - INFO - [sortinghat] End of loading identities from file /tmp/tmpu78jd8n6
2021-01-26 10:11:27,295 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm email-name
2021-01-26 10:11:28,358 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm email
2021-01-26 10:11:29,406 - sirmordred.task_identities - INFO - [sortinghat] Unifying identities using algorithm github
2021-01-26 10:11:30,282 - sirmordred.task_identities - INFO - [sortinghat] Executing affiliate
2021-01-26 10:11:44,080 - sirmordred.task_identities - INFO - [sortinghat] Executing autoprofile for sources: ['git', 'github']
2021-01-26 10:11:54,499 - sirmordred.task_identities - INFO - [sortinghat] Autogender not configured. Skipping.

After I restarted the mordred process, it started collecting data again.

2021-01-26 10:28:03,918 - grimoire_elk.elk - INFO - [git] Done collection for https://github.com/my-org/my-repo.git
2021-01-26 10:28:03,919 - sirmordred.task_collection - INFO - [git] collection finished for https://github.com/my-org/my-repo.git
2021-01-26 10:28:03,919 - sirmordred.task_collection - INFO - [git] collection starts for https://github.com/my-org/my-repo.git
2021-01-26 10:28:03,941 - grimoire_elk.raw.elastic - INFO - [git] Incremental from: 2020-09-29 20:13:53+00:00 for https://github.com/my-org/my-repo.git

Now waiting for this part of the process to complete. What should I grep for in my logs to find the studies?

marcofranssen avatar Jan 26 '21 10:01 marcofranssen

@zhquan I found the following logs.

2021-01-26 11:11:05,769 - grimoire_elk.elk - ERROR - [git] Problem executing study enrich_areas_of_code:git, ConnectionError(HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f86e550fc18>: Failed to establish a new connection: [Errno 111] Connection refused'))) caused by: ConnectionError(HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f86e550fc18>: Failed to establish a new connection: [Errno 111] Connection refused')))
2021-01-26 11:11:05,769 - sirmordred.task_manager - ERROR - [git] Exception in Task Manager ConnectionError(HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f86e550fc18>: Failed to establish a new connection: [Errno 111] Connection refused'))) caused by: ConnectionError(HTTPConnectionPool(host='elasticsearch', port=9200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f86e550fc18>: Failed to establish a new connection: [Errno 111] Connection refused')))
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

It seems the port is hardcoded in the studies, whereas my Elasticsearch endpoint runs on port 80 at http://elasticsearch via a proxy.

In my credentials.cfg, I have the endpoints defined as follows.

[es_collection]
url = https://admin:admin@elasticsearch

[es_enrichment]
url = https://admin:admin@elasticsearch

I would expect the studies to use this endpoint as is, rather than adding a port. I would like to stick with my Traefik load-balanced setup to spread the load across my Elasticsearch nodes. A workaround would be to send everything directly to a single node, but that is not a production-like setup.

marcofranssen avatar Jan 26 '21 11:01 marcofranssen

Hi @marcofranssen

Sorry for the late reply. I lost this issue :(

The port is not hardcoded in the code; by default, Elasticsearch uses port 9200: https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/enriched/git.py#L534.

Do you have enriched indexes? If you can create the enriched indexes, you will have no issue creating the study indexes: https://github.com/chaoss/grimoirelab-sirmordred/blob/master/sirmordred/task_enrich.py#L282

Could you try to set the port directly on the URL as https://admin:admin@elasticsearch:80 and try again?
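
To illustrate why the explicit port matters, here is a small sketch using only the Python standard library. It assumes that a client falls back to 9200 when the URL carries no port, which is consistent with the port=9200 in the traceback above; effective_port and DEFAULT_ES_PORT are hypothetical names, not part of GrimoireLab.

```python
from urllib.parse import urlsplit

# Assumption: fallback port applied when the URL does not specify one.
DEFAULT_ES_PORT = 9200


def effective_port(url, default=DEFAULT_ES_PORT):
    """Return the port from the URL, or the assumed default if none is given."""
    port = urlsplit(url).port  # None when the URL has no explicit port
    return default if port is None else port


print(effective_port("https://admin:admin@elasticsearch"))     # 9200
print(effective_port("https://admin:admin@elasticsearch:80"))  # 80
```

With the port written into the URL, there is nothing left for a fallback to fill in, so requests should reach the proxy on port 80.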

Best, Quan

zhquan avatar Sep 20 '21 11:09 zhquan

Closing this due to no activity.

sduenas avatar Oct 27 '23 15:10 sduenas