repoxplorer
Weird results after adding `--all-branches` feature to repoxplorer-github-organization
As discussed in #62, I have added an `--all-branches` option to `repoxplorer-github-organization`. The good news is that it is now picking up all the branches for both organizations that I'm indexing.
However, the number of repo refs for one of the organizations is inflated, and for the other it's not showing any stats at all.
I don't even know where to start with troubleshooting this.
Any ideas appreciated.
Have attached the modified version of `repoxplorer-github-organization`.
Hi, could you open a pull request? It will be easier this way.
See the PR #64
Okay, I saw your comment on the pull request.
Will try your suggested changes.
Have updated the PR.
So, to summarize what I'm seeing: as you said in the PR, the generated yaml looks fine. I'm happy with it.
The issue seems to be what is done with the resulting yaml.
I'm indexing two organizations and I'm seeing differences in how they are displayed that go beyond what I would expect from including all branches.
First organization
Production
My production instance of repoXplorer is running the old code unchanged, so only including a single branch. For the first organization, it reports the following:
Repository refs: 101
Commits: 11,521
Lines changed: 23,000,866
Authors: 77
Activity duration: 1,313 days
Test
On the test instance, running the `--all-branches` code, I'm seeing the following for the first organization:
Repository refs: 624
Commits: 12,791
Lines changed: 35,875,670
Authors: 81
Activity duration: 1,315 days
I don't understand why the number of repo refs is six times greater on the test instance. That doesn't seem right to me.
Second organization
Production
Regarding the second organization on the production instance, it reports the following:
Repository refs: 125
Commits: 26,968
Lines changed: 62,033,693
Authors: 140
Activity duration: 1,383 days
Test
However on the test instance, it's not showing anything at all for the second organization!
To be more precise, on the relevant project page, it's showing the template with the headings for Commits history, Contributor history and Project commits but no data and no summaries at all.
However, if I go to the Projects page and click on the dropdown menu for the second org, it lists all the repos and branches for the second org, so it's clearly reading the generated yaml correctly!
It's probably worth noting that the generated yaml file for the second org is over twice the size of that for the first.
Hi Mamading,
Thanks for the info. First, for the second organization, please check the indexer process logs; it might be stuck. For the first organization, it is really difficult for me (without access to the org details) to tell you whether the number of refs you are seeing is OK or not. Sometimes folks push branches directly to the repo (without using a fork) to create PRs, so having 624 branches across 100 repos does not seem crazy to me. Having said that, without the generated file or access to the org I won't be able to provide more insight. Are the orgs public?
Fabien
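One way to sanity-check the ref count is to tally branches per repository from the generated definition. The sketch below assumes the yaml has already been loaded into a dict and that each repo entry carries a `branches` list; that layout, the `count_refs` helper and the sample names are assumptions for illustration, not taken from the actual generated file.

```python
from collections import Counter


def count_refs(projects):
    """Return a Counter mapping repo name -> number of branches (refs)."""
    refs = Counter()
    for project in projects.values():
        for repo_name, repo in project.get("repos", {}).items():
            # A repo with no explicit branch list is counted as a single ref.
            refs[repo_name] += len(repo.get("branches") or ["default"])
    return refs


# Hypothetical data standing in for the loaded yaml.
sample = {
    "example-org": {
        "repos": {
            "repo-a": {"branches": ["master", "develop", "feature-x"]},
            "repo-b": {"branches": ["master"]},
            "repo-c": {},
        }
    }
}

refs = count_refs(sample)
total = sum(refs.values())
print(total, refs.most_common(2))
```

With 624 refs over roughly 100 repos the average is about six branches per repo, so a few branch-heavy repos at the top of `most_common()` could plausibly account for most of the jump.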
Okay, point taken regarding repo refs.
Have fired up a local instance and am watching the indexer logs to see what's going on.
The second org is https://github.com/holochain/ and I believe everything there should be public.
Okay I'm seeing stuff like this in the indexer log:
2020-10-12 12:59:10,531 DEBUG [repoxplorer.indexer.git.indexer] b'warning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 823 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many 
files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: 
inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2439 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nwarning: inexact rename detection was skipped due to too many files.\nwarning: you may want to set your diff.renameLimit variable to at least 2700 and retry the command.\nfatal: Out of memory, malloc failed (tried to allocate 11059857 bytes)\n'
2020-10-12 12:59:10,747 WARNI [indexerDaemon] Unable to index repository branch http://github.com/holochain/docs-pages:docs-pages:develop:
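Since the git output above repeats the same two warnings many times, it can help to boil it down to the numbers that matter. This is a minimal sketch that only relies on the message wording visible in the log excerpt; the `summarize_git_stderr` helper and the shortened sample string are made up for illustration.

```python
import re

# Patterns copied from the wording of the git messages in the log excerpt.
RENAME_LIMIT = re.compile(r"diff\.renameLimit variable to at least (\d+)")
MALLOC_FAIL = re.compile(r"malloc failed \(tried to allocate (\d+) bytes\)")


def summarize_git_stderr(text):
    """Count rename-limit warnings and report any out-of-memory failure."""
    limits = [int(n) for n in RENAME_LIMIT.findall(text)]
    oom = MALLOC_FAIL.search(text)
    return {
        "rename_warnings": len(limits),
        "max_rename_limit": max(limits) if limits else None,
        "oom_bytes": int(oom.group(1)) if oom else None,
    }


# Shortened sample built from lines that appear in the log above.
sample = (
    "warning: you may want to set your diff.renameLimit variable "
    "to at least 823 and retry the command.\n"
    "warning: you may want to set your diff.renameLimit variable "
    "to at least 2700 and retry the command.\n"
    "fatal: Out of memory, malloc failed (tried to allocate 11059857 bytes)\n"
)

summary = summarize_git_stderr(sample)
print(summary)
```

The patterns do not depend on real newlines, so they should also match when the message is logged as a bytes repr with literal `\n` sequences, as it is here.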
Also seeing stuff like this:
exited with code 128
2020-10-12 13:35:53,940 ERROR [indexerDaemon] Exception is:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.7/site-packages/repoxplorer/indexer/git/indexer.py", line 279, in process_commits
    buf = get_commits_desc(path, shas)
  File "/usr/local/lib/python3.7/site-packages/repoxplorer/indexer/git/indexer.py", line 125, in get_commits_desc
    out = run(cmd, path)
  File "/usr/local/lib/python3.7/site-packages/repoxplorer/indexer/git/indexer.py", line 74, in run
    raise Exception('%s exited with code %s' % (cmd, process.returncode))
Exception: ['git', 'show', '--format=raw', '--numstat',
So I'm in the process of bumping up the amount of RAM in the virtual machine that the container is running on locally.
If I can determine how much is needed to stop the out-of-memory crashes, I can then do the same for the deployment of the container.
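To get a rough figure for how much memory a single git call needs, one option is to run it under a small wrapper and read the peak resident set size afterwards. This is a sketch using the standard `resource` module (Unix only); the `peak_child_rss` helper is hypothetical, and the demo command is a placeholder rather than the real git invocation.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd, cwd=None):
    """Run cmd to completion and return (exit code, peak RSS of children)."""
    proc = subprocess.run(
        cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # ru_maxrss is the high-water mark over all children waited for so far;
    # it is reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak


# Placeholder command; swap in the failing git call, run from the repo clone.
code, peak = peak_child_rss([sys.executable, "-c", "pass"])
print(code, peak)
```

Pointing this at the `git show` command from the traceback, inside the affected clone, would give a per-call figure. The indexer runs such calls through a multiprocessing pool, so the container probably needs a multiple of that, not just the single-call peak.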
Upping the memory is definitely stopping the crashes. However, since more branches are being indexed, the initial indexing is taking a lot longer to run to completion and hasn't finished yet!