
Instability with git commit parsing

Open selwyth opened this issue 8 years ago • 4 comments

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

Hello! I got knowledge-repo deployed successfully onto internal hosts via uwsgi. I've hit a troubling problem, though: things look good for 5-10 minutes, then the server starts spitting out error logs like the following, with increasing frequency. I'm unable to reproduce this in the flask shell, and would appreciate any tips you have.

Traceback (most recent call last):
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1610, in full_dispatch_request
    rv = self.preprocess_request()
  File "/$PATH/lib/python2.7/site-packages/flask/app.py", line 1831, in preprocess_request
    rv = func()
  File "/$PATH/lib/python2.7/site-packages/knowledge_repo/app/app.py", line 129, in update_index_if_required
    update_index()
  File "/$PATH/lib/python2.7/site-packages/knowledge_repo/app/index.py", line 82, in update_index
    if not update_index_required(check_timeouts=check_timeouts):
  File "/$PATH/lib/python2.7/site-packages/knowledge_repo/app/index.py", line 67, in update_index_required
    for uri, revision in current_repo.revisions.items():
  File "/$PATH/lib/python2.7/site-packages/werkzeug/local.py", line 347, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/$PATH/lib/python2.7/site-packages/knowledge_repo/repository.py", line 127, in revisions
    return {self.uri: self.revision}
  File "/$PATH/lib/python2.7/site-packages/knowledge_repo/repositories/gitrepository.py", line 119, in revision
    return "{}_{}".format(str(c.committed_date), c.hexsha)
  File "/$PATH/lib/python2.7/site-packages/gitdb/util.py", line 256, in __getattr__
    self._set_cache_(attr)
  File "/$PATH/lib/python2.7/site-packages/git/objects/commit.py", line 144, in _set_cache_
    self._deserialize(BytesIO(stream.read()))
  File "/$PATH/lib/python2.7/site-packages/git/objects/commit.py", line 451, in _deserialize
    self.tree = Tree(self.repo, hex_to_bin(readline().split()[1]), Tree.tree_id << 12, '')
TypeError: Odd-length string

This looks like trouble on the git side. Following the trail led me to some 'cache if new, retrieve if not new' logic, which seems to fail on retrieval.
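
The `TypeError: Odd-length string` at the bottom of the traceback is the kind of error a hex decoder raises when it is handed a string with an odd number of hex digits. Below is a minimal sketch, assuming `hex_to_bin` behaves like the standard library's `binascii.unhexlify` (an assumption, not something confirmed in this thread). The SHA is the commit quoted in this report; the truncated copy is hypothetical and only illustrates the failure mode.

```python
import binascii


def decode_sha(hexsha):
    """Decode a hex SHA into raw bytes; return None if it is not valid hex."""
    try:
        return binascii.unhexlify(hexsha)
    except (binascii.Error, TypeError):
        # Python 3 raises binascii.Error here; Python 2.7 raises TypeError.
        return None


full_sha = "cd442315c9ab6beadedd556e2ddc99fec3710e99"  # 40 hex digits
truncated = full_sha[:-1]  # hypothetical: one character lost somewhere

print(len(decode_sha(full_sha)))  # 20
print(decode_sha(truncated))      # None
```

A well-formed SHA-1 decodes to exactly 20 bytes, so anything shorter or odd-length suggests that the text being decoded is not a complete SHA.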

For reference, here's the latest git commit I have:

commit cd442315c9ab6beadedd556e2ddc99fec3710e99
Author: Wesley Wisdom <[email protected]>
Date:   Tue Jun 13 10:46:15 2017 -0700

    correct author on example

It results in a run of errors like the following (all with tracebacks similar to the one above):

ValueError: SHA cd442315c9ab6beadedd556e2ddc99fec3710e99 could not be resolved, git returned: 'cd442315c9ab6beadedd556e2ddc99fec3710e99 3539a5e90b3e'
ValueError: SHA parent could not be resolved, git returned: 'parent 009846b9573e217e6acd13ba158d56841195522e'
ValueError: SHA author could not be resolved, git returned: 'author Wesley Wisdom <[email protected]> 1497375975 -0700'
ValueError: SHA committer could not be resolved, git returned: 'committer Wesley Wisdom <[email protected]> 1497375975 -0700'
ValueError: SHA could not be resolved, git returned: ''
ValueError: Failed to parse header: '> 1497375975 -0700\n'
ValueError: SHA Add could not be resolved, git returned: 'correct author on example'
ValueError: SHA could not be resolved, git returned: ''
AssertionError: Require 20 byte binary sha, got '\x00\x15\x8dV\x84\x11\x95R.', len = 9
ValueError: SHA Add could not be resolved, git returned: 'correct author on example'
ValueError: SHA avid could not be resolved, git returned: 'esley Wisdom <[email protected]> 1497375975 -0700'
ValueError: Failed to parse header: '6e2ddc99fec3710e99 commit 238\n'
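
Several of these messages quote fragments that start mid-field, such as `'6e2ddc99fec3710e99 commit 238\n'`, which looks like the tail of a `<sha> <type> <size>` header line. The sketch below is a hypothetical strict parser for that header shape (it is not GitPython's actual code). The "good" line is reconstructed from the fragment and the commit SHA above, so it is an assumption too. It shows that the fragment fails only because the SHA is cut short, which would be consistent with part of the line having been consumed elsewhere. That is a guess, not a diagnosis.

```python
import re

# Assumed header shape: "<40-hex sha> <type> <size>"
HEADER_RE = re.compile(r"^([0-9a-f]{40}) (commit|tree|blob|tag) (\d+)$")


def parse_header(line):
    """Return (sha, type, size), or raise ValueError for a malformed line."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("Failed to parse header: %r" % line)
    sha, obj_type, size = match.groups()
    return sha, obj_type, int(size)


full_sha = "cd442315c9ab6beadedd556e2ddc99fec3710e99"
good = full_sha + " commit 238\n"             # reconstructed, see lead-in
fragment = "6e2ddc99fec3710e99 commit 238\n"  # quoted in the log above

print(parse_header(good))
# The fragment's first token is the last 18 characters of the full SHA.
print(full_sha.endswith(fragment.split()[0]))
try:
    parse_header(fragment)
except ValueError as exc:
    print(exc)
```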

Hope you have ideas for how I can approach this! Excited to serve knowledge-repo to our company, but I have to make it stay up for more than 10 minutes at a time first.

selwyth, Jun 14 '17 01:06

Hi @selwyth !

Thanks for reporting this. We haven't seen these issues locally yet, so we appreciate you bringing it to our attention.

My first naive guess is that your version of the python library gitdb is somehow out of sync with the db format used by the version of git on your machine... but I'm not aware of any recent changes in that arena.

To help us reproduce it, can you provide the following details?

  • your operating system
  • the version of Python being used
  • the version of the knowledge repo
  • the version of the git binary
  • the version of the gitdb package

Once I get this information, I'll take a look at it for you :).
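
The details requested above can be gathered in one go. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`), whereas this thread concerns Python 2.7. The distribution names passed to `package_version` are assumptions and may differ per install (for example `gitdb2` instead of `gitdb`).

```python
import platform
import subprocess
import sys
from importlib import metadata  # Python 3.8+


def package_version(name):
    """Return the installed version of a distribution, if it can be found."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def git_binary_version():
    """Return the output of `git --version`, if a git binary is on PATH."""
    try:
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "git not found"


def collect_versions():
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "knowledge_repo": package_version("knowledge_repo"),  # assumed name
        "git binary": git_binary_version(),
        "gitdb": package_version("gitdb"),                    # assumed name
    }


for key, value in collect_versions().items():
    print("%s: %s" % (key, value))
```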

matthewwardrop, Jun 16 '17 04:06

  • OS: CentOS 6.5
  • Python: 2.7.11
  • knowledge_repo: 0.7.6
  • git binary: 1.8.4
  • gitdb2: 2.0.2

Thanks for offering to take a look, and I hope this helps. I've confirmed it's due to reindexing; no errors if I turn it off in the server_config.py.

Also looks fine when I try to reproduce in a Python shell, hence I'm baffled.

from knowledge_repo import KnowledgeRepository

repo = KnowledgeRepository.for_uri('knowledge-data')
repo.revision

>>> '1497382362_81157ad9830fc00d401869a9b3b9d8681244530a'

selwyth, Jun 19 '17 08:06

@matthewwardrop I am getting a similar error in one of my knowledge repos. It happens when we create a knowledge repo with the latest code.

INFO:knowledge_repo.repositories.gitrepository:Fetching updates to the knowledge repository...
Process Process-1:16:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/knowledge_repo/app/index.py", line 43, in index_sync_loop
    update_index(check_timeouts=False)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/knowledge_repo/app/models.py", line 143, in wrapped
    raise_with_traceback(e)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/knowledge_repo/app/models.py", line 138, in wrapped
    return function(*args, **kwargs)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/knowledge_repo/app/index.py", line 141, in update_index
    current_repo.update()
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/knowledge_repo/repositories/gitrepository.py", line 139, in update
    for submodule in self.git.submodules:
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/repo/base.py", line 317, in submodules
    return Submodule.list_items(self)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/util.py", line 937, in list_items
    out_list.extend(cls.iter_items(repo, *args, **kwargs))
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/objects/submodule/base.py", line 1162, in iter_items
    pc = repo.commit(parent_commit)         # parent commit instance
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/repo/base.py", line 460, in commit
    return self.rev_parse(text_type(rev) + "^0")
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/repo/fun.py", line 213, in rev_parse
    obj = name_to_object(repo, rev[:start])
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/repo/fun.py", line 150, in name_to_object
    return Object.new_from_sha(repo, hex_to_bin(hexsha))
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/objects/base.py", line 64, in new_from_sha
    oinfo = repo.odb.info(sha1)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/db.py", line 37, in info
    hexsha, typename, size = self._git.get_object_header(bin_to_hex(sha))
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/cmd.py", line 1073, in get_object_header
    return self.__get_object_header(cmd, ref)
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/cmd.py", line 1062, in __get_object_header
    return self._parse_object_header(cmd.stdout.readline())
  File "/home/ubuntu/.virtualenvs/knowledge_repo_jan/local/lib/python2.7/site-packages/git/cmd.py", line 1026, in _parse_object_header
    raise ValueError("SHA %s could not be resolved, git returned: %r" % (tokens[0], header_line.strip()))
ValueError: SHA 0f7429699cc69e76f911f7fc0daceb1a3b8b1217 could not be resolved, git returned: '0f7429699cc69e76f911f7fc0daceb1a3b8b1217 missing'

shashj199, Feb 02 '18 09:02

FYI I worked around this by setting REPOSITORY_INDEXING_ENABLED to False and patching 2 of the 4 places that check the flag so they no longer check it (because things like pageviews don't work if REPOSITORY_INDEXING_ENABLED is set to False). Would love to know the proper solution though.
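
For reference, the first half of that workaround is a one-line configuration change. This is a minimal sketch, assuming the flag is set in the `server_config.py` mentioned earlier in the thread. The source patches that relax the flag check are not shown, because the thread does not say which call sites were changed.

```python
# server_config.py (fragment)
# Turn off repository indexing, per the workaround described above.
REPOSITORY_INDEXING_ENABLED = False
```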

selwyth, Feb 27 '18 23:02