
[Bug][gitextractor] skip commit xxx because it has no parent commit

Open luciofsl opened this issue 10 months ago • 4 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

What happened

Hello,

We have noticed an issue with the gitextractor plugin where some commits are being skipped, supposedly because there is no parent commit. However, we can see on GitHub that these commits do have parent commits.

This results in missing commits in the repo_commits, commits, and commit_parents tables, which are essential for refdiff to associate all PRs with our deployment events (we're using the webhook method), especially if the missing commit is the reference of our deployment event.
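
One way to see which referenced parents are absent is to look for `commit_parents` rows whose parent SHA has no matching row in `commits`. The sketch below runs that query against an in-memory SQLite database. The schema is a simplified stand-in and the column names (`sha`, `commit_sha`, `parent_commit_sha`) are assumptions, so adjust them to the actual DevLake tables before running the query against a real database.

```python
import sqlite3

# Hypothetical, simplified schema. The real DevLake tables have more columns,
# and the column names used here are assumptions for illustration.
SCHEMA = """
CREATE TABLE commits (sha TEXT PRIMARY KEY);
CREATE TABLE commit_parents (commit_sha TEXT, parent_commit_sha TEXT);
"""

# Rows in commit_parents whose parent has no row in commits.
MISSING_PARENTS_SQL = """
SELECT cp.commit_sha, cp.parent_commit_sha
FROM commit_parents AS cp
LEFT JOIN commits AS c ON c.sha = cp.parent_commit_sha
WHERE c.sha IS NULL
ORDER BY cp.commit_sha
"""


def find_missing_parents(conn):
    """Return (commit_sha, parent_commit_sha) pairs whose parent was not ingested."""
    return conn.execute(MISSING_PARENTS_SQL).fetchall()


def demo():
    """Build a tiny example: 'bbb' has parent 'aaa'; 'aaa' has parent 'zzz', which is missing."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO commits VALUES (?)", [("aaa",), ("bbb",)])
    conn.executemany(
        "INSERT INTO commit_parents VALUES (?, ?)",
        [("bbb", "aaa"), ("aaa", "zzz")],
    )
    return find_missing_parents(conn)
```

Any pair this returns marks a point where the commit graph is broken in the database, which is where a parent-based lookup could stop short.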

The behavior we're observing is causing mislinked commits and ultimately messing up our LTC metrics.

This issue was reproducible in multiple DevLake instances.

Logs:

2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit fc5331abf6a85be3812a17843a6a5d95330ca7dc because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit a9c9ad96860358ef6a1f32798d2a8456cbfc854a because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit 1415b0bfa73946aac039282040dfb7c2100d9a8a because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit 961c42266124b29836ebb085b20af1ce2b61f6d3 because it has no parent commit"
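
To compare the skipped commits against GitHub, the SHAs can be pulled out of log output like the lines above. This is a minimal sketch that assumes the message format stays exactly as shown in these logs; the function name `skipped_commits` is only illustrative.

```python
import re

# Matches the message format seen in the logs above (assumed to be stable).
SKIP_RE = re.compile(r"skip commit ([0-9a-f]{40}) because it has no parent commit")


def skipped_commits(log_text):
    """Return the skipped commit SHAs in first-seen order, without duplicates."""
    return list(dict.fromkeys(SKIP_RE.findall(log_text)))
```

The resulting list can then be checked one SHA at a time against the repository history.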

A similar issue has been reported here.

What do you expect to happen

The gitextractor should extract all commits and ingest them into the necessary tables.

How to reproduce

  1. Using v1.0.1@e061ef2
  2. Create a Project
  3. Add a data source
  4. Create a webhook
  5. Collect data from the last 6 months
  6. Check the logs from the gitextractor plugin

Anything else

No response

Version

v1.0.1@e061ef2

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!


luciofsl avatar May 08 '25 15:05 luciofsl

Hello @luciofsl! I'm Dosu, a bot here to help you solve bugs, answer questions, and assist with becoming a contributor. I'm ready to help while you wait for a human maintainer.

The issue you're experiencing with the gitextractor plugin skipping commits is due to a condition in the code that checks for parent commits. If the first parent commit cannot be fetched from the object database (ODB), the commit is skipped. This typically occurs during a shallow clone for incremental collection, which is used to support the timeAfter parameter for fetching commits after a certain date [1].

Unfortunately, there doesn't seem to be a configuration or setting in the gitextractor plugin that can be adjusted to prevent this behavior [1]. You might need to consider modifying the code to handle these cases differently or explore alternative methods for data collection that ensure all parent commits are fetched.
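
The condition described above can be illustrated with a toy model. This is not the gitextractor source: the `Commit` type, the `extract` function, and the treatment of root commits (kept here) are assumptions made only to show why a shallow clone leads to skipped commits at its boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    parents: tuple  # parent SHAs as recorded in the commit object


def extract(commits, odb):
    """Split commits into (kept, skipped).

    `odb` is the set of SHAs present in the local object database.
    A commit is skipped when it records a first parent that is not in `odb`,
    mirroring the behavior described above. Commits with no recorded parent
    are kept in this model (an assumption).
    """
    kept, skipped = [], []
    for commit in commits:
        if commit.parents and commit.parents[0] not in odb:
            skipped.append(commit.sha)
        else:
            kept.append(commit.sha)
    return kept, skipped


# History a <- b <- c <- d, but a shallow clone fetched only c and d.
history = [Commit("c", ("b",)), Commit("d", ("c",))]
kept, skipped = extract(history, odb={"c", "d"})
```

In this example `c` sits at the shallow boundary: GitHub shows its parent `b`, but `b` was never fetched, so `c` ends up in `skipped` while `d` is kept.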


dosubot[bot] avatar May 08 '25 15:05 dosubot[bot]

Please try the latest version, v1.0.2-beta8, and it should be fixed.

klesh avatar May 13 '25 03:05 klesh

Hello,

We've tested the version devlake:v1.0.2-beta8 and we're still seeing the same issue. In the .env we have:

USE_GO_GIT_IN_GIT_EXTRACTOR=false
SKIP_COMMIT_STAT=false
SKIP_COMMIT_FILES=true

Logs:

2025-05-20 10:44:19 devlake-1    | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit 1469063ee73814ef30d14908d547b31acbe03dd3 because it has no parent commit"
2025-05-20 10:44:19 devlake-1    | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit b98fb6944cfb15125a5e4852e193c9d0191f7847 because it has no parent commit"
2025-05-20 10:44:19 devlake-1    | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit 4c8495f986a19788a668052f72aa8a353fd0aa9d because it has no parent commit"
2025-05-20 10:44:19 devlake-1    | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit c931e263f28e51294e3e1f8fe667ca3cf4b0d36d because it has no parent commit"
2025-05-20 10:44:19 devlake-1    | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit bb60b4d0d2a624137179d84cbcee8b954d1efb5f because it has no parent commit"
2025-05-20 10:44:19 devlake-1    | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit da41be18a614ac6f11e98000bbc305e1b85c8b37 because it has no parent commit"

luciofsl avatar May 20 '25 10:05 luciofsl

@luciofsl Got it. The message doesn’t indicate a problem: gitextractor collects a limited number of commits within the specified time range. If a commit's parent falls outside that range, the commit may be skipped. You can try increasing the time range to include the commits you need.
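
To confirm this on a local clone made with the same depth or time limit, one can list the parents recorded in a commit object and test whether each one exists locally. The sketch below uses `git cat-file -p` and `git cat-file -e`; the helper names and the injectable `run` parameter are illustrative, and it assumes `git` is on the PATH.

```python
import subprocess


def run_git(repo, *args):
    """Run a git command inside `repo` and return the completed process."""
    return subprocess.run(["git", "-C", repo, *args], capture_output=True, text=True)


def recorded_parents(commit_text):
    """Parse parent SHAs from the header of `git cat-file -p <commit>` output."""
    parents = []
    for line in commit_text.splitlines():
        if not line:  # a blank line ends the header; the message follows
            break
        if line.startswith("parent "):
            parents.append(line.split()[1])
    return parents


def missing_parents(repo, sha, run=run_git):
    """Return the recorded parents of `sha` that are absent from the local clone."""
    shown = run(repo, "cat-file", "-p", sha)
    if shown.returncode != 0:
        raise ValueError(f"commit {sha} not found in {repo}")
    return [
        parent
        for parent in recorded_parents(shown.stdout)
        if run(repo, "cat-file", "-e", parent).returncode != 0
    ]
```

A non-empty result for a skipped SHA would suggest the parent exists upstream but was not fetched, which matches the explanation above; an empty result would point to a different cause.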

klesh avatar May 21 '25 06:05 klesh

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] avatar Jul 21 '25 00:07 github-actions[bot]

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter a similar problem in the future.

github-actions[bot] avatar Jul 28 '25 00:07 github-actions[bot]