[Bug][gitextractor] skip commit xxx because it has no parent commit
Search before asking
- [x] I had searched in the issues and found no similar issues.
What happened
Hello,
We have noticed an issue with the gitextractor plugin where some commits are being skipped, supposedly because there is no parent commit. However, we can see on GitHub that these commits do have parent commits.
This results in missing commits in the repo_commits, commits, and commit_parents tables, which are essential for refdiff to associate all PRs with our deployment events (we're using the webhook method), especially if the missing commit is the reference of our deployment event.
This behavior mislinks commits and ultimately skews our Lead Time for Changes (LTC) metrics.
This issue was reproducible in multiple DevLake instances.
Logs:
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit fc5331abf6a85be3812a17843a6a5d95330ca7dc because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit a9c9ad96860358ef6a1f32798d2a8456cbfc854a because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit 1415b0bfa73946aac039282040dfb7c2100d9a8a because it has no parent commit"
2025-05-08 15:56:09 time="2025-05-08 14:56:09" level=info msg="[pipeline service] [pipeline #4] [task #24] [Clone Git Repo] skip commit 961c42266124b29836ebb085b20af1ce2b61f6d3 because it has no parent commit"
A similar issue has been reported here.
What do you expect to happen
The gitextractor should extract all commits and ingest them into the necessary tables.
How to reproduce
- Using v1.0.1@e061ef2
- Create a Project
- Add a data source
- Create a webhook
- Collect data from the last 6 months
- Check the logs from the gitextractor plugin
Anything else
No response
Version
v1.0.1@e061ef2
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Hello @luciofsl! I'm Dosu, a bot here to help you solve bugs, answer questions, and assist with becoming a contributor. I'm ready to help while you wait for a human maintainer.
The issue you're experiencing with the gitextractor plugin skipping commits is due to a condition in the code that checks for parent commits. If the first parent commit cannot be fetched from the object database (ODB), the commit is skipped. This typically occurs during a shallow clone for incremental collection, which is used to support the timeAfter parameter for fetching commits after a certain date [1].
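The check described above can be modeled with a minimal sketch. This is not the gitextractor source: the `Commit` class, the `walk_commits` function, and the dict standing in for the object database are all hypothetical names used only to illustrate why a commit on the boundary of a shallow clone gets skipped even though it has a parent on GitHub.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical stand-in for a commit object; not a DevLake type."""
    sha: str
    parents: list[str] = field(default_factory=list)


def walk_commits(odb: dict[str, Commit]) -> tuple[list[str], list[str]]:
    """Return (extracted, skipped) SHAs for the commits present in `odb`.

    `odb` models the local object database as a SHA -> Commit mapping.
    Assumption: a commit is skipped when it lists a first parent whose
    object is missing locally, as described in the comment above.
    """
    extracted: list[str] = []
    skipped: list[str] = []
    for sha, commit in odb.items():
        # The commit records a parent SHA, but a shallow clone did not
        # fetch that parent's object, so it cannot be loaded.
        if commit.parents and commit.parents[0] not in odb:
            skipped.append(sha)
            continue
        extracted.append(sha)
    return extracted, skipped


# Full history is a <- b <- c <- d, but the shallow clone only holds c and d.
shallow = {"c": Commit("c", ["b"]), "d": Commit("d", ["c"])}
extracted, skipped = walk_commits(shallow)
print(extracted, skipped)
```

In this model, `c` is skipped because its parent `b` was cut off by the shallow clone, while `d` is extracted normally. That matches the symptom in the logs: the commit has a parent upstream, but the parent object is not available locally.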
Unfortunately, there doesn't seem to be a configuration or setting in the gitextractor plugin that can be adjusted to prevent this behavior [1]. You might need to consider modifying the code to handle these cases differently or explore alternative methods for data collection that ensure all parent commits are fetched.
Please try the latest version, v1.0.2-beta8, and it should be fixed.
Hello,
We've tested the version devlake:v1.0.2-beta8 and we're still seeing the same issue. In the .env we have:
USE_GO_GIT_IN_GIT_EXTRACTOR=false
SKIP_COMMIT_STAT=false
SKIP_COMMIT_FILES=true
Logs:
2025-05-20 10:44:19 devlake-1 | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit 1469063ee73814ef30d14908d547b31acbe03dd3 because it has no parent commit"
2025-05-20 10:44:19 devlake-1 | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit b98fb6944cfb15125a5e4852e193c9d0191f7847 because it has no parent commit"
2025-05-20 10:44:19 devlake-1 | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit 4c8495f986a19788a668052f72aa8a353fd0aa9d because it has no parent commit"
2025-05-20 10:44:19 devlake-1 | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit c931e263f28e51294e3e1f8fe667ca3cf4b0d36d because it has no parent commit"
2025-05-20 10:44:19 devlake-1 | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit bb60b4d0d2a624137179d84cbcee8b954d1efb5f because it has no parent commit"
2025-05-20 10:44:19 devlake-1 | time="2025-05-20 09:44:19" level=info msg=" [pipeline service] [pipeline #7] [task #37] [Clone Git Repo] skip commit da41be18a614ac6f11e98000bbc305e1b85c8b37 because it has no parent commit"
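To check the skipped commits against GitHub, the SHAs can be pulled out of the logs with a short script. This is a minimal sketch that assumes the message format shown in the log lines above; the function name `skipped_shas` is ours and is not part of DevLake.

```python
import re

# Matches the message format seen in the logs above.
SKIP_RE = re.compile(
    r"skip commit ([0-9a-f]{40}) because it has no parent commit"
)


def skipped_shas(log_text: str) -> list[str]:
    """Return the unique skipped commit SHAs in order of first appearance."""
    seen: dict[str, None] = {}
    for match in SKIP_RE.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


sample = (
    'level=info msg=" [pipeline service] [pipeline #7] [task #37] '
    "[Clone Git Repo] skip commit 1469063ee73814ef30d14908d547b31acbe03dd3 "
    'because it has no parent commit"\n'
    'level=info msg=" [pipeline service] [pipeline #7] [task #37] '
    "[Clone Git Repo] skip commit b98fb6944cfb15125a5e4852e193c9d0191f7847 "
    'because it has no parent commit"\n'
)
for sha in skipped_shas(sample):
    print(sha)
```

Each SHA can then be inspected in a full clone of the repository, for example with `git rev-list --parents -n 1 <sha>`, which prints the commit followed by its parents.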
@luciofsl Got it.
The message doesn't indicate a problem: gitextractor collects only a limited number of commits within the specified time range.
If a commit's parent falls outside that range, the parent is not collected and the commit is logged as skipped.
You can try increasing the time range to include the commits you need.
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter a similar problem in the future.