[Bug][GitLab] v1.0.1-beta2 Data Time Range not working
### Search before asking
- [X] I had searched in the issues and found no similar issues.
### What happened

GitLab's Data Time Range works in v1.0.1-beta1 but does not work in v1.0.1-beta2.
### What do you expect to happen

v1.0.1-beta2 should collect only commits within the configured Data Time Range.
### How to reproduce

1. Upgrade to version v1.0.1-beta2.
2. Set the Data Time Range to 2024-06-01.
3. Collect data.
4. Inspect the `commits` table.
### Anything else

No response
### Version

v1.0.1-beta2
### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!
### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
@liucyyy Hello there! 👋 I'm here to help you with any bugs, questions, or contributions you have for the project. Let me know how I can assist you!
The issue with the Data Time Range feature in version v1.0.1-beta2 of GitLab is likely due to the default setting, which is configured to collect data from the past six months. To resolve this and ensure that all commits within the specified time range are collected, you need to adjust the Data Time Range setting in the Sync Policy.
Here is how you can do it:
- Navigate to the Sync Policy settings.
- Adjust the Data Time Range to cover the desired period for which you want to collect commits.
### Step 2.3 - Set the Sync Policy
There are three settings for Sync Policy:
- Data Time Range: You can select the time range of the data you wish to collect. The default is set to the past six months.
- Sync Frequency: You can choose how often you would like to sync your data by selecting a preset frequency option or entering a cron expression to specify a custom schedule.
- Skip Failed Tasks: Occasionally a few tasks fail in a long pipeline; you can choose to skip them to avoid spending time re-running the whole pipeline.
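For illustration, a custom Sync Frequency entered as a cron expression (assuming the standard five-field cron syntax; this particular schedule is only an example, not a recommended default) could look like:

```
0 0 * * 1    # minute hour day-of-month month day-of-week: sync at 00:00 every Monday
```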
By adjusting the Data Time Range setting, you can ensure that all commits within the specified time range are collected [1].
If the problem persists, you should capture a screenshot of any error messages and download the logs from the pipeline for further analysis. These logs and error messages can provide more insight into why the feature is not working [2].
Try performing a full refresh and see if the problem goes away.
@klesh The problem still exists. Data before June 2024 will still be collected.
Sorry? What do you mean by "Data before June 2024 will still be collected"? The screenshot you posted showed commits from 2024-09-17 to 18; I thought you meant that commits between 2024-06-01 and 2024-09-17 were not collected.
Are you expecting data in the database to be deleted? If so, that is not supported at the moment; commits are never deleted once collected.
Are you two working together? 😂
@klesh Sorry, I may not have made that clear. What I meant is: if the Time Range is set to 2024-06-01, commits before 2024-06-01 should not be collected, yet v1.0.1-beta2 still collects them.
@klesh I changed my account name this afternoon..😂
I redeployed DevLake, added the GitLab project, set the Time Range to 2024-06-01, and then clicked the collect data button. Commits before 2024-06-01 were also collected (I don't want those commits to be collected).
Please take a closer look at the screenshot of the database table. The `committed_date` is 2014, not 2024.
Aah, sorry, I see what you meant now.
Can you check the `_raw_data_table` and see if those commits were collected by the `gitextractor` plugin?
> Aah, sorry, I see what you meant now. Can you check the `_raw_data_table` and see if those commits were collected by the `gitextractor` plugin?

@klesh Yes, they were collected by the `gitextractor` plugin.
After reviewing the details, I can confirm that the functionality is performing as expected. The purpose of this PR (https://github.com/apache/incubator-devlake/pull/7727) was to address the issue where new commits from non-default branches were not being collected. The implemented solution involves fetching all branches from the remote server—this includes fetching at least some old commits from all branches—before executing the shallow fetch. Therefore, it's likely that the old commits are originating from those branches that have not been deleted.
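The behavior described above can be sketched as a toy model (a plain-Python illustration, not DevLake's actual implementation; the branch names and commit dates are invented):

```python
from datetime import date

# Toy model of the fetch sequence: each branch maps to the commit dates
# reachable from it. "legacy-feature" is an old branch that was never deleted.
branches = {
    "main": [date(2024, 7, 1), date(2024, 9, 17)],
    "legacy-feature": [date(2014, 3, 5), date(2024, 8, 2)],
}

time_range_start = date(2024, 6, 1)

# Step 1 (per PR #7727): fetch *all* branch refs so that commits on
# non-default branches are seen. This necessarily pulls in at least some
# old commits that those refs point at.
fetched = {d for dates in branches.values() for d in dates}

# Step 2: the shallow fetch only limits *further* history. Commits already
# fetched in step 1 remain in the repository and are extracted regardless
# of the configured time range.
old_commits = sorted(d for d in fetched if d < time_range_start)
print(old_commits)
```

In this model the 2014 commit on the undeleted side branch ends up in the fetched set even though the time range starts at 2024-06-01, which mirrors the behavior reported in this issue.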
@klesh Ok, many thanks for your help!
@klesh Can you help me create v1.0.1-beta4 based on the main branch? Let me test this function.
You can try the latest release: https://github.com/apache/incubator-devlake/releases
@d4x1 The latest release does not contain this commit: `refactor: fetch branches before shallow fetch to reduce the total commits collected`.
@abeizn Hello, could you explain why this commit was not released in v1.0.1-beta4? I want to test it. https://github.com/apache/incubator-devlake/pull/7760
@xlqbyy Note that #7760 doesn't fix the issue but merely reduces the total number of collected commits.
@xlqbyy I just CPed #7760 back to v1.0, will release a new beta this week.
Okay, thank you for your answer. @klesh
@xlqbyy As discussed previously, it is an unintended side-effect of PR #7727. While it would be ideal to avoid this behavior, we haven't found a viable solution yet. Therefore, it's unlikely to be fixed anytime soon unless a solution to the underlying issue emerges.