feat #6615, replace libgit2 with go-git.

Open d4x1 opened this issue 1 year ago • 6 comments

⚠️ Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • [x] I have read through the Contributing Documentation.
  • [x] I have added relevant tests.
  • [x] I have added relevant documentation.
  • [x] I will add labels to the PR, such as pr-type/bug-fix, pr-type/feature-development, etc.

Summary

What does this PR do? To replace libgit2 with go-git, this PR adds a new config option, USE_GO_GIT_IN_GIT_EXTRACTOR, to env.sample. If USE_GO_GIT_IN_GIT_EXTRACTOR is set to 1, the GitExtractor plugin will use go-git to collect the repo's data.
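
A minimal sketch of how such a toggle could be read, assuming only the literal value `1` enables the go-git path as described above. The helper name `useGoGit` is hypothetical and is not taken from the DevLake code base; the real plugin may read its configuration through a different mechanism.

```go
package main

import (
	"fmt"
	"os"
)

// useGoGit reports whether the go-git backend is selected.
// Assumption: only the literal value "1" turns it on; any other value
// (including an unset variable) keeps the libgit2 backend.
// The getenv parameter makes the function easy to exercise without
// touching the real process environment.
func useGoGit(getenv func(string) string) bool {
	return getenv("USE_GO_GIT_IN_GIT_EXTRACTOR") == "1"
}

func main() {
	if useGoGit(os.Getenv) {
		fmt.Println("gitextractor backend: go-git")
	} else {
		fmt.Println("gitextractor backend: libgit2")
	}
}
```

Running this with `USE_GO_GIT_IN_GIT_EXTRACTOR=1` in the environment selects the go-git branch; leaving the variable unset falls back to libgit2.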

Does this close any open issues?

Closes #6615

Screenshots

Include any relevant screenshots here.

Other Information

Any other information that is important to this PR. In gitextractor, the libgit2 implementation produces a table named commit_line_change, whereas go-git cannot fetch this information. commit_line_change is not used in any dashboards or processes, so the go-git implementation simply skips this table.

d4x1 avatar Dec 28 '23 06:12 d4x1

My thoughts:

  • I think GitRepo should be turned into an interface with two impls: GoGitRepo and LibGitRepo. Refactor the appropriate methods. This creates better abstraction and less coupling in the code.
  • Write a temporary main function that uses LibGitRepo to grab data from a repo and perform the extraction functions. Capture the output in CSV files (like you're doing)
  • Write a test suite that uses GoGitRepo to run the same functions on the same repo as above. Test the same functions against the output CSV files from above (i.e. compare results).
  • Once all is good, get rid of the LibGitRepo impl and the temporary main function.
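
The steps above can be sketched roughly as follows. This is only an illustration: the `Commit` type, the single `CollectCommits` method, and the `sameOutput` helper are hypothetical simplifications, and both implementations return in-memory data instead of calling libgit2 or go-git. The comparison also works on in-memory slices, not on the CSV files mentioned above.

```go
package main

import (
	"fmt"
	"reflect"
)

// Commit is a hypothetical, simplified record; the real extractor
// produces many more fields and tables.
type Commit struct {
	SHA     string
	Message string
}

// GitRepo is the abstraction suggested above. The method set here is
// illustrative only.
type GitRepo interface {
	CollectCommits() ([]Commit, error)
}

// LibGitRepo stands in for the libgit2-backed implementation.
type LibGitRepo struct{ commits []Commit }

func (r *LibGitRepo) CollectCommits() ([]Commit, error) { return r.commits, nil }

// GoGitRepo stands in for the go-git-backed implementation.
type GoGitRepo struct{ commits []Commit }

func (r *GoGitRepo) CollectCommits() ([]Commit, error) { return r.commits, nil }

// sameOutput runs the same extraction through both implementations and
// reports whether the results are identical.
func sameOutput(a, b GitRepo) (bool, error) {
	ca, err := a.CollectCommits()
	if err != nil {
		return false, err
	}
	cb, err := b.CollectCommits()
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(ca, cb), nil
}

func main() {
	fixture := []Commit{{SHA: "abc123", Message: "initial commit"}}
	ok, err := sameOutput(&LibGitRepo{commits: fixture}, &GoGitRepo{commits: fixture})
	fmt.Println(ok, err)
}
```

Because callers depend only on `GitRepo`, removing the libgit2 implementation later would not require changes to the extraction code that uses the interface.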

keon94 avatar Dec 31 '23 05:12 keon94

Turning GitRepo into an interface is a good idea! I'll follow your advice.

d4x1 avatar Dec 31 '23 14:12 d4x1

@keon94 @klesh Please review this PR. I have updated the main part, and won't add new features.

d4x1 avatar Jan 26 '24 10:01 d4x1

I'm glad the libgit2 dependency will be eliminated at last. It has caused too much inconvenience to the developers.

mindlesscloud avatar Jan 26 '24 15:01 mindlesscloud

Yes, though go-git is not fully equivalent to libgit2 (commit_line_change cannot be collected with go-git). I hope go-git will satisfy DevLake's requirements.

d4x1 avatar Jan 27 '24 03:01 d4x1

@mindlesscloud Would you like to take a look at the PR when you find time?😊

klesh avatar Jan 29 '24 05:01 klesh

@mindlesscloud If there are no additional comments, please approve this PR. Thx.

d4x1 avatar Feb 28 '24 08:02 d4x1