
Add support for incremental statistics

Open GoogleCodeExporter opened this issue 9 years ago • 11 comments

Support incremental collection for every statistic that can be gathered
incrementally. As a result, the statistical information would not have to be
re-fetched in full each time gitinspector is executed.

Storing a hash of the given options together with the calculated statistics
should also make it possible to detect when the git history has changed up to a
certain point and the statistics need to be re-fetched anyway.
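
To make the idea above concrete, here is a minimal sketch of how such a check might look. It is not taken from gitinspector: the function names, the cache layout (`options_hash`, `last_commit`) and the choice of SHA-256 are all assumptions made for illustration. The only git command used is `git merge-base --is-ancestor`, which exits with status 0 when the given commit is still reachable from HEAD.

```python
import hashlib
import json
import subprocess


def options_fingerprint(options):
    """Hash the run options so a cache is only reused for identical settings."""
    # Sorting the keys makes the hash independent of dict ordering.
    canonical = json.dumps(options, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def commit_is_in_history(commit, repo="."):
    """Return True if `commit` is still an ancestor of HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, "HEAD"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def cache_is_usable(cache, options, is_ancestor=commit_is_in_history):
    """Reuse cached statistics only if options match and history is intact."""
    if cache is None:
        return False
    if cache.get("options_hash") != options_fingerprint(options):
        return False
    # Hypothetical cache field: the commit the statistics were computed at.
    return is_ancestor(cache["last_commit"])
```

If the stored commit is no longer an ancestor of HEAD (for example after a rebase or force push), the cache would be discarded and everything re-fetched.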

/Adam Waldenberg

Original issue reported on code.google.com by [email protected] on 15 Nov 2013 at 12:50

GoogleCodeExporter avatar Aug 23 '15 17:08 GoogleCodeExporter

Original comment by [email protected] on 15 Nov 2013 at 12:51

  • Changed state: Accepted

GoogleCodeExporter avatar Aug 23 '15 17:08 GoogleCodeExporter

Can this be connected to the following problem?

After some massive changes in the repo, gitinspector consumes all the CPU with a lot of git blame ... processes and does not finish processing the repo even after 4 hours.

Can anything be tuned to speed up processing, or at least to reduce CPU usage?

imposeren avatar Dec 16 '15 09:12 imposeren

Hi @imposeren.

Gitinspector is quite slow on very large repos with a big history, as it blames every single file.

This would partly solve it, yes. When this feature is implemented (assuming it's possible) it would mean gitinspector would not have to re-blame every single file each time you run it. Instead, it would only process the files that changed since last time, making it substantially less painful.
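
As a rough sketch of what "only process the files that changed" could mean in practice (the feature is not implemented, so none of this is gitinspector code): `git diff --name-only` lists the paths that differ between two commits, and only those paths, plus any path missing from the cache, would need a fresh blame. The function names and the shape of `cached_blames` are hypothetical.

```python
import subprocess


def changed_files(last_commit, repo="."):
    """List paths that differ between `last_commit` and HEAD."""
    output = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", last_commit, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in output.splitlines() if line]


def files_to_blame(current_files, cached_blames, changed):
    """Pick files that are new or modified; all others reuse cached results."""
    changed = set(changed)
    return [
        path
        for path in current_files
        if path in changed or path not in cached_blames
    ]
```

For example, with cached results for `a.py` and `b.py`, a working tree containing `a.py`, `b.py` and `c.py`, and a diff that reports only `b.py`, just `b.py` and `c.py` would be blamed again.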

The only thing you can really do to speed up processing is to not use the "-H" (hard) option. If you were not using it, you are out of luck; the only remaining option would be to optimize git itself :).

adam-waldenberg avatar Dec 16 '15 13:12 adam-waldenberg

Thanks for the reply. Maybe there are some options for optimizing the history?

Something like this: http://stevelorek.com/how-to-shrink-a-git-repository.html

But I do not know whether removing unused files from the history will affect gitinspector, as it seems to operate only on existing files (is this correct?)

imposeren avatar Dec 18 '15 13:12 imposeren

@imposeren

Yes. Just running "git gc" will speed things up, sometimes quite significantly. If you have never done it before, passing the --aggressive switch might be a good idea. The following is from the git docs:

--aggressive
           Usually git gc runs very quickly while providing good disk space utilization and performance. This option will cause git gc to more
           aggressively optimize the repository at the expense of taking much more time. The effects of this optimization are persistent, so this option
           only needs to be used occasionally; every few hundred changesets or so.

I'm not sure how much the other stuff in that article will affect processing speed, but I guess it's always worth a try.

The blame section of gitinspector only operates on existing files, yes. However, with the -H flag, git still scans the whole history in order to correctly attribute each row to its author, so I guess even "git blame" should run faster on a smaller history. A blamed row can also, for example, come from one of those big files, so git still needs to take them into account to some extent (even without -H passed to gitinspector).

Hard to say without a deeper investigation into the inner workings of git itself.

adam-waldenberg avatar Dec 18 '15 13:12 adam-waldenberg

@adam-waldenberg And one more question: does gitinspector blame files excluded by the '-x' option?

imposeren avatar Dec 18 '15 13:12 imposeren

@imposeren

No. It does not.

adam-waldenberg avatar Dec 18 '15 13:12 adam-waldenberg

@imposeren

Neither does it blame any files that have an invalid extension. Binary files are also skipped.

adam-waldenberg avatar Dec 18 '15 13:12 adam-waldenberg

Is there any way to reduce the concurrency of git blame? I can see up to 8 git blame processes when gitinspector runs, and each consumes 40-99% of a processor core.

imposeren avatar Dec 18 '15 14:12 imposeren

I can already see that there are no such options: https://github.com/ejwa/gitinspector/blob/master/gitinspector/blame.py#L31

I'll create a separate issue for this and may make a pull request later.

imposeren avatar Dec 18 '15 14:12 imposeren

@imposeren

Gitinspector starts as many processes as there are threads/cores. There is no configuration option for it, and never will be. However, there is a constant at the top of changes.py and blame.py that controls the number of threads.
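
For readers wondering what such a constant typically looks like, here is a generic sketch of the pattern, not a copy of changes.py or blame.py. The name `MAX_WORKERS` and the helper `run_limited` are hypothetical; the real constant may be named and used differently. The point is only that a single module-level value, defaulting to the core count, bounds how many jobs run at once.

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

# Hypothetical constant: defaults to one worker per core.
# Lowering it caps how many jobs are in flight at the same time.
MAX_WORKERS = multiprocessing.cpu_count()


def run_limited(job, items, max_workers=MAX_WORKERS):
    """Run `job` over `items` with at most `max_workers` running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `items` in its results.
        return list(pool.map(job, items))
```

Under this pattern, replacing the default with a fixed value such as 2 would leave the remaining cores free for other work, at the cost of a longer run.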

adam-waldenberg avatar Dec 18 '15 14:12 adam-waldenberg