asv icon indicating copy to clipboard operation
asv copied to clipboard

Add an option for bypassing commit filtering for asv publish.

Open germanjoey opened this issue 7 years ago • 6 comments

Hello!

My team's been using asv for our project for the last couple of months and we really love it. However, we've noticed a problem where our graphs would occasionally lose all of their past data despite the results files still remaining present. Some investigation revealed the culprit: asv publish was filtering out benchmarks from commits that didn't appear via git rev-parse --first-parent branch. Our repo has a structure such that a condition like that doesn't make sense for us, so we'd rather just have all benchmarks shown.

As for the option's name, I went with "no-commit-filtering" to maintain the same naming scheme as "no-pull", but am happy to change it to whatever y'all think makes sense.

germanjoey avatar Oct 01 '18 04:10 germanjoey

The concept here is more about history "linearization" than "filtering". The graphs and analysis applied to them requires a linear ordering of commits, and there's probably no one best way to do this with significantly branched history.

I guess what you intend here is having Git.get_branch_commits run without the --first-parent flag --- instead of putting all commits to all branches as this PR seems to do now (which doesn't make sense to me)? Or is the repository without long-lived branches?

Ideally, I'd like to see a bit more thought out proposal here with some room left for the different ways one might like to do the linearization.

pv avatar Oct 03 '18 19:10 pv

Hello, I'm looking at the AppVeyor test failure but I don't quite understand it. It doesn't look like it's related to the code in this PR... can anyone confirm this? If there's some problem with the PR that I need to address, I'd like to get it taken care of. :)

germanjoey avatar Oct 03 '18 20:10 germanjoey

The appveyor one looks like unrelated maybe temporary conda failure, I don't understand it either.

pv avatar Oct 03 '18 21:10 pv

I guess what you intend here is having Git.get_branch_commits run without the --first-parent flag --- instead of putting all commits to all branches as this PR seems to do now (which doesn't make sense to me)?

Yeah, that's essentially what our goal was. Our thought was that "if we have a benchmark, we want it plotted" because we only had nightly benchmarks run against one particular branch. However, as you say, if a project had benchmarks on more than one branch, our feature certainly wouldn't be very useful!

Ideally, I'd like to see a bit more thought out proposal here with some room left for the different ways one might like to do the linearization.

Thinking about this issue a bit more after reading your comment, it does sound like the best way to address the problem of "what to plot when a project has various benchmark results in a highly branched repo" is to change the behavior of "get_hashes_from_range" and then git/mercurial deal with it.

I notice that both asv/plugins/git.py and asv/plugins/mercurial.pyhave hardcoded default arguments;--first-parentfor git, and{0}, -revfor mercurial. So, perhaps we could change the--no-commit-filteringflag to a--commit-filterargument that defaults toNone. For each branch in conf.branches, asv publish` would then:

a.) if --commit-filter is not specified, the respective defaults for each repo are used b.) otherwise, the value for --commit-filter is used instead.

If the loading results stage was split into two parts, so that commits could be reordered according to what was returned by get_branch_commits, this would even address the ordering problem that you mentioned as the user would then be able to change the sort order according to what flags were passed. (e.g. by passing --date-order or --author-date-order or --topo-order to git rev-list`).

What do you think?

germanjoey avatar Oct 03 '18 21:10 germanjoey

Yes, I'd prefer to do it in get_branch_commits or get_hashes_from_range.

As for the user interface, I'd prefer a flag along the lines of --commit-history=first-parent/date-order, ... --- or a config file option, as I think for a given repository you don't need to change the value of the option very often.

PR #751 seems to be trying to do the same thing as this...

pv avatar Oct 13 '18 15:10 pv

It is clearly a difficult problem to provide flexibility while ensuring a linear commit history. But what about relying on the user to provide arguments resulting in a linear history, with a warning in the API?

asv run

  --commit-history ARGS
                        Use this string of `git rev-list` arguments when deriving
                        the commit sequence from `range`. MUST result in a
                        LINEAR commit sequence, otherwise unpredictable
                        behaviour will follow.  Defaults to "--first-parent".

asv publish

  --commit-history ARGS
                        Use this string of `git rev-list` arguments when deriving
                        the commit sequence from `range`, and when finding
                        commits belonging to a published branch. MUST
                        result in a LINEAR commit sequence, otherwise
                        unpredictable behaviour will follow.  Defaults to
                        "--first-parent".

There may be other relevant asv ... commands too, I'm not sure.

trexfeathers avatar Aug 27 '21 10:08 trexfeathers