scikit-posthocs icon indicating copy to clipboard operation
scikit-posthocs copied to clipboard

Fix overlapping crossbars

Open pedroilidio opened this issue 8 months ago • 9 comments

Three main points are relevant in this PR:

  1. It fixes #86.

  2. I also modified the sign_array function, since I was again running into #82. The modification dispenses the epsilon added by #81 and discussed in #82. Please let me know if this is incorrect. I made sure the copy problem pointed out by #77 remains solved.

  3. To ensure these problems do not come up in the future I added unit tests based on #86 and #82.

pedroilidio avatar Apr 15 '25 17:04 pedroilidio

Some further details about the solution. The goal of the function is to plot in different levels the crossbars that intersect with each other. The problem was caused by how we were calculating the intersections between crossbars (merely saying that they intersect if they have a point in common):

https://github.com/maximtrp/scikit-posthocs/blob/98665e2ed53ba1cac2fd77eb3d171695161819a9/scikit_posthocs/_plotting.py#L537

Now, the function considers that two crossbars intersect if their intervals between their minimum and maximum ranks intersect (the ranks are the positions ploted in the x axis). The line below evaluates if two crossbars do not intersect:

https://github.com/maximtrp/scikit-posthocs/blob/db276c77cc60ac02bd08ac503b4a63cb54474ac8/scikit_posthocs/_plotting.py#L541

Furthermore, I replaced the iterative algorithm to find the crossbar levels:

https://github.com/maximtrp/scikit-posthocs/blob/98665e2ed53ba1cac2fd77eb3d171695161819a9/scikit_posthocs/_plotting.py#L531-L543

The function now generates a crossbar-crossbar adjacency matrix called on_same_level and uses the already implemented _find_maximal_cliques function to include as many bars as possible in each level:

https://github.com/maximtrp/scikit-posthocs/blob/db276c77cc60ac02bd08ac503b4a63cb54474ac8/scikit_posthocs/_plotting.py#L544-L545

Given n_bars crossbars, on_same_level is a n_bars by n_bars binary matrix containing 0 if the crossbars overlap and 1 otherwise. 1 thus means the crossbars can be plotted in the same level.

Edits: Adjust lines displayed.

pedroilidio avatar Apr 15 '25 17:04 pedroilidio

Thank you! I'm gonna review all the related code in your pull request and this package

maximtrp avatar Apr 16 '25 10:04 maximtrp

Wait, I've found two other bugs in this implementation. I'll check if they are also in the master branch, open issues if they do, and add the fixes to this PR.

Bug 1: Crossbars are repeated Bug 2: the points in the crossbar are not being plotted in order, which causes lines to go back and forth sometimes (apparent when linestyle is not contiguous and/or not fully opaque).

test

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series

from scikit_posthocs import critical_difference_diagram

n = 10
rng = np.random.default_rng(0)
rank_values = rng.uniform(size=n)
index = np.arange(n).astype(str)

ranks = Series(rank_values, index=index)

sig_array = np.zeros((n, n), dtype=bool)
values = rng.integers(0, 2, size=(n * (n - 1)) // 2, dtype=bool)

triu = np.triu_indices(n, k=1)
sig_array[triu] = sig_array.T[triu] = values

sig_matrix = DataFrame(sig_array, index=index, columns=index)
output = critical_difference_diagram(
    ranks, sig_matrix, crossbar_props={"marker": "s", "linestyle": ":", "alpha": .5}
)
plt.savefig("test.png")

pedroilidio avatar Apr 17 '25 12:04 pedroilidio

@pedroilidio Yeah, thank you! I was about to write that we need as many tests as possible. Can this functionality be covered by them more carefully?

maximtrp avatar Apr 17 '25 12:04 maximtrp

Indeed, @maximtrp, I'll add as many tests for the CDDs as I can :)

pedroilidio avatar Apr 17 '25 15:04 pedroilidio

I apologize, I was quite wrong here:

Furthermore, I replaced the iterative algorithm to find the crossbar levels: [...]

The maximal cliques approach does not ensure different crossbars at each level. This is more complicated: it's a graph coloring problem in the complement graph of the crossbar graph (on_same_level) (see this SE answer).

I'll revert back to the greedy approach we were using. Another (maybe overkill) option: I could adapt the implementation of NetworkX's approach in a separate module, avoiding to add NetworkX as a new dependency. I could do the same for the maximal cliques function I needed to implement for the CDDs.

pedroilidio avatar Apr 17 '25 17:04 pedroilidio

It looks better now, but it suspiciously changed from my previous figure (pairs of points are gone). I will write several tests in a further commit.

fixed

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series

from scikit_posthocs import critical_difference_diagram

n = 10
rng = np.random.default_rng(0)
rank_values = rng.uniform(size=n)
index = np.arange(n).astype(str)
ranks = Series(rank_values, index=index)

triu = np.triu_indices(n, k=1)
values = rng.integers(0, 2, size=len(triu[0]), dtype=np.int8)
sig_array = np.ones((n, n), dtype=np.int8)
sig_array[triu] = sig_array.T[triu] = values

sig_matrix = DataFrame(sig_array, index=index, columns=index)

plt.figure(figsize=(5, 3))
output = critical_difference_diagram(
    ranks, sig_matrix, crossbar_props={"marker": "s", "linestyle": ":", "alpha": .3}
)
plt.tight_layout()
plt.savefig("fixed.png")

pedroilidio avatar Apr 17 '25 17:04 pedroilidio

Thank you once again! Now not all tests are passed 👇

maximtrp avatar Apr 18 '25 06:04 maximtrp

Thanks for following this thread, @maximtrp! I'll be busier for some days, so I'll get back to working on this at the end of the next week. For now I'll convert the PR to a draft PR until it's ready for merging. Next steps are:

  • [ ] Solve failing tests.
  • [ ] Add way more unit tests, e.g.:
    • Assert ranks are ordered in each crossbar (this fixes #89).
    • Assert crossbars do not ever overlap.
    • Assert maximal cliques function is properly working (maybe compare several random cases with NetworkX results)
    • etc.
  • [ ] Decide: for the greedy placement of crossbars in the levels, should we prioritize wider bars, left-most bars, or maybe a mixture of both? (I'm talking specifically about the following line of code) https://github.com/maximtrp/scikit-posthocs/blob/a8d50cb03eda1f97c40e823ac84f30b097933dc8/scikit_posthocs/_plotting.py#L531

pedroilidio avatar Apr 18 '25 11:04 pedroilidio