Fix overlapping crossbars
Three main points are relevant in this PR:
-
It fixes #86.
-
I also modified the
sign_arrayfunction, since I was again running into #82. The modification dispenses the epsilon added by #81 and discussed in #82. Please let me know if this is incorrect. I made sure the copy problem pointed out by #77 remains solved. -
To ensure these problems do not come up in the future I added unit tests based on #86 and #82.
Some further details about the solution. The goal of the function is to plot in different levels the crossbars that intersect with each other. The problem was caused by how we were calculating the intersections between crossbars (merely saying that they intersect if they have a point in common):
https://github.com/maximtrp/scikit-posthocs/blob/98665e2ed53ba1cac2fd77eb3d171695161819a9/scikit_posthocs/_plotting.py#L537
Now, the function considers that two crossbars intersect if their intervals between their minimum and maximum ranks intersect (the ranks are the positions ploted in the x axis). The line below evaluates if two crossbars do not intersect:
https://github.com/maximtrp/scikit-posthocs/blob/db276c77cc60ac02bd08ac503b4a63cb54474ac8/scikit_posthocs/_plotting.py#L541
Furthermore, I replaced the iterative algorithm to find the crossbar levels:
https://github.com/maximtrp/scikit-posthocs/blob/98665e2ed53ba1cac2fd77eb3d171695161819a9/scikit_posthocs/_plotting.py#L531-L543
The function now generates a crossbar-crossbar adjacency matrix called on_same_level and uses the already implemented _find_maximal_cliques function to include as many bars as possible in each level:
https://github.com/maximtrp/scikit-posthocs/blob/db276c77cc60ac02bd08ac503b4a63cb54474ac8/scikit_posthocs/_plotting.py#L544-L545
Given n_bars crossbars, on_same_level is a n_bars by n_bars binary matrix containing 0 if the crossbars overlap and 1 otherwise. 1 thus means the crossbars can be plotted in the same level.
Edits: Adjust lines displayed.
Thank you! I'm gonna review all the related code in your pull request and this package
Wait, I've found two other bugs in this implementation. I'll check if they are also in the master branch, open issues if they do, and add the fixes to this PR.
Bug 1: Crossbars are repeated Bug 2: the points in the crossbar are not being plotted in order, which causes lines to go back and forth sometimes (apparent when linestyle is not contiguous and/or not fully opaque).
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series
from scikit_posthocs import critical_difference_diagram
n = 10
rng = np.random.default_rng(0)
rank_values = rng.uniform(size=n)
index = np.arange(n).astype(str)
ranks = Series(rank_values, index=index)
sig_array = np.zeros((n, n), dtype=bool)
values = rng.integers(0, 2, size=(n * (n - 1)) // 2, dtype=bool)
triu = np.triu_indices(n, k=1)
sig_array[triu] = sig_array.T[triu] = values
sig_matrix = DataFrame(sig_array, index=index, columns=index)
output = critical_difference_diagram(
ranks, sig_matrix, crossbar_props={"marker": "s", "linestyle": ":", "alpha": .5}
)
plt.savefig("test.png")
@pedroilidio Yeah, thank you! I was about to write that we need as many tests as possible. Can this functionality be covered by them more carefully?
Indeed, @maximtrp, I'll add as many tests for the CDDs as I can :)
I apologize, I was quite wrong here:
Furthermore, I replaced the iterative algorithm to find the crossbar levels: [...]
The maximal cliques approach does not ensure different crossbars at each level. This is more complicated: it's a graph coloring problem in the complement graph of the crossbar graph (on_same_level) (see this SE answer).
I'll revert back to the greedy approach we were using. Another (maybe overkill) option: I could adapt the implementation of NetworkX's approach in a separate module, avoiding to add NetworkX as a new dependency. I could do the same for the maximal cliques function I needed to implement for the CDDs.
It looks better now, but it suspiciously changed from my previous figure (pairs of points are gone). I will write several tests in a further commit.
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series
from scikit_posthocs import critical_difference_diagram
n = 10
rng = np.random.default_rng(0)
rank_values = rng.uniform(size=n)
index = np.arange(n).astype(str)
ranks = Series(rank_values, index=index)
triu = np.triu_indices(n, k=1)
values = rng.integers(0, 2, size=len(triu[0]), dtype=np.int8)
sig_array = np.ones((n, n), dtype=np.int8)
sig_array[triu] = sig_array.T[triu] = values
sig_matrix = DataFrame(sig_array, index=index, columns=index)
plt.figure(figsize=(5, 3))
output = critical_difference_diagram(
ranks, sig_matrix, crossbar_props={"marker": "s", "linestyle": ":", "alpha": .3}
)
plt.tight_layout()
plt.savefig("fixed.png")
Thank you once again! Now not all tests are passed 👇
Thanks for following this thread, @maximtrp! I'll be busier for some days, so I'll get back to working on this at the end of the next week. For now I'll convert the PR to a draft PR until it's ready for merging. Next steps are:
- [ ] Solve failing tests.
- [ ] Add way more unit tests, e.g.:
- Assert ranks are ordered in each crossbar (this fixes #89).
- Assert crossbars do not ever overlap.
- Assert maximal cliques function is properly working (maybe compare several random cases with NetworkX results)
- etc.
- [ ] Decide: for the greedy placement of crossbars in the levels, should we prioritize wider bars, left-most bars, or maybe a mixture of both? (I'm talking specifically about the following line of code) https://github.com/maximtrp/scikit-posthocs/blob/a8d50cb03eda1f97c40e823ac84f30b097933dc8/scikit_posthocs/_plotting.py#L531