matplotlib-venn
matplotlib-venn copied to clipboard
Design a new layout algorithm
... along with a general system for pluggable layouts. This would address most of issues being reported here regularly (#30, #34).
The main problem with current layout is that it sometimes results in circles positioned so that some of the 7 regions, corresponding to subsets simply is not present. The better layout should make sure each of the regions exists (even if it would misrepresent some of the other general patterns).
There are several approaches for implementing such logic:
- Applying the current layout, checking whether all the subset regions are present, increasing the weight of the missing regions and repeating until convergence.
- Formulating the "loss function" of the layout so that it would account both for quality of representation and non-zeroeness of the set areas, and finding the layout using
scipy.optimize
.
Internally the layout would probably be represented as a function, which, given subset sizes, returns the radii and centers of the circles. Externally the choice of the layout would be specified by the keyword parameter layout
passed to the venn3/venn2
function. The value would be either None (default), string ('naive', 'optim', ...) or the function itself.
Hi, Paul here, you may remember me as the author of matplotlib_venn_wordcloud (issue #29). I had a lazy afternoon and implemented the second approach, i.e. using scipy.optimize
. In its current form, it does quite well for "canonical" examples but still fails badly for the edge cases in issues #30 and #34.
Specifically, the cost function likely needs some work. At the moment, it uses the absolute difference, which doesn't work well if some of the areas are much smaller than the others. The options for minimize
could also use some experimentation, as I simply re-used some code I had floating around for a vaguely similar task. For these reasons, the code is obviously not PR-ready; however, I am unsure if and when I will have time again, so I thought I better dump it here in case you or anyone else want to pursue this approach further. Licensed under the MIT license or whichever license this repository may adopt in the future.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from shapely.geometry import Point
DEBUG = True
def get_radii(a, b, c, ab, bc, ac, abc):
areas = np.array([
a + ab + ac + abc,
b + ab + bc + abc,
c + ac + bc + abc,
])
return np.sqrt(areas / np.pi)
def get_subset_areas(origins, radii):
geometries = get_subset_geometries(origins, radii)
return [geometry.area for geometry in geometries]
def get_subset_geometries(origins, radii):
a, b, c = [Point(*origin).buffer(radius) for origin, radius in zip(origins, radii)]
return [
a.difference(b.union(c)), # A
b.difference(a.union(c)), # B
c.difference(b.union(c)), # C
a.intersection(b).difference(c), # AB
b.intersection(c).difference(a), # BC
a.intersection(c).difference(b), # AC
a.intersection(b).intersection(c), # ABC
]
def get_point_on_a_circle(origin, radius, angle):
"""Compute the (x, y) coordinate of a point at a specified angle
on a circle given by its (x, y) origin and radius."""
x0, y0 = origin
x = x0 + radius * np.cos(angle)
y = y0 + radius * np.sin(angle)
return np.array([x, y])
def solve_venn3_circles(desired_areas):
# [A, B, C, AB, BC, AC, ABC]
desired_areas = np.array(desired_areas)
radii = get_radii(*desired_areas)
def cost_function(origins):
current_areas = get_subset_areas(origins.reshape(-1, 2), radii)
cost = np.abs(current_areas - desired_areas) # absolute difference; probably not the best choice
return np.sum(cost)
# initialize origins such that all rings overlap
initial_origins = np.array(
[get_point_on_a_circle(np.zeros((2)), np.min(radii), angle) for angle in np.pi * np.array([0, 2/3, 4/3])]
)
result = minimize(
cost_function,
initial_origins.flatten(),
method='SLSQP',
jac='2-point',
options=dict(ftol=1e-3, disp=DEBUG),
)
if not result.success:
print("Warning: could not compute circle positions for given subsets.")
print(f"scipy.optimize.minimize: {result.message}.")
origins = result.x.reshape((-1, 2))
if DEBUG:
print("Desired areas:", desired_areas)
print("Returned areas:", get_subset_areas(origins, radii))
return origins, radii
if __name__ == "__main__":
def test(A, B, C, AB, BC, AC, ABC, ax=None):
origins, radii = solve_venn3_circles([A, B, C, AB, BC, AC, ABC])
if ax is None:
fig, ax = plt.subplots()
for origin, radius in zip(origins, radii):
ax.add_patch(plt.Circle(origin, radius, alpha=0.33))
ax.set_aspect("equal")
ax.autoscale_view()
ax.axis("off")
ax.set_title(f"A={A}, B={B}, C={C},\nAB={AB}, BC={BC}, AC={AC}, ABC={ABC}")
fig, axes = plt.subplots(2, 3, figsize=(15,10))
axes = axes.ravel()
test(1, 1, 1, 0.5, 0.5, 0.5, 0.25, axes[0]) # canonical example
test(3, 2, 1, 0.5, 0.4, 0.3, 0.2, axes[1]) # different areas
test(1, 1, 1, 0, 0, 0, 0, axes[2]) # no intersections; not bad, but could be handled better
test(1, 0, 0, 1, 0, 0, 1, axes[3]) # a is superset to b, b is superset to c
test(2, 2, 2, 0.5, 0, 0.5, 0, axes[4]) # no BC, and no ABC
test(2, 2, 2, 0.1, 0.1, 0.1, 0, axes[5]) # no ABC
# issue #34
test(167, 7, 25, 41, 174, 171, 51)
# issue #30
test(1, 0, 0, 32, 0, 76, 13)
test(1, 0, 0, 650, 0, 76, 13)
test(7549417, 15685620, 26018311, 5128906, 301048, 6841264, 2762301)
plt.show()
Had an idea for a small improvement on my bike ride home yesterday. This version transforms the desired and actual areas using the relationship y = log(x + 1)
before computing the cost as the absolute difference between the (transformed) desired and the (transformed) actual areas. The transformation has the following properties: (1) it is monotonically increasing, i.e. it preserves the order of sizes, (2) it preserves small numbers as x ~ log(x + 1) for small x, but (3) it strongly compresses the range of large numbers. These properties combined let small areas carry a bit more weight in the optimization, resulting in better solutions, at least by eye.
I also improved the layout of the debug info and made some additional cosmetic changes.
import numpy as np
from scipy.optimize import minimize
from shapely.geometry import Point
DEBUG = True
def get_radii(a, b, c, ab, bc, ac, abc):
areas = np.array([
a + ab + ac + abc,
b + ab + bc + abc,
c + ac + bc + abc,
])
return np.sqrt(areas / np.pi)
def get_subset_areas(origins, radii):
geometries = get_subset_geometries(origins, radii)
return np.array([geometry.area for geometry in geometries])
def get_subset_geometries(origins, radii):
a, b, c = [Point(*origin).buffer(radius) for origin, radius in zip(origins, radii)]
return [
a.difference(b.union(c)), # A
b.difference(a.union(c)), # B
c.difference(b.union(c)), # C
a.intersection(b).difference(c), # AB
b.intersection(c).difference(a), # BC
a.intersection(c).difference(b), # AC
a.intersection(b).intersection(c), # ABC
]
def get_point_on_a_circle(origin, radius, angle):
"""Compute the (x, y) coordinate of a point at a specified angle
on a circle given by its (x, y) origin and radius."""
x0, y0 = origin
x = x0 + radius * np.cos(angle)
y = y0 + radius * np.sin(angle)
return np.array([x, y])
def solve_venn3_circles(desired_areas):
# [A, B, C, AB, BC, AC, ABC]
desired_areas = np.array(desired_areas)
radii = get_radii(*desired_areas)
def cost_function(origins):
current_areas = get_subset_areas(origins.reshape(-1, 2), radii)
# # Option 1: absolute difference
# # Probably not the best choice, as small areas are often practically ignored.
# cost = np.abs(current_areas - desired_areas)
# # Option 2: relative difference
# # This often results in the optimization routine failing.
# minimum_area = 1e-2
# desired_areas[desired_areas < minimum_area] = minimum_area
# cost = np.abs(current_areas - desired_areas) / desired_areas
# Option 3: absolute difference of log(area + 1)
# This transformation is monotonic increasing but strongly compresses large numbers,
# thus allowing small numbers to carry more weight in the optimization.
cost = np.abs(np.log(current_areas + 1) - np.log(desired_areas + 1))
return np.sum(cost)
# initialize origins such that all rings overlap
initial_origins = np.array(
[get_point_on_a_circle(np.zeros((2)), np.min(radii), angle) for angle in np.pi * np.array([0, 2/3, 4/3])]
)
result = minimize(
cost_function,
initial_origins.flatten(),
method='SLSQP',
options=dict(disp=DEBUG),
)
if not result.success:
print("Warning: could not compute circle positions for given subsets.")
print(f"scipy.optimize.minimize: {result.message}.")
origins = result.x.reshape((-1, 2))
if DEBUG:
import pandas as pd
data = pd.DataFrame(dict(desired=desired_areas, actual=get_subset_areas(origins, radii)))
data["absolute difference"] = np.abs(data["desired"] - data["actual"])
data["relative difference"] = data["absolute difference"] / data["desired"]
with pd.option_context('display.float_format', '{:0.1f}'.format):
print(data)
return origins, radii
if __name__ == "__main__":
import matplotlib.pyplot as plt
def test(A, B, C, AB, BC, AC, ABC, ax=None):
origins, radii = solve_venn3_circles([A, B, C, AB, BC, AC, ABC])
if ax is None:
fig, ax = plt.subplots()
for origin, radius in zip(origins, radii):
ax.add_patch(plt.Circle(origin, radius, alpha=0.33))
ax.set_aspect("equal")
ax.autoscale_view()
ax.axis("off")
ax.set_title(f"A={A}, B={B}, C={C},\nAB={AB}, BC={BC}, AC={AC}, ABC={ABC}")
fig, axes = plt.subplots(2, 3, figsize=(15,10))
axes = axes.ravel()
test(1, 1, 1, 0.5, 0.5, 0.5, 0.25, axes[0]) # canonical example
test(3, 2, 1, 0.5, 0.4, 0.3, 0.2, axes[1]) # different areas
test(1, 1, 1, 0, 0, 0, 0, axes[2]) # no intersections; not bad, but could be handled better
test(1, 0, 0, 1, 0, 0, 1, axes[3]) # a is superset to b, b is superset to c
test(2, 2, 2, 0.5, 0, 0.5, 0, axes[4]) # no BC, and no ABC
test(2, 2, 2, 0.1, 0.1, 0.1, 0, axes[5]) # no ABC
# issue #34
test(167, 7, 25, 41, 174, 171, 51)
# issue #30
test(1, 0, 0, 32, 0, 76, 13)
test(1, 0, 0, 650, 0, 76, 13)
test(7549417, 15685620, 26018311, 5128906, 301048, 6841264, 2762301)
plt.show()
Wow, this is very cool.
Let me try to find time to take it up on the weekend. Besides just the layout implementation this will require:
- A major version increase.
- A careful change to the API to allow plugging the new layout algorithm without disrupting any existing usages.
- A packaging option where Shapely is an opt-in dependency along with clear enough documentation.
Do have a close look at the code though, too. Four eyes see more than two. For example, I just found a major bug in get_subset_geometries
. Also, making a proper test suite with a bunch of examples that evaluates performance along a few metrics (absolute difference, relative difference, etc) would be a good call, I think. Note, that I haven't compared my results to the output of your algorithm at all, yet.
Without the bug, the performance of my optimization is much better. ;-) Issue #30, last test case, gets solved perfectly.
import numpy as np
from scipy.optimize import minimize
from shapely.geometry import Point
DEBUG = True
def get_radii(a, b, c, ab, bc, ac, abc):
areas = np.array([
a + ab + ac + abc,
b + ab + bc + abc,
c + ac + bc + abc,
])
return np.sqrt(areas / np.pi)
def get_subset_areas(origins, radii):
geometries = get_subset_geometries(origins, radii)
return np.array([geometry.area for geometry in geometries])
def get_subset_geometries(origins, radii):
a, b, c = [get_shapely_circle(origin, radius) for origin, radius in zip(origins, radii)]
return [
a.difference(b).difference(c), # A
b.difference(a).difference(c), # B
c.difference(a).difference(b), # C
a.intersection(b).difference(c), # AB
b.intersection(c).difference(a), # BC
a.intersection(c).difference(b), # AC
a.intersection(b).intersection(c), # ABC
]
def get_shapely_circle(origin, radius):
return Point(*origin).buffer(radius)
def get_point_on_a_circle(origin, radius, angle):
"""Compute the (x, y) coordinate of a point at a specified angle
on a circle given by its (x, y) origin and radius."""
x0, y0 = origin
x = x0 + radius * np.cos(angle)
y = y0 + radius * np.sin(angle)
return np.array([x, y])
def solve_venn3_circles(desired_areas):
# [A, B, C, AB, BC, AC, ABC]
desired_areas = np.array(desired_areas)
radii = get_radii(*desired_areas)
def cost_function(origins):
current_areas = get_subset_areas(origins.reshape(-1, 2), radii)
# # Option 1: absolute difference
# # Probably not the best choice, as small areas are often practically ignored.
# cost = np.abs(current_areas - desired_areas)
# Option 2: relative difference
# This often results in the optimization routine failing.
# minimum_area = 1e-2
# desired_areas[desired_areas < minimum_area] = minimum_area
# cost = np.abs(current_areas - desired_areas) / desired_areas
# Option 3: absolute difference of log(area + 1)
# This transformation is monotonic increasing but strongly compresses large numbers,
# thus allowing small numbers to carry more weight in the optimization.
cost = np.abs(np.log(current_areas + 1) - np.log(desired_areas + 1))
return np.sum(cost)
# initialize origins such that all rings overlap
initial_origins = np.array(
[get_point_on_a_circle(np.zeros((2)), np.min(radii), angle) for angle in np.pi * np.array([0, 2/3, 4/3])]
)
result = minimize(
cost_function,
initial_origins.flatten(),
method='SLSQP',
options=dict(disp=DEBUG, ftol=1e-6),
)
if not result.success:
print("Warning: could not compute circle positions for given subsets.")
print(f"scipy.optimize.minimize: {result.message}.")
origins = result.x.reshape((-1, 2))
if DEBUG:
import pandas as pd
data = pd.DataFrame(dict(desired=desired_areas, actual=get_subset_areas(origins, radii)))
data["absolute difference"] = np.abs(data["desired"] - data["actual"])
data["relative difference"] = data["absolute difference"] / data["desired"]
with pd.option_context('display.float_format', '{:0.1f}'.format):
print(data)
return origins, radii
if __name__ == "__main__":
import matplotlib.pyplot as plt
def test(A, B, C, AB, BC, AC, ABC, ax=None):
origins, radii = solve_venn3_circles([A, B, C, AB, BC, AC, ABC])
if ax is None:
fig, ax = plt.subplots()
for origin, radius in zip(origins, radii):
ax.add_patch(plt.Circle(origin, radius, alpha=0.33))
ax.set_aspect("equal")
ax.autoscale_view()
ax.axis("off")
ax.set_title(f"A={A}, B={B}, C={C},\nAB={AB}, BC={BC}, AC={AC}, ABC={ABC}")
fig, axes = plt.subplots(2, 3, figsize=(15,10))
axes = axes.ravel()
test(1, 1, 1, 0.5, 0.5, 0.5, 0.25, axes[0]) # canonical example
test(3, 2, 1, 0.5, 0.4, 0.3, 0.2, axes[1]) # different areas
test(1, 1, 1, 0, 0, 0, 0, axes[2]) # no intersections; not bad, but could be handled better
test(1, 0, 0, 1, 0, 0, 1, axes[3]) # a is superset to b, b is superset to c
test(2, 2, 2, 0.5, 0, 0.5, 0, axes[4]) # no BC, and no ABC
test(2, 2, 2, 0.1, 0.1, 0.1, 0, axes[5]) # no ABC
# issue #30
test(1, 0, 0, 32, 0, 76, 13)
test(1, 0, 0, 650, 0, 76, 13)
test(7549417, 15685620, 26018311, 5128906, 301048, 6841264, 2762301) # works well with latest version
# issue #34
test(167, 7, 25, 41, 174, 171, 51)
plt.show()
Obviously I won't just copy-paste your code somewhere :)
Regarding dependencies:
A packaging option where Shapely and Scipy are opt-in dependencies along with clear enough documentation.
There is no way around scipy.optimize
, but shapely
is only used to compute the subset areas. I assumed that you were doing that somewhere within your code base, but I couldn't find it within a hot minute, so I decided to roll my own using shapely
(I did mention I was feeling lazy yesterday). Even if there isn't anything in your code base, yet, computing circle areas sounds like something that has closed form solutions that should be easily implemented with a bit of numpy arithmetic.
Also, eventually, it might be worth posting a question on SO, possibly with a bounty (happy to chip in), to see if anyone can find a better a cost function or minimization procedure. It's the right sort of problem (self-contained yet of practical use) that might attract the right people.
I added some constraints to the optimisation, which did make the results more visually pleasing. I also played around with the initialisation, albeit with no success.
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform
from scipy.optimize import minimize, NonlinearConstraint
from shapely.geometry import Point
DEBUG = True
def get_radii(a, b, c, ab, bc, ac, abc):
areas = np.array([
a + ab + ac + abc,
b + ab + bc + abc,
c + ac + bc + abc,
])
return np.sqrt(areas / np.pi)
def get_subset_areas(origins, radii):
geometries = get_subset_geometries(origins, radii)
return np.array([geometry.area for geometry in geometries])
def get_subset_geometries(origins, radii):
a, b, c = [get_shapely_circle(origin, radius) for origin, radius in zip(origins, radii)]
return [
a.difference(b).difference(c), # A
b.difference(a).difference(c), # B
c.difference(a).difference(b), # C
a.intersection(b).difference(c), # AB
b.intersection(c).difference(a), # BC
a.intersection(c).difference(b), # AC
a.intersection(b).intersection(c), # ABC
]
def get_shapely_circle(origin, radius):
return Point(*origin).buffer(radius)
def initialize_origins(radii):
"""Initialize origins on a small circle around (0, 0)."""
origin = np.zeros((2))
radius = np.min(radii)
angles = np.pi * np.array([0, 2/3, 4/3])
return np.array(
[get_point_on_a_circle(origin, radius, angle) for angle in angles]
)
def get_point_on_a_circle(origin, radius, angle):
"""Compute the (x, y) coordinate of a point at a specified angle
on a circle given by its (x, y) origin and radius."""
x0, y0 = origin
x = x0 + radius * np.cos(angle)
y = y0 + radius * np.sin(angle)
return np.array([x, y])
# # Doesn't work as well as I had hoped.
# def initialize_origins(radii):
# """Initialize circle origins such that all circles (pairwise) overlap.
# The optimization uses gradient descent. If certain subset areas
# are zero, there is no gradient to follow. This procedure
# guarantees the existence of all subsections apart from region ABC.
# However, in all examples tested, ABC did indeed exist as well.
# """
# ra, rb, rc = radii
# # choose the side lengths such that circles overlap, but not fully
# ab = ra + rb - min(ra, rb) / 2
# bc = rb + rc - min(rb, rc) / 2
# ac = ra + rc - min(ra, rc) / 2
# # find corner coordinates
# # https://math.stackexchange.com/a/1989113/439191
# a = np.array([0, 0])
# b = np.array([0, ab])
# cy = (ab**2 + ac**2 - bc**2) / (2 * ab)
# cx = np.sqrt(ac**2 - cy**2)
# c = np.array([cx, cy])
# return np.array([a, b, c])
def solve_venn3_circles(desired_areas):
# [A, B, C, AB, BC, AC, ABC]
desired_areas = np.array(desired_areas)
radii = get_radii(*desired_areas)
def cost_function(origins):
current_areas = get_subset_areas(origins.reshape(-1, 2), radii)
# # Option 1: absolute difference
# # Probably not the best choice, as small areas are often practically ignored.
# cost = np.abs(current_areas - desired_areas)
# # Option 2: relative difference
# # This often results in the optimization routine failing.
# minimum_area = 1e-2
# desired_areas[desired_areas < minimum_area] = minimum_area
# cost = np.abs(current_areas - desired_areas) / desired_areas
# Option 3: absolute difference of log(area + 1)
# This transformation is monotonic increasing but strongly compresses large numbers,
# thus allowing small numbers to carry more weight in the optimization.
cost = np.abs(np.log(current_areas + 1) - np.log(desired_areas + 1))
return np.sum(cost)
# constraints:
eps = np.min(radii) * 0.01
lower_bounds = np.abs(radii[np.newaxis, :] - radii[:, np.newaxis]) - eps
lower_bounds[lower_bounds < 0] = 0
lower_bounds = squareform(lower_bounds)
upper_bounds = radii[np.newaxis, :] + radii[:, np.newaxis] + eps
upper_bounds -= np.diag(np.diag(upper_bounds)) # squareform requires zeros on diagonal
upper_bounds = squareform(upper_bounds)
def constraint_function(origins):
origins = np.reshape(origins, (-1, 2))
return pdist(origins)
distance_between_origins = NonlinearConstraint(
constraint_function, lb=lower_bounds, ub=upper_bounds)
result = minimize(
cost_function,
initialize_origins(radii).flatten(),
method='SLSQP',# 'COBYLA',
constraints=[distance_between_origins],
options=dict(disp=DEBUG)
)
if not result.success:
print("Warning: could not compute circle positions for given subsets.")
print(f"scipy.optimize.minimize: {result.message}.")
origins = result.x.reshape((-1, 2))
if DEBUG:
import pandas as pd
data = pd.DataFrame(dict(desired=desired_areas, actual=get_subset_areas(origins, radii)))
data["absolute difference"] = np.abs(data["desired"] - data["actual"])
data["relative difference"] = data["absolute difference"] / data["desired"]
with pd.option_context('display.float_format', '{:0.1f}'.format):
print(data)
return origins, radii
if __name__ == "__main__":
import matplotlib.pyplot as plt
def test(A, B, C, AB, BC, AC, ABC, ax=None):
origins, radii = solve_venn3_circles([A, B, C, AB, BC, AC, ABC])
if ax is None:
fig, ax = plt.subplots()
for origin, radius in zip(origins, radii):
ax.add_patch(plt.Circle(origin, radius, alpha=0.33))
ax.set_aspect("equal")
ax.autoscale_view()
ax.axis("off")
ax.set_title(f"A={A}, B={B}, C={C},\nAB={AB}, BC={BC}, AC={AC}, ABC={ABC}")
fig, axes = plt.subplots(2, 3, figsize=(15,10))
axes = axes.ravel()
test(1, 1, 1, 0.5, 0.5, 0.5, 0.25, axes[0]) # canonical example
test(3, 2, 1, 0.5, 0.4, 0.3, 0.2, axes[1]) # different areas
test(1, 1, 1, 0, 0, 0, 0, axes[2]) # no intersections; not bad, but could be handled better
test(1, 0, 0, 1, 0, 0, 1, axes[3]) # a is superset to b, b is superset to c
test(2, 2, 2, 0.5, 0, 0.5, 0, axes[4]) # no BC, and no ABC
test(2, 2, 2, 0.1, 0.1, 0.1, 0, axes[5]) # no ABC
# # issue #30
# test(1, 0, 0, 32, 0, 76, 13)
# test(1, 0, 0, 650, 0, 76, 13)
# test(7549417, 15685620, 26018311, 5128906, 301048, 6841264, 2762301) # works well with latest version
# # issue #34
# test(167, 7, 25, 41, 174, 171, 51)
plt.show()
An argument in favour of making shapely
a core dependency would be that is has an implementation of polylabel
, which can be used to find the point of inaccessibility. I think that would improve the label placement.
Taking the example from issue #50 :
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform
from scipy.optimize import minimize, NonlinearConstraint
from shapely.geometry import Point
from shapely.ops import polylabel
DEBUG = True
def get_radii(a, b, c, ab, bc, ac, abc):
areas = np.array([
a + ab + ac + abc,
b + ab + bc + abc,
c + ac + bc + abc,
])
return np.sqrt(areas / np.pi)
def get_subset_areas(origins, radii):
geometries = get_subset_geometries(origins, radii)
return np.array([geometry.area for geometry in geometries])
def get_subset_geometries(origins, radii):
a, b, c = [get_shapely_circle(origin, radius) for origin, radius in zip(origins, radii)]
return [
a.difference(b).difference(c), # A
b.difference(a).difference(c), # B
c.difference(a).difference(b), # C
a.intersection(b).difference(c), # AB
b.intersection(c).difference(a), # BC
a.intersection(c).difference(b), # AC
a.intersection(b).intersection(c), # ABC
]
def get_shapely_circle(origin, radius):
return Point(*origin).buffer(radius)
def initialize_origins(radii):
"""Initialize origins on a small circle around (0, 0)."""
origin = np.zeros((2))
radius = np.min(radii)
angles = np.pi * np.array([0, 2/3, 4/3])
return np.array(
[get_point_on_a_circle(origin, radius, angle) for angle in angles]
)
def get_point_on_a_circle(origin, radius, angle):
"""Compute the (x, y) coordinate of a point at a specified angle
on a circle given by its (x, y) origin and radius."""
x0, y0 = origin
x = x0 + radius * np.cos(angle)
y = y0 + radius * np.sin(angle)
return np.array([x, y])
# # Doesn't work as well as I had hoped.
# def initialize_origins(radii):
# """Initialize circle origins such that all circles (pairwise) overlap.
# The optimization uses gradient descent. If certain subset areas
# are zero, there is no gradient to follow. This procedure
# guarantees the existence of all subsections apart from region ABC.
# However, in all examples tested, ABC did indeed exist as well.
# """
# ra, rb, rc = radii
# # choose the side lengths such that circles overlap, but not fully
# ab = ra + rb - min(ra, rb) / 2
# bc = rb + rc - min(rb, rc) / 2
# ac = ra + rc - min(ra, rc) / 2
# # find corner coordinates
# # https://math.stackexchange.com/a/1989113/439191
# a = np.array([0, 0])
# b = np.array([0, ab])
# cy = (ab**2 + ac**2 - bc**2) / (2 * ab)
# cx = np.sqrt(ac**2 - cy**2)
# c = np.array([cx, cy])
# return np.array([a, b, c])
def solve_venn3_circles(desired_areas):
# [A, B, C, AB, BC, AC, ABC]
desired_areas = np.array(desired_areas)
radii = get_radii(*desired_areas)
def cost_function(origins):
current_areas = get_subset_areas(origins.reshape(-1, 2), radii)
# # Option 1: absolute difference
# # Probably not the best choice, as small areas are often practically ignored.
# cost = np.abs(current_areas - desired_areas)
# # Option 2: relative difference
# # This often results in the optimization routine failing.
# minimum_area = 1e-2
# desired_areas[desired_areas < minimum_area] = minimum_area
# cost = np.abs(current_areas - desired_areas) / desired_areas
# Option 3: absolute difference of log(area + 1)
# This transformation is monotonic increasing but strongly compresses large numbers,
# thus allowing small numbers to carry more weight in the optimization.
cost = np.abs(np.log(current_areas + 1) - np.log(desired_areas + 1))
return np.sum(cost)
# constraints:
eps = np.min(radii) * 0.01
lower_bounds = np.abs(radii[np.newaxis, :] - radii[:, np.newaxis]) - eps
lower_bounds[lower_bounds < 0] = 0
lower_bounds = squareform(lower_bounds)
upper_bounds = radii[np.newaxis, :] + radii[:, np.newaxis] + eps
upper_bounds -= np.diag(np.diag(upper_bounds)) # squareform requires zeros on diagonal
upper_bounds = squareform(upper_bounds)
def constraint_function(origins):
origins = np.reshape(origins, (-1, 2))
return pdist(origins)
distance_between_origins = NonlinearConstraint(
constraint_function, lb=lower_bounds, ub=upper_bounds)
result = minimize(
cost_function,
initialize_origins(radii).flatten(),
method='SLSQP',# 'COBYLA',
constraints=[distance_between_origins],
options=dict(disp=DEBUG)
)
if not result.success:
print("Warning: could not compute circle positions for given subsets.")
print(f"scipy.optimize.minimize: {result.message}.")
origins = result.x.reshape((-1, 2))
if DEBUG:
import pandas as pd
data = pd.DataFrame(dict(desired=desired_areas, actual=get_subset_areas(origins, radii)))
data["absolute difference"] = np.abs(data["desired"] - data["actual"])
data["relative difference"] = data["absolute difference"] / data["desired"]
with pd.option_context('display.float_format', '{:0.1f}'.format):
print(data)
return origins, radii
def get_label_positions(origins, radii, labels):
"For each non-zero area, find the point of inaccesibility."
geometries = get_subset_geometries(origins, radii)
output = list()
for ii, (label, geometry) in enumerate(zip(labels, geometries)):
if geometry.area > 0:
poi = polylabel(geometry)
output.append((poi.x, poi.y, label))
return output
if __name__ == "__main__":
import matplotlib.pyplot as plt
def test(A, B, C, AB, BC, AC, ABC, ax=None):
origins, radii = solve_venn3_circles([A, B, C, AB, BC, AC, ABC])
if ax is None:
fig, ax = plt.subplots()
for origin, radius in zip(origins, radii):
ax.add_patch(plt.Circle(origin, radius, alpha=0.33))
ax.set_aspect("equal")
ax.autoscale_view()
ax.axis("off")
# ax.set_title(f"A={A}, B={B}, C={C},\nAB={AB}, BC={BC}, AC={AC}, ABC={ABC}")
label_positions = get_label_positions(origins, radii, [A, B, C, AB, BC, AC, ABC])
for (x, y, label) in label_positions:
ax.text(x, y, label, va="center", ha="center")
# fig, axes = plt.subplots(2, 3, figsize=(15,10))
# axes = axes.ravel()
# test(1, 1, 1, 0.5, 0.5, 0.5, 0.25, axes[0]) # canonical example
# test(3, 2, 1, 0.5, 0.4, 0.3, 0.2, axes[1]) # different areas
# test(1, 1, 1, 0, 0, 0, 0, axes[2]) # no intersections; not bad, but could be handled better
# test(1, 0, 0, 1, 0, 0, 1, axes[3]) # a is superset to b, b is superset to c
# test(2, 2, 2, 0.5, 0, 0.5, 0, axes[4]) # no BC, and no ABC
# test(2, 2, 2, 0.1, 0.1, 0.1, 0, axes[5]) # no ABC
# # issue #30
# test(1, 0, 0, 32, 0, 76, 13)
# test(1, 0, 0, 650, 0, 76, 13)
# test(7549417, 15685620, 26018311, 5128906, 301048, 6841264, 2762301) # works well with latest version
# # issue #34
# test(167, 7, 25, 41, 174, 171, 51)
# issue # 50
test(1808, 1181, 1858, 4715, 3031, 26482, 65012)
plt.show()
So, I haven't found enough time this weekend to look at the particular algorithm proposals above. What I did manage, though, is to refactor the code and expose the layout algorithm for customization.
After the latest commit it is possible to do the following:
- Create a custom
VennLayoutAlgorithm
implementation. - Pass it as
layout_algorithm=custom_algorithm
tovenn2/venn3/venn2_circles/venn3_circles
.
Could you try packaging the best of your experiments above into such a class to make it easy to try out? The current architecture lends reasonably well to inclusion of such experimental layout methods (perhaps under an appropriately-named package) into the codebase without breaking any normal usages.
Sorry, I had a cold and am behind on work so I won't have time for additional contributions right now. Also, I think it is pretty important that you run some tests first, to see if this version performs better than your original implementation. Until that is for certain, it doesn't really make sense to spend more time on this code.
Furthermore, if it does indeed perform better, you may want to think about some further refactoring. The code should pretty easily generalize to arbitrary Venn diagrams (2, 3, 4, etc), so you may want to make more significant changes to the API.
Well, we're not in a hurry here, and as you might notice I myself also find less time for this project than it might deserve :)
I will certainly keep looking at it as I find the time gradually. Just made sure there are interfaces where you could also contribute constructively if you happen to get the time.
pretty easily generalize to arbitrary Venn diagrams (2, 3, 4, etc), so you may want to make more significant changes to the API.
Yes, I saw that and I do think that it would be nice to have a vennx
function for this sort of arbitrariness. The only change the API might need is to use Dict[code, size] as the primary representation of region sizes rather than the current Tuple[size, size, size, ...].