plot method using improper weights for edges given independence tests?
Hello,
Thank you for the bnlearn library for Python!
I have been playing with it for a couple of weeks and found some strange behaviour with the plot function that makes me question if it's a bug.
Here is an example :
1. I have learned the parameters of a DAG and then applied the "g_sq" independence test.
DAG = bn.independence_test(DAG, train_df, test = "g_sq", prune = False)
Printing the output, I get this :
print(DAG["independence_test"])
| source | target | stat_test | p_value | g_sq | dof |
|---|---|---|---|---|---|
| prior_prey_speed_bin | prior_success | True | 0.000000e+00 | 7095.359166 | 16 |
| prior_prey_speed_bin | hunting_type | True | 1.666661e-184 | 879.275521 | 8 |
| prior_success | hunting_type | True | 1.398179e-34 | 179.366911 | 8 |
| xp_level | hunting_type | True | 2.315951e-08 | 41.312300 | 4 |
| xp_level | chase_success_ratio_bin | True | 1.961613e-25 | 135.586098 | 8 |
| hunting_type | chase_success_ratio_bin | True | 1.278954e-26 | 141.290355 | 8 |
| hunting_type | prey_avg_speed_bin | True | 1.087990e-211 | 1005.269860 | 8 |
| hunting_type | hunting_success | True | 1.650949e-16 | 92.213758 | 8 |
| time_lag_bin | hunting_type | True | 1.054652e-10 | 63.280805 | 8 |
| time_lag_bin | chase_success_ratio_bin | True | 3.720814e-10 | 78.056073 | 16 |
| chase_success_ratio_bin | hunting_success | True | 0.000000e+00 | 28650.178197 | 16 |
| prey_avg_speed_bin | chase_success_ratio_bin | True | 0.000000e+00 | 7187.267068 | 16 |
| prey_avg_speed_bin | hunting_success | True | 0.000000e+00 | 7287.686944 | 16 |
| game_duration_bin | hunting_success | True | 0.000000e+00 | 9475.756745 | 16 |
2. Plotting the DAG
3. Problem
Given that the independence test value for this edge is smaller compared to some of the larger ones :
| source | target | stat_test | p_value | g_sq | dof |
|---|---|---|---|---|---|
| hunting_type | hunting_success | True | 1.650949e-16 | 92.213758 | 8 |
then why is the edge returned on the plot much thicker (hunting type --> hunting success)?
Similarly, given that the g_sq test value is very large for this edge :
| source | target | stat_test | p_value | g_sq | dof |
|---|---|---|---|---|---|
| game_duration_bin | hunting_success | True | 0.000000e+00 | 9475.756745 | 16 |
then why is the edge weight is so small on the plot?
4. Attempted solution
Given my suspicions that the plot method does not use the proper weights, I coded my own plot using networkx and matplotlib. Here is my code. It seems that my solution uses the proper weights.
# Get edge properties from the DAG
edge_properties = bn.get_edge_properties(DAG)
# Create a directed graph
G = nx.DiGraph()
# Add edges with weights based on independence test results
for edge, properties in edge_properties.items():
strength = properties.get("weight", 0)
G.add_edge(*edge, weight = strength)
# Manually specify node names
node_mapping = {
"hunting_type": "Hunting\ntype",
"hunting_success": "Hunting\nsuccess",
"prey_avg_speed_bin": "Prey speed",
"prior_prey_speed_bin": "Prior\nprey speed",
"prior_success": "Prior\nhunting\nsuccess",
"chase_success_ratio_bin": "Chase\nsuccess\nratio",
"time_lag_bin": "Time\nlag",
"xp_level": "Experience\nlevel",
"game_duration_bin": "Game\nduration"
}
G = nx.relabel_nodes(G, node_mapping)
# Define positions for the nodes
pos = {
"Experience\nlevel": (0.5, 0),
"Prior\nhunting\nsuccess": (0.8, 0.005),
"Prior\nprey speed": (1.2, 0.005),
"Time\nlag": (0.5, -0.0055),
"Hunting\ntype": (1, 0),
"Prey speed": (1.5, 0),
"Chase\nsuccess\nratio": (1, -0.0055),
"Hunting\nsuccess": (1.25, -0.01),
"Game\nduration": (1, -0.012),
}
# Set a fixed larger size for nodes
node_size = 5000
node_sizes = [node_size for _ in range(len(G))]
# Define edge colors and transparency
M = G.number_of_edges()
edge_colors = [
properties["weight"] for u, v,
properties in G.edges(data = True)
]
edge_alphas = [(5 + i) / (M + 4) for i in range(M)]
# Use viridis colormap
cmap = plt.cm.viridis
# Plot size
plt.figure(figsize = (15, 10))
# Draw the DAG
nodes = nx.draw_networkx_nodes(
G,
pos,
node_size = node_sizes,
node_color = "indigo",
edgecolors = "black"
)
edges = nx.draw_networkx_edges(
G,
pos,
node_size = node_sizes,
arrowstyle = "->",
arrowsize = 20,
edge_color = edge_colors,
edge_cmap = cmap,
width = 2,
)
# set alpha value for each edge
#for i in range(M):
# edges[i].set_alpha(edge_alphas[i])
pc = mpl.collections.PatchCollection(edges, cmap = cmap)
pc.set_array(edge_colors)
# Display node labels
node_labels = nx.draw_networkx_labels(
G, pos,
font_size = 10,
font_weight = "bold",
font_color = "white"
)
# Display the figure
ax = plt.gca()
ax.set_axis_off()
plt.colorbar(pc, ax = ax, label = "Edge strength")
plt.show()
Comparing with the Titanic example
I tried to reproduce my problem using the exact same code provided for the Titanic example, and it seems like this strange behaviour also occurs. Here are my outputs :
import bnlearn as bn
# Load example mixed dataset
df = bn.import_example(data='titanic')
# Convert to onehot
dfhot, dfnum = bn.df2onehot(df)
# Structure learning
# model = bn.structure_learning.fit(dfnum, methodtype='cl', black_list=['Embarked','Parch','Name'], root_node='Survived', bw_list_method='nodes')
model = bn.structure_learning.fit(dfnum)
# Plot
G = bn.plot(model, interactive=False)
# Compute edge strength with the chi_square test statistic
model = bn.independence_test(model, dfnum, test='chi_square', prune=True)
# Plot
bn.plot(model, interactive=False, pos=G['pos'])
# Parameter learning
model = bn.parameter_learning.fit(model, dfnum)
As you can see, the edge between SibSp --> Parch should be much larger, given that the independence test results are :
| source | target | stat_test | p_value | g_sq | dof |
|---|---|---|---|---|---|
| Pclass | Embarked | True | 1.519464e-25 | 129.654853 | 6 |
| Pclass | Survived | True | 4.549252e-23 | 102.888989 | 2 |
| Sex | Survived | True | 1.197357e-58 | 260.717020 | 1 |
| Sex | Pclass | True | 2.063886e-04 | 16.971499 | 2 |
| SibSp | Parch | True | 1.468030e-62 | 336.388883 | 15 |
| Parch | Sex | True | 1.013616e-12 | 58.892256 | 3 |
Any help would be greatly appreciated.
Thank you very much!
Thank you for this clear explanation! Im going to dig into it.
Thank you for your patience. In specific cases, the weights for the edges were indeed not updated. I fixed that issue now.
Update to the latest version with pip install -U bnlearn
Can you confirm whether this also works for you?
Closing this! Let me know if something is still off.