SimpleHypergraphs.jl
Accelerate modularity calculation
SimpleHypergraphs.jl version v1.6.0
https://github.com/pszufe/SimpleHypergraphs.jl/blob/1b03ab32977b96ff0be8553b82e596fcd5aaef56/src/modularity.jl#L62
This line takes time. I succeeded in accelerating it. I confirmed that this program works correctly and runs fast for some hypergraphs and partitions.
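For reference, the expression in question (it also appears commented out at the end of the snippet below) is:

sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)

Here is my replacement: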
mr = 0.0
# v = vol(V): total vertex degree (how many hyperedges each vertex belongs to)
v = 0
for i in 1:nhv(h)
    v += length(gethyperedges(h, i))
end
# dict: histogram of hyperedge sizes, size => number of hyperedges of that size
dict = Dict{Int,Int}()
for i in 1:nhe(h)
    dict[length(getvertices(h, i))] = get(dict, length(getvertices(h, i)), 0) + 1
end
# va: volume fraction vol(P_i)/vol(V) of each cluster
va = Dict([i => 0.0 for i in 1:length(partition)])
for (i, cluster) in enumerate(partition)
    for node in cluster
        va[i] += length(gethyperedges(h, node))
    end
    va[i] /= v
end
# sum only over the hyperedge sizes that actually occur
for (key, val) in dict
    tmp = 0.0
    for (key2, val2) in va
        tmp += (val2 ^ key)
    end
    tmp *= val
    mr += tmp
end
# (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
(sum(eP) - mr) / nhe(h)
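If I read the code correctly, both the commented original line and this snippet compute the same quantity:

$$Q(P) = \frac{1}{|E|}\left(\sum_{P_i \in P} e_{P_i} - \sum_{d \geq 1} E_d \sum_{P_i \in P} \left(\frac{\mathrm{vol}(P_i)}{\mathrm{vol}(V)}\right)^{d}\right)$$

where $e_{P_i}$ is the number of hyperedges fully contained in $P_i$, $E_d$ is the number of hyperedges of size $d$, and $\mathrm{vol}$ sums vertex degrees. The only difference is that the loop above sums over the distinct hyperedge sizes stored in dict instead of over every $d$ from 1 to ha.max_hes.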
However, I cannot explain why this is faster... (originally I was improving a different line), so feel free to ignore and close this issue. Please refer to this program if you like. Sorry for my poor English, and thank you.
Hello, thank you for the feedback.
I tried to run your code. In my tests (see below) your code takes twice as long to compute. Could you perhaps update the test below with an example illustrating a case where your code is actually faster? Additionally, in the case of degenerate hypergraphs, it yields wrong results (but this should be fairly easy to repair if the performance gain is real).
using SimpleHypergraphs
using Random
using StatsBase
using BenchmarkTools
Random.seed!(0)
m = sample([nothing, true], Weights([0.8,0.2]),(30,30))
h = Hypergraph(m)
part = SimpleHypergraphs.randompartition(h,5)
ha = SimpleHypergraphs.HypergraphAggs(h)
function modularity2(h::Hypergraph, partition::Vector{Set{Int}},
        ha::SimpleHypergraphs.HypergraphAggs=SimpleHypergraphs.HypergraphAggs(h))
    @boundscheck sum(length.(partition)) == nhv(h)
    @boundscheck union(partition...) == Set(1:nhv(h))
    volP_volV = [sum(ha.deg_vs[i] for i in p)/ha.volV for p in partition]
    eP = [count(i-> ha.hes[i]>0 && (keys(h.he2v[i]) ⊆ p), 1:nhe(h)) for p in partition]
    mr = 0.0
    v = 0
    for i in 1:nhv(h)
        v += length(gethyperedges(h, i))
    end
    dict = Dict{Int,Int}()
    for i in 1:nhe(h)
        dict[length(getvertices(h, i))] = get(dict, length(getvertices(h, i)), 0) + 1
    end
    va = Dict([i => 0.0 for i in 1:length(partition)])
    for (i, cluster) in enumerate(partition)
        for node in cluster
            va[i] += length(gethyperedges(h, node))
        end
        va[i] /= v
    end
    for (key, val) in dict
        tmp = 0.0
        for (key2, val2) in va
            tmp += (val2 ^ key)
        end
        tmp *= val
        mr += tmp
    end
    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)
end
And now the tests:
julia> modu1 = @btime modularity($h,$part,$ha);
12.700 μs (211 allocations: 8.27 KiB)
julia> modu2 = @btime modularity2($h,$part,$ha);
21.500 μs (377 allocations: 11.29 KiB)
julia> modu1 ≈ modu2
true
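By "degenerate" I mean, for example, a hypergraph containing an empty hyperedge (the concrete example below is mine): the original formula sums over sizes d in 1:ha.max_hes, so size-0 hyperedges never contribute, while your code stores them in dict, and val2 ^ 0 == 1.0 then adds one full term per cluster:

# hyperedge 2 is left empty: the two implementations disagree
h2 = Hypergraph{Bool}(3, 2)
h2[1, 1] = true
h2[2, 1] = true
part2 = [Set([1, 2]), Set([3])]
modularity(h2, part2)   # gives 0.0
modularity2(h2, part2)  # gives -1.0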
First, I generate a hypergraph. It clearly has some clusters.
function create_hypergraph(npcs, hepcs, he_rate=0.5, noise=10)
    node_num = 0
    he_num = 0
    clusters = length(npcs)
    for npc in npcs
        node_num += npc
    end
    for hepc in hepcs
        he_num += hepc
    end
    h = Hypergraph{Int64}(node_num, he_num)
    step = 1
    fin_node = 1
    # build the hyperedges inside each cluster
    for (npc, hepc) in zip(npcs, hepcs)
        for i in 1:hepc
            # he_size = rand(1:npc)
            # each node of the cluster joins the hyperedge with probability he_rate
            he = randsubseq(fin_node:fin_node+npc-1, he_rate)
            he = [(i in he ? 1 : nothing) for i in 1:nhv(h)]
            for (j, v) in enumerate(he)
                h[j, step] = v
            end
            step += 1
        end
        fin_node += npc
    end
    # add `noise` hyperedges drawn from the whole vertex set
    for i in 1:noise
        he = randsubseq(1:node_num, 0.5)
        he = Dict([i => 1 for i in he])
        add_hyperedge!(h, vertices=he)
    end
    h
end
h = create_hypergraph([100 for i in 1:5], [20 for i in 1:5], 0.5, 4)
rp = randompartition(h, 30)
- npcs is the number of nodes per cluster (an array).
- hepcs is the number of hyperedges per cluster (an array).
- Each node of a cluster is included in a hyperedge with probability he_rate.
- The generated hypergraph has noise extra hyperedges, which are not fully contained in any single cluster.

An incidence matrix generated this way looks like this: [image of incidence matrix]
julia> mod1 = @btime modularity(h, rp)
589.906 μs (3526 allocations: 194.89 KiB)
-1.5186397093946605e-53
julia> mod2 = @btime modularity2(h, rp)
222.981 μs (5718 allocations: 152.38 KiB)
-1.518639709394661e-53
And it also gives a speedup when using findcommunities.
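To make findcommunities use modularity2, one option (a sketch; the exact mechanism used for the benchmark below is not shown) is to overwrite the library method so the community search picks up the new code:

# method overwrite (type piracy, but fine for a quick benchmark):
# delegate the library's modularity to the faster modularity2 defined above
function SimpleHypergraphs.modularity(h::Hypergraph, partition::Vector{Set{Int}},
        ha::SimpleHypergraphs.HypergraphAggs=SimpleHypergraphs.HypergraphAggs(h))
    modularity2(h, partition, ha)
end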
- Using the conventional modularity:
julia> cfm = CFModularityCNMLike(1000)
CFModularityCNMLike(1000)
julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
memory estimate: 335.32 MiB
allocs estimate: 6643143
--------------
minimum time: 1.214 s (0.00% GC)
median time: 1.781 s (3.26% GC)
mean time: 1.911 s (3.10% GC)
maximum time: 2.739 s (4.38% GC)
--------------
samples: 3
evals/sample: 1
- Using my modularity:
julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
memory estimate: 265.13 MiB
allocs estimate: 10089341
--------------
minimum time: 585.823 ms (0.00% GC)
median time: 761.730 ms (7.58% GC)
mean time: 976.228 ms (5.85% GC)
maximum time: 1.728 s (6.59% GC)
--------------
samples: 6
evals/sample: 1
However, I understand that it is not better in every case.