clustershell
Performance on large many-dimensional nodesets
HPE Cray EX supercomputers use hardware locations (called xnames) that encode up to 5 dimensions. An example compute node might be x1000c2s3b0n1. We don't use xnames for our compute nodes, but our switches, BMCs, chassis controllers, etc. do use them. Some of our local tooling (and clush/cluset) struggles with long lists of xnames, particularly when folding.
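To make the "up to 5 dimensions" point concrete, here is a minimal sketch that splits an xname into its integer fields. It assumes only the x/c/s/b/n shape seen in the example above; `XNAME_RE` and `parse_xname` are hypothetical names for illustration and are not part of ClusterShell or any HPE tooling, and other xname component types are not handled.

```python
import re

# Assumption: fields appear in the fixed order x, c, s, b, n, and trailing
# fields may be absent (e.g. a BMC name stops at "b").
XNAME_RE = re.compile(r"x(\d+)(?:c(\d+)(?:s(\d+)(?:b(\d+)(?:n(\d+))?)?)?)?")


def parse_xname(name):
    """Return the integer fields of an xname as a tuple, one per dimension."""
    match = XNAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an x/c/s/b/n style xname: {name!r}")
    # Drop the fields that are not present in this name.
    return tuple(int(group) for group in match.groups() if group is not None)


print(parse_xname("x1000c2s3b0n1"))  # (1000, 2, 3, 0, 1)
print(parse_xname("x2000c0s7b1"))    # (2000, 0, 7, 1)
```

Each position in the resulting tuple is one dimension that a folding tool has to range over independently.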
[[email protected] ~]# time (cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | cluset -f)
x[2000-2073]c[0-7]s[0-7]b[0-1]
real 6m38.483s
user 6m38.228s
sys 0m0.088s
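For a sense of scale, the nodeset above covers 74 × 8 × 8 × 2 = 9472 names across four dimensions. The sketch below rebuilds that list with the standard library only; `expand` is a hypothetical helper, not a ClusterShell function.

```python
from itertools import product


def expand(cabinets, chassis, slots, bmcs):
    """Build every x<N>c<N>s<N>b<N> name from the four index ranges."""
    return [
        f"x{x}c{c}s{s}b{b}"
        for x, c, s, b in product(cabinets, chassis, slots, bmcs)
    ]


names = expand(range(2000, 2074), range(8), range(8), range(2))
print(len(names))           # 9472
print(names[0], names[-1])  # x2000c0s0b0 x2073c7s7b1
```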
Any thoughts on how we could improve this? Thanks
Thanks for the report @mattaezell.
A quick look (on the master branch) shows that most of the time is spent in RangeSetND._fold_multivariate_merge(), which does the nD folding:
$ cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | python3 -m cProfile -s cumulative lib/ClusterShell/CLI/Nodeset.py -f
x[2000-2073]c[0-7]s[0-7]b[0-1]
1225113449 function calls (1073087573 primitive calls) in 848.288 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
107/1 0.009 0.000 848.388 848.388 {built-in method exec}
1 0.000 0.000 848.388 848.388 Nodeset.py:27(<module>)
1 0.001 0.001 848.258 848.258 Nodeset.py:334(main)
1 0.000 0.000 848.257 848.257 Nodeset.py:155(nodeset)
1 0.251 0.251 848.255 848.255 Nodeset.py:44(process_stdin)
18950 0.028 0.000 846.483 0.045 RangeSet.py:884(inner)
152119181/94356 14.296 0.000 845.921 0.009 {built-in method len}
9473 0.008 0.000 845.920 0.089 RangeSet.py:1147(_fold)
3 0.000 0.000 845.911 281.970 NodeSet.py:238(__len__)
3 0.000 0.000 845.911 281.970 RangeSet.py:926(__len__)
1 0.000 0.000 845.856 845.856 RangeSet.py:1180(_fold_multivariate)
1 186.997 186.997 844.685 844.685 RangeSet.py:1245(_fold_multivariate_merge) <<<
71670778 86.067 0.000 333.553 0.000 RangeSet.py:533(copy)
47701583 31.013 0.000 331.814 0.000 RangeSet.py:582(__and__)
47701583 33.374 0.000 287.534 0.000 RangeSet.py:591(intersection)
71746554 115.193 0.000 176.797 0.000 RangeSet.py:106(__init__)
23855527 18.540 0.000 173.677 0.000 RangeSet.py:564(__or__)
23855527 16.714 0.000 147.369 0.000 RangeSet.py:573(union)
51414608 57.401 0.000 144.083 0.000 RangeSet.py:542(__eq__)
414319969 110.208 0.000 110.208 0.000 {built-in method isinstance}
95526306 67.220 0.000 91.474 0.000 RangeSet.py:736(update)
50787459 38.299 0.000 72.265 0.000 RangeSet.py:652(issubset)
52033619 17.782 0.000 34.702 0.000 RangeSet.py:676(_binary_sanity_check)
47701583 31.294 0.000 31.294 0.000 RangeSet.py:703(intersection_update)
71784438 19.991 0.000 19.991 0.000 RangeSet.py:244(set_autostep)
623080 0.775 0.000 2.092 0.000 RangeSet.py:670(__gt__)
18947 0.021 0.000 1.969 0.000 NodeSet.py:1508(update)
9474 0.041 0.000 1.760 0.000 NodeSet.py:1202(__init__)
18947 0.031 0.000 1.339 0.000 NodeSet.py:789(parse)
9472 0.061 0.000 1.296 0.000 NodeSet.py:810(parse_string)
28419 0.043 0.000 0.982 0.000 NodeSet.py:539(update)
37889 0.074 0.000 0.959 0.000 NodeSet.py:490(_add)
623080 0.481 0.000 0.809 0.000 RangeSet.py:657(issuperset)
18944 0.042 0.000 0.735 0.000 NodeSet.py:996(_scan_string)
2 0.000 0.000 0.640 0.320 RangeSet.py:1126(_sort)
2 0.061 0.031 0.640 0.320 {method 'sort' of 'list' objects}
9472 0.190 0.000 0.584 0.000 NodeSet.py:962(_scan_string_single)
9473 0.033 0.000 0.579 0.000 RangeSet.py:1128(rgveckeyfunc)
18946 0.039 0.000 0.534 0.000 RangeSet.py:898(copy)
1 0.011 0.011 0.531 0.531 RangeSet.py:1190(_fold_multivariate_expand)
75776 0.154 0.000 0.526 0.000 RangeSet.py:194(fromone)
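As an illustration of what nD folding has to achieve, here is a minimal sketch that folds integer tuples one dimension at a time by grouping on the remaining dimensions. This is not ClusterShell's `_fold_multivariate_merge()` and not the patch discussed in this thread; `fold` is a hypothetical function, and the result is exact but not guaranteed to be the smallest possible set of vectors for irregular input.

```python
from collections import defaultdict
from itertools import product


def fold(points):
    """Fold equal-length integer tuples into vectors of per-dimension sets."""
    # Start with one vector per point, each dimension holding a single value.
    vectors = {tuple(frozenset([v]) for v in p) for p in points}
    if not vectors:
        return []
    ndims = len(next(iter(vectors)))
    for d in reversed(range(ndims)):
        # Vectors that agree on every dimension except d can be merged
        # by taking the union of their values in dimension d.
        groups = defaultdict(set)
        for vec in vectors:
            key = vec[:d] + vec[d + 1:]
            groups[key] |= vec[d]
        vectors = {
            key[:d] + (frozenset(vals),) + key[d:]
            for key, vals in groups.items()
        }
    return sorted(vectors, key=lambda vec: [sorted(s) for s in vec])


grid = list(product(range(2000, 2074), range(8), range(8), range(2)))
folded = fold(grid)
print(len(grid), len(folded))  # 9472 1
```

Because each pass groups vectors through a dictionary lookup, the work grows with the number of vectors times the number of dimensions, rather than with the number of vector pairs.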
We'll investigate.
Thanks for the super-quick patch for this. Testing seems to work MUCH better.
[[email protected] ~]# time (cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | cluset -f)
x[2000-2073]c[0-7]s[0-7]b[0-1]
real 0m2.849s
user 0m2.956s
sys 0m0.084s