clustershell icon indicating copy to clipboard operation
clustershell copied to clipboard

RangeSet: nD folding optimization (#485)

Open thiell opened this issue 2 years ago • 1 comments

Optimize multi-dimensional range set folding algorithm:

  • get rid of heuristic when pre-expanding vectors for folding to improve reproducibility
  • when merging vectors, perform O(n) "folding" passes as long as changes are detected to reduce the problem size, before attempting a final O(n^2) pass
  • avoid casting string into integer when sorting (as RangeSet is now using strings natively)

Closes #485

thiell avatar Aug 31 '22 07:08 thiell

$ time (cluset -e x[2000-2073]c[0-7]s[0-7]b[0-1] | python3 -m cProfile -s cumulative lib/ClusterShell/CLI/Nodeset.py  -f)
x[2000-2073]c[0-7]s[0-7]b[0-1]
         4773441 function calls (4276373 primitive calls) in 5.067 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    107/1    0.009    0.000    5.169    5.169 {built-in method exec}
        1    0.000    0.000    5.169    5.169 Nodeset.py:27(<module>)
        1    0.001    0.001    5.046    5.046 Nodeset.py:334(main)
        1    0.000    0.000    5.045    5.045 Nodeset.py:155(nodeset)
        1    0.285    0.285    5.042    5.042 Nodeset.py:44(process_stdin)
    18950    0.031    0.000    3.148    0.000 RangeSet.py:884(inner)
590373/94356    0.058    0.000    2.538    0.000 {built-in method len}
     9473    0.008    0.000    2.537    0.000 RangeSet.py:1147(_fold)
        3    0.000    0.000    2.527    0.842 NodeSet.py:238(__len__)
        3    0.000    0.000    2.527    0.842 RangeSet.py:926(__len__)
        1    0.000    0.000    2.459    2.459 RangeSet.py:1180(_fold_multivariate)
    18947    0.023    0.000    2.104    0.000 NodeSet.py:1508(update)
     9474    0.041    0.000    1.873    0.000 NodeSet.py:1202(__init__)
        1    0.199    0.199    1.827    1.827 RangeSet.py:1194(_fold_multivariate_merge)
    18947    0.031    0.000    1.418    0.000 NodeSet.py:789(parse)
     9472    0.064    0.000    1.375    0.000 NodeSet.py:810(parse_string)
        6    0.000    0.000    1.074    0.179 RangeSet.py:1126(_sort)
        6    0.083    0.014    1.074    0.179 {method 'sort' of 'list' objects}
    28419    0.045    0.000    1.067    0.000 NodeSet.py:539(update)
    37889    0.085    0.000    1.043    0.000 NodeSet.py:490(_add)
    14876    0.057    0.000    0.991    0.000 RangeSet.py:1128(rgveckeyfunc)
    74380    0.128    0.000    0.899    0.000 RangeSet.py:1136(<genexpr>)
   148807    0.231    0.000    0.826    0.000 RangeSet.py:533(copy)
    18944    0.047    0.000    0.770    0.000 NodeSet.py:996(_scan_string)
   119008    0.154    0.000    0.765    0.000 RangeSet.py:478(__getitem__)
   224583    0.507    0.000    0.737    0.000 RangeSet.py:106(__init__)
   156896    0.243    0.000    0.705    0.000 RangeSet.py:262(_sorted)
        1    0.015    0.015    0.632    0.632 RangeSet.py:1188(_fold_multivariate_expand)
     9472    0.195    0.000    0.608    0.000 NodeSet.py:962(_scan_string_single)
    75776    0.164    0.000    0.603    0.000 RangeSet.py:194(fromone)
    18946    0.042    0.000    0.581    0.000 RangeSet.py:898(copy)
    18946    0.020    0.000    0.500    0.000 RangeSet.py:904(<listcomp>)```

thiell avatar Aug 31 '22 07:08 thiell

We've been running with this patch for more than 2 months. Are there any blockers to it being merged?

mattaezell avatar Nov 07 '22 00:11 mattaezell