resolvers: reduce the impact of to_snake_case

Open oliver-sanders opened this issue 1 year ago • 2 comments

The rather innocent-looking to_snake_case function (imported from a third-party library) chews quite a lot of CPU in some cases.

It seems to be called a vast number of times. Suggestions (in order of preference):

  • Refactor to reduce the number of calls.
  • Cache the calls to reduce CPU impact.
  • Improve the upstream implementation by re.compile'ing the regex at module scope to avoid doing this on each call. Note that we aren't using the latest version of this library, so we would need to re-implement this locally until we've upgraded.
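
A minimal sketch of the third suggestion, assuming the conversion is a conventional two-regex camelCase to snake_case substitution (the upstream code may differ). The name to_snake_case_local and the pattern names are hypothetical, chosen for illustration only:

```python
import re

# Compiled once at import time rather than looked up on every call.
# These patterns are an assumption about the conversion, not the upstream source.
_WORD_BOUNDARY = re.compile(r"(.)([A-Z][a-z]+)")
_LOWER_THEN_UPPER = re.compile(r"([a-z0-9])([A-Z])")


def to_snake_case_local(name: str) -> str:
    """Convert a camelCase name to snake_case (hypothetical local re-implementation)."""
    partial = _WORD_BOUNDARY.sub(r"\1_\2", name)
    return _LOWER_THEN_UPPER.sub(r"\1_\2", partial).lower()


print(to_snake_case_local("taskProxy"))  # task_proxy
```

Python's re module already keeps an internal cache of compiled patterns, so the saving here is the per-call cache lookup rather than a full recompile, which is one reason this option sits last in the order of preference.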

oliver-sanders avatar Jan 04 '24 17:01 oliver-sanders

to_snake_case gets called on every field. It essentially translates between Python variable names and JS names (e.g. taskProxy -> task_proxy). The only way to get rid of it would be to change the data-store to use JS names, but there may still be some translation pushed elsewhere. It could be done, though (I think).

It would have to be an 8.x.0 release, as changing the protobuf variable names will break the API.

dwsutherland avatar Feb 07 '24 01:02 dwsutherland

> only way to get rid of it would be to change the data-store to use JS names

The to_snake_case function is being called many, many more times than we have field names to resolve.
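
One way to check this is to count calls per argument. The sketch below is illustrative only: count_calls is a hypothetical helper, the to_snake_case here is a simplified stand-in for the third-party function, and the field list is made up rather than taken from a real workflow:

```python
from collections import Counter
from functools import wraps


def count_calls(func):
    """Wrap func so that calls are tallied per argument (hypothetical helper)."""
    calls = Counter()

    @wraps(func)
    def wrapper(name):
        calls[name] += 1
        return func(name)

    wrapper.calls = calls
    return wrapper


@count_calls
def to_snake_case(name: str) -> str:
    # Simplified stand-in, not the third-party implementation.
    return "".join("_" + c.lower() if c.isupper() else c for c in name)


# Made-up sequence of field lookups for illustration.
for field in ["taskProxy", "familyProxy", "taskProxy", "taskProxy"]:
    to_snake_case(field)

total_calls = sum(to_snake_case.calls.values())
distinct_names = len(to_snake_case.calls)
print(total_calls, distinct_names)  # 4 2
```

A total far above the number of distinct names indicates repeated work on the same inputs.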

This suggests that it is being called for the same field names over and over again, which makes sense from a skim of the code. So we don't need to "get rid" of this, only avoid duplicate calls. The best way to achieve this is a code refactor that avoids the repeated calls in the first place. The easiest way is to wrap the to_snake_case function with functools.lru_cache.
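
A rough sketch of the lru_cache approach. The to_snake_case defined here is a stand-in for the third-party function (an assumption, so that the example is self-contained), and cached_to_snake_case is a hypothetical name:

```python
import re
from functools import lru_cache


def to_snake_case(name: str) -> str:
    """Stand-in for the third-party function (assumed behaviour, not upstream code)."""
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


# Unbounded cache: the set of distinct field names is assumed to be small and fixed.
cached_to_snake_case = lru_cache(maxsize=None)(to_snake_case)

for _ in range(1000):
    cached_to_snake_case("taskProxy")

info = cached_to_snake_case.cache_info()
print(info.hits, info.misses)  # 999 1
```

Only the first call for each distinct name does the regex work; later calls are dictionary lookups. If the set of names could grow without bound, a finite maxsize would be the safer choice.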

oliver-sanders avatar Feb 07 '24 10:02 oliver-sanders