
Performance Suggestion: Replace dict1.keys() & dict2.keys() with set(dict1).intersection(dict2) for Improved Speed and Memory Efficiency

Open SaFE-APIOpt opened this issue 8 months ago • 0 comments

https://github.com/IBM/unitxt/blob/231fd293f53baac6be94133c606d7ddaf66eacd4/src/unitxt/artifact.py#L50

I’d like to propose a minor optimization related to key intersection operations between two dictionaries.

Original Code:

    keys_in_both = dict1.keys() & dict2.keys()

Suggested Replacement:

    keys_in_both = set(dict1).intersection(dict2)

Although both approaches return the same result (the intersection of the dictionary keys), the performance characteristics differ under the hood:

  • dict1.keys() returns a dict_keys view, which supports set operations like &, but the interpreter may build a temporary set internally to complete the operation (the exact behavior depends on the CPython version).
  • set(dict1).intersection(dict2) explicitly creates a set and calls the optimized intersection method, which is implemented in highly efficient C code.
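
To check the claim on a given interpreter, the two forms can be timed side by side. The snippet below is only a rough sketch: the helper names (`keys_in_both_view`, `keys_in_both_set`, `compare`) and the sample dictionaries are hypothetical and not taken from unitxt, and the timings will vary with the Python version, dictionary sizes, and the amount of key overlap.

```python
import timeit


def keys_in_both_view(dict1, dict2):
    """Intersect keys using the dict_keys view operator (the original form)."""
    return dict1.keys() & dict2.keys()


def keys_in_both_set(dict1, dict2):
    """Intersect keys via an explicit set (the suggested form)."""
    return set(dict1).intersection(dict2)


# Hypothetical sample data, chosen only for illustration.
dict1 = {f"key_{i}": i for i in range(1000)}
dict2 = {f"key_{i}": i for i in range(500, 1500)}

# Both forms return a plain set with the same members.
assert keys_in_both_view(dict1, dict2) == keys_in_both_set(dict1, dict2)


def compare(number=1000):
    """Return total seconds spent in each form over `number` calls."""
    view_time = timeit.timeit(lambda: keys_in_both_view(dict1, dict2), number=number)
    set_time = timeit.timeit(lambda: keys_in_both_set(dict1, dict2), number=number)
    return view_time, set_time


if __name__ == "__main__":
    view_time, set_time = compare()
    print(f"dict1.keys() & dict2.keys():    {view_time:.4f}s")
    print(f"set(dict1).intersection(dict2): {set_time:.4f}s")
```

Running this against the interpreter and dictionary sizes that actually matter for artifact.py would show whether the difference is measurable there.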

SaFE-APIOpt, Apr 25 '25 02:04