unitxt
Performance Suggestion: Replace dict1.keys() & dict2.keys() with set(dict1).intersection(dict2) for Improved Speed and Memory Efficiency
https://github.com/IBM/unitxt/blob/231fd293f53baac6be94133c606d7ddaf66eacd4/src/unitxt/artifact.py#L50
I’d like to propose a minor optimization related to key intersection operations between two dictionaries.
Original Code:
keys_in_both = dict1.keys() & dict2.keys()
Suggested Replacement:
keys_in_both = set(dict1).intersection(dict2)
Both approaches return the same result, a set containing the keys present in both dictionaries, but they can differ in how the work is done under the hood:
- dict1.keys() returns a dict_keys view, which supports set operations like &. Depending on the CPython version, the interpreter may build a temporary set from the view internally to complete the operation.
- set(dict1).intersection(dict2) explicitly creates one set from dict1 and calls set.intersection, which is implemented in C in CPython and accepts any iterable, so dict2 is iterated directly without creating a second set or key view.
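
To check whether the difference described above matters in practice, a small benchmark along these lines could be run on the target interpreter. This is only a sketch: the dictionary sizes, the amount of overlap, and the helper names (with_view_operator, with_set_intersection, benchmark) are assumptions made for illustration and are not taken from unitxt. The measured gap may vary between Python versions and dictionary sizes.

```python
import timeit

# Hypothetical sample data: sizes and overlap are arbitrary assumptions.
dict1 = {i: None for i in range(0, 10_000)}
dict2 = {i: None for i in range(5_000, 15_000)}


def with_view_operator():
    # Original form: set operation on two dict_keys views.
    return dict1.keys() & dict2.keys()


def with_set_intersection():
    # Suggested form: build one set, then intersect with the other dict's keys.
    return set(dict1).intersection(dict2)


def benchmark(func, number=200):
    # Take the best of several repeats to reduce timing noise.
    return min(timeit.repeat(func, number=number, repeat=5))


# Sanity check: both forms must produce the same set of shared keys.
assert with_view_operator() == with_set_intersection()

if __name__ == "__main__":
    print(f"keys() & keys():     {benchmark(with_view_operator):.4f} s")
    print(f"set().intersection(): {benchmark(with_set_intersection):.4f} s")
```

Running this on the Python versions that unitxt supports, with dictionary sizes close to the real workload, would show whether the replacement is a measurable win there.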