checkpoint
checkpoint copied to clipboard
Add parallelization across the codebase.
Is your feature request related to a problem? Please describe.
There are various TODO's all across the codebase where the package can benefit from parallel processing. An example of this can be found here, the similar thing needs to be implemented elsewhere.
Describe the solution you'd like Use the same logic as above to add parallelization.
Additional context We can brainstorm on other possibilities as well.
I created an expression for nested loops using parallel jobs but cannot figure out how this expression can be assigned to a variable then return it afterwards, below is more clear example
Before
for content in contents:
for obj in content:
path = list(obj.keys())[0]
path2content[path] = crypt_obj.encrypt(path)
return path2content
After
Parallel(expression)
which will return a value because of crypt_obj.encrypt(path)
in expression but how this return value could be assigned I can't figure out. Please let me know if this solution can work.
Can I work on this issue?
I created an expression for nested loops using parallel jobs but cannot figure out how this expression can be assigned to a variable then return it afterwards, below is more clear example
Before
for content in contents: for obj in content: path = list(obj.keys())[0] path2content[path] = crypt_obj.encrypt(path) return path2content
After
Parallel(expression)
which will return a value because ofcrypt_obj.encrypt(path)
in expression but how this return value could be assigned I can't figure out. Please let me know if this solution can work.
Hey @rahul-netizen, take a look at this. It is really close to what you want to do. You might have to create a separate function in the same scope and pass it to the delayed
function. See if this makes sense and feel free to implement something completely different. Also, you can work on this issue.
Apologies for starting late.. I tried doing the first todo but I am unable to pass test cases for
checkpoint/tests/test_sequences.py::test_io_sequence FAILED [ 64%] checkpoint/tests/test_sequences.py::test_checkpoint_sequence FAILED [ 71%] checkpoint/tests/test_sequences.py::test_CLI_sequence FAILED
I made these changes
def seq_encrypt_files(self, contents):
"""Encrypt the read files.
Parameters
----------
contents: dict
Dictionary of file paths and their content.
Returns
-------
dict
Dictionary of file paths and their encrypted content.
"""
# TODO: Parallelize this
path2content = {}
crypt_obj = Crypt(key='crypt.key', key_path=os.path.join(
self.root_dir, '.checkpoint'))
# for content in contents:
# for obj in content:
# path = list(obj.keys())[0]
# path2content[path] = crypt_obj.encrypt(path)
# return path2content
def set_path2content(param1):
for obj in param1:
path = list(obj.keys())[0]
path2content[path] = crypt_obj.encrypt(path)
Parallel(self.num_cores)(delayed(set_path2content)(content) for content in contents)
return path2content
Would be really helpful if you could point out what went wrong.
Hey @rahul-netizen, Can you create a pull request? That way I can easily review/test your code locally and suggest some changes.
Hey @rahul-netizen, Can you create a pull request? That way I can easily review/test your code locally and suggest some changes.
Done
Hey Wanna work on this, please assign this to me. Thanx mate