Add parallelization across the codebase.

Open antrikshmisri opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe.

There are various TODOs across the codebase where the package could benefit from parallel processing. An example can be found here; something similar needs to be implemented elsewhere.

Describe the solution you'd like

Use the same logic as above to add parallelization.

Additional context

We can brainstorm on other possibilities as well.

antrikshmisri avatar Dec 01 '21 12:12 antrikshmisri

I wrote an expression for the nested loops using parallel jobs, but I can't figure out how to assign its result to a variable and then return it. Here is a clearer example.

Before

for content in contents:
    for obj in content:
        path = list(obj.keys())[0]
        path2content[path] = crypt_obj.encrypt(path)

return path2content

After

Parallel(expression) returns a value, because crypt_obj.encrypt(path) is part of the expression, but I can't figure out how to assign that return value. Please let me know whether this approach can work.

rahul-netizen avatar Dec 02 '21 19:12 rahul-netizen

Can I work on this issue?

rahul-netizen avatar Dec 11 '21 14:12 rahul-netizen

Hey @rahul-netizen, take a look at this. It is really close to what you want to do. You might have to create a separate function in the same scope and pass it to the delayed function. See if this makes sense and feel free to implement something completely different. Also, you can work on this issue.
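
For readers following along, here is a minimal sketch of the pattern suggested above, assuming joblib's Parallel and delayed. The names encrypt_group and encrypt_all are hypothetical, the encrypt argument stands in for crypt_obj.encrypt, and the shape of contents (a list of groups, each a list of single-key dicts) is inferred from the loops in this thread rather than taken from the project's code. The key point is that the helper returns its results, and the list that Parallel gives back is turned into the dictionary.

```python
from joblib import Parallel, delayed


def encrypt_group(group, encrypt):
    """Return (path, encrypted) pairs for one group of {path: content} dicts."""
    pairs = []
    for obj in group:
        path = next(iter(obj))  # the first key of each dict is the file path
        pairs.append((path, encrypt(path)))
    return pairs


def encrypt_all(contents, encrypt, n_jobs=2, prefer=None):
    """Run one task per group and merge the returned pairs into a dict."""
    results = Parallel(n_jobs=n_jobs, prefer=prefer)(
        delayed(encrypt_group)(group, encrypt) for group in contents
    )
    # Parallel returns one list of pairs per group, in input order.
    return {path: value for pairs in results for path, value in pairs}
```

Because every task hands its results back to the caller, nothing depends on workers sharing state with the parent, so the same code should behave the same way whichever backend joblib picks.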

antrikshmisri avatar Dec 13 '21 10:12 antrikshmisri

Apologies for the late start. I tried the first TODO, but these test cases fail:

checkpoint/tests/test_sequences.py::test_io_sequence FAILED [ 64%]
checkpoint/tests/test_sequences.py::test_checkpoint_sequence FAILED [ 71%]
checkpoint/tests/test_sequences.py::test_CLI_sequence FAILED

I made these changes

def seq_encrypt_files(self, contents):
    """Encrypt the read files.

    Parameters
    ----------
    contents: dict
        Dictionary of file paths and their content.

    Returns
    -------
    dict
        Dictionary of file paths and their encrypted content.
    """
    # TODO: Parallelize this
    path2content = {}
    crypt_obj = Crypt(key='crypt.key', key_path=os.path.join(
        self.root_dir, '.checkpoint'))

    # for content in contents:
    #     for obj in content:
    #         path = list(obj.keys())[0]
    #         path2content[path] = crypt_obj.encrypt(path)

    # return path2content

    def set_path2content(param1):
        for obj in param1:
            path = list(obj.keys())[0]
            path2content[path] = crypt_obj.encrypt(path)

    Parallel(self.num_cores)(delayed(set_path2content)(content) for content in contents)

    return path2content

It would be really helpful if you could point out what went wrong.
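
One possible explanation, not confirmed anywhere in this thread: joblib's default backend runs tasks in separate worker processes, so a helper like set_path2content may write to a copy of path2content while the dictionary in the calling process stays empty. The sketch below only simulates that hand-off with a pickle round trip (an assumption made to keep the example small and deterministic; it does not start any worker processes or call joblib).

```python
import pickle

path2content = {}

# A process-based worker receives a serialized copy of the objects a task
# references; this pickle round trip stands in for that hand-off.
worker_copy = pickle.loads(pickle.dumps(path2content))
worker_copy["a.txt"] = "encrypted"  # the write lands on the copy only

print(path2content)  # the caller's dict was never modified
print(worker_copy)   # only the worker-side copy holds the new entry
```

If this is indeed the cause, having the helper return its results and building the dictionary from what Parallel returns would avoid the problem.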

rahul-netizen avatar Jan 09 '22 13:01 rahul-netizen

Hey @rahul-netizen, Can you create a pull request? That way I can easily review/test your code locally and suggest some changes.

antrikshmisri avatar Jan 11 '22 09:01 antrikshmisri

Done

rahul-netizen avatar Jan 12 '22 12:01 rahul-netizen

Hey, I'd like to work on this. Please assign it to me. Thanks, mate.

raaghavrm avatar May 21 '22 16:05 raaghavrm