data icon indicating copy to clipboard operation
data copied to clipboard

[DataLoader] Deep copy ReadingService during DL2 initialization

Open NivekT opened this issue 2 years ago • 10 comments

Stack from ghstack:

  • #728
  • -> #746

By creating a deep copy of the input ReadingService during the initialization of DataLoader2, we allow the instance of ReadingService to store state without worrying about the users passing the same ReadingService to initialize additional DataLoaders.

For instance, PrototypeMultiProcessingReadingService stores processes and datapipes as instance variables. If a single object of that ReadingService is used to initialize multiple DataLoaders, it can led to conflicts and incorrect states. This should no longer be an issue with this change.

Relevant test are added in subsequent PR related to lengths and eventually PrototypeMultiprocessingReadingService.

Differential Revision: D38868731

NivekT avatar Aug 18 '22 22:08 NivekT

@nivekt has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

NivekT avatar Aug 19 '22 17:08 NivekT

We should put this information into the ReadingService interface file too. To make sure that RS developers know about potential deep copy.

VitalyFedyunin avatar Aug 19 '22 17:08 VitalyFedyunin

One more though, we deepcopy RS, but we pickle-unpickle DP. I think we need to pick consistent approach.

VitalyFedyunin avatar Aug 19 '22 17:08 VitalyFedyunin

One more though, we deepcopy RS, but we pickle-unpickle DP. I think we need to pick consistent approach.

Totally agree. And, I support deepcopy because it's more readable and even working with unpickable objects.

ejguan avatar Aug 19 '22 17:08 ejguan

One more though, we deepcopy RS, but we pickle-unpickle DP. I think we need to pick consistent approach.

Can you elaborate on this? What make a consistent approach matter here?

NivekT avatar Aug 22 '22 22:08 NivekT

Also, it seems like some internal ReadingService cannot be deep-copied. One alternative is for users to pass in a constructor instead of the object itself, but that will change the interface quite a bit.

NivekT avatar Aug 22 '22 22:08 NivekT

Also, it seems like some internal ReadingService cannot be deep-copied. One alternative is for users to pass in a constructor instead of the object itself, but that will change the interface quite a bit.

It seems the Error comes from construction of process group in __init__ function. You might be able to move them into initialize function to make it working.

ejguan avatar Aug 23 '22 13:08 ejguan

@nivekt has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

NivekT avatar Aug 24 '22 22:08 NivekT

@nivekt has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

NivekT avatar Aug 25 '22 17:08 NivekT

@nivekt has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

NivekT avatar Aug 25 '22 20:08 NivekT

@nivekt has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

NivekT avatar Oct 18 '22 22:10 NivekT