HistoQC
The first version of unit tests for HistoQC
Could be nice to have a coverage report though. A few previous bugs (e.g., the one parsing image_work_size) occurred in branches that are rarely invoked by users - I wonder if they could be caught with full-coverage tests.
This is an interesting idea; I think it should be pretty easy to implement with the coverage or pytest-cov plugins. Others' thoughts?
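A minimal sketch of how such a report could be generated, assuming pytest-cov is installed and that the tests live under histoqc/tests/ (paths are illustrative):

# equivalent to running: pytest --cov=histoqc --cov-report=term-missing histoqc/tests/
import pytest

pytest.main(["--cov=histoqc", "--cov-report=term-missing", "histoqc/tests/"])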
I love that, and it should be expanded a bit into the main workflow:
The user should be able to specify a seed as a command-line parameter, which sets the seed for the entire system, enabling true reproducibility.
The seed should then also be saved in the results.tsv file.
Thoughts? Seems fairly straightforward?
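A rough sketch of what that could look like; the flag name matches the later discussion, but the details here are only a suggestion, not the final implementation:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-s', '--seed', type=int, default=None,
                    help='global random seed applied to the entire run for reproducibility')
args = parser.parse_args()

# the chosen value could then be recorded in the results.tsv header, e.g.
# #seed: 123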
@jacksonjacobs1 commented on this pull request, in histoqc/tests/test_images_tsv_results.py (https://github.com/choosehappy/HistoQC/pull/266#discussion_r1371781298):
content2 = file2.read()

# comparing labels
for label in self.tsv_labels[:-1]:
    label_value1 = tu.parseLabel(label, content1)
    label_value2 = tu.parseLabel(label, content2)
    self.assertEqual(label_value1, label_value2)
    print(f'label {label} in tsv results comparison pass')

# comparing dataset
dataset_label = self.tsv_labels[-1]
dataset1 = tu.parseDataset(dataset_label, content1)
dataset2 = tu.parseDataset(dataset_label, content2)
for field_name in self.tsv_dataset_fields[:10]:
That's correct, there is an assertAlmostEqual() function. However, we don't want to use this broadly as other modules should be exactly reproducible.
One option would be to keep a list of modules which have random behavior, and apply assertAlmostEqual() only to the output from those modules (tsv and mask output).
However, I would argue that all modules with random behavior should also accept seeds to ensure that results are reproducible. The random seed could be passed as a configuration variable. What do you think about that?
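A minimal sketch of that option as a test helper; the module names and the tolerance are illustrative assumptions, not part of the PR:

import unittest

# modules whose output may legitimately vary between runs; everything else must match exactly
RANDOM_MODULES = {'ClassificationModule', 'LocalTextureEstimationModule'}

class TsvComparisonMixin(unittest.TestCase):
    def assertFieldMatches(self, module_name, value1, value2):
        if module_name in RANDOM_MODULES:
            # tolerance-based comparison for stochastic modules
            self.assertAlmostEqual(float(value1), float(value2), places=4)
        else:
            # exact reproducibility expected everywhere else
            self.assertEqual(value1, value2)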
Agree with @CielAl that a coverage report would be good to have. I'll open an issue to track this feature request.
Regarding the random seed, @choosehappy's plan makes sense to me. Nan, how long do you expect it would take to add the random seed to this PR? Then your assert statements can be expanded to all output.
Almost there. Can the seed parameter be located at the top of the file to indicate it is applied globally (to all modules)?
Do you mean at the top of the config files? If you want to treat the seed parameter as a global variable, let's put it into **BasicModule.getBasicStats** and treat the seed the same as image_work_size. Every module can read it from s['seed']. Any thoughts?
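A tiny sketch of that proposal; the default values and the int conversion are assumptions, not the actual BasicModule code:

def getBasicStats(s, params):
    # hypothetical: stash the seed in the per-image state dict alongside
    # image_work_size so any downstream module can read it from s["seed"]
    s["image_work_size"] = params.get("image_work_size", "1.25x")
    s["seed"] = int(params["seed"]) if "seed" in params else None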
Yes, that makes sense to me.
Can you also run pytest-cov and include the report in the readme?
Looks like there is no such thing as a global config parameter, since each module is only passed the parameters from its section: https://github.com/choosehappy/HistoQC/blob/3653ffb50a4a17c44fd5daacd357ddd4227e8884/histoqc/_pipeline.py#L365
def load_pipeline(config):
    """load functions and parameters from config

    Parameters
    ----------
    config : configparser.ConfigParser
    """
    steps = config.get(section='pipeline', option='steps').splitlines()

    process_queue = []
    for process in steps:
        mod_name, func_name = process.split('.')
        try:
            mod = import_module(f"histoqc.{mod_name}")
        except ImportError:
            raise NameError(f"Unknown module in pipeline from config file:\t {mod_name}")
        func_name = func_name.split(":")[0]  # take base of function name
        try:
            func = getattr(mod, func_name)
        except AttributeError:
            raise NameError(f"Unknown function from module in pipeline from config file:\t {mod_name}.{func_name}")
        if config.has_section(process):
            params = dict(config.items(section=process))
        else:
            params = {}
        process_queue.append((func, params))
    return process_queue
That's why image_work_size must be specified at the module level.
We can either keep the current setup and specify a seed separately in each section of the config, or add a new section to the config file which gets appended to every params dict during load_pipeline().
The config section would look something like this:
[global]
seed = 123
@choosehappy any comments?
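A possible sketch of that second option, merging such a [global] section into every step's params inside load_pipeline(); the helper name and the section name are only suggestions:

def merge_global_params(config, params):
    # hypothetical helper: fold an optional [global] section (e.g. seed = 123)
    # into the params dict built for each pipeline step
    if config.has_section('global'):
        params.update(dict(config.items(section='global')))
    return params

# inside load_pipeline(), just before process_queue.append((func, params)):
#     params = merge_global_params(config, params)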
Hi @jacksonjacobs1, I think we can put the seed as a global variable into BaseImage, since every pipeline has to use it:
[BaseImage.BaseImage]
seed = 123
Ah my mistake. Yes, that should work.
I guess my question is - why do we need to pass this variable around? Once the seed is set at application launch, all subsequent modules should become deterministic, should they not? It's not clear to me why individual modules would need to know the seed - they should just be inheriting from the parent app? At least that's how I understood it works, but I could be wrong?
Or is there another use case you folks are thinking about?
Got it. We'll add an optional argument -s / --seed to the command line at application launch.
It's not clear to me why individual modules would need to know the seed - they should just be inheriting from the parent app? At least that's how I understood it works, but I could be wrong?
If I understand it correctly: certain libraries may implement/invoke their own RNG, so if you want the parent app/process to take care of the random state you probably need to centralize the import/seed-setting of all libraries across all modules - but that sort of defeats the purpose of making modules somewhat "detachable" components that users can customize without modifying the core, e.g., BaseImage.
Therefore, the only difference between doing it in the config or in BaseImage is which one semantically fulfills the role of carrying the seed. I would say that Jack's suggestion might be better, since you don't want an overly redundant BaseImage, and technically the program compiles the config first to construct the pipeline (BaseImage + params + shared multiprocessing state), so it wouldn't be too troublesome to do it this way.
certain libraries may implement/invoke their own RNG,
I think this is to be tested - I remember an issue with pytorch + numpy previously using different RNGs, but eventually they were merged. I'm unsure if we actually have different RNG approaches - almost all of the stuff within HistoQC is going to be deterministic. I would think that we could use the tests to verify this: run it a couple of times with the same seed and with different seeds, and (a) note if/where the results are the same or different, and (b) construct (and note down) a list of the modules which are non-deterministic so folks know.
In the end, we should be using a default seed, with the option to change it on the command line, so that things are both "automatically" reproducible, with the option to add randomness at the user's discretion.
does that make sense?
In the end, we should be using a default seed, with the option to change it on the command line, so that things are both "automatically" reproducible, with the option to add randomness at the user's discretion.
Definitely not trying to dispute that, but I think it's worth stating that, since modules are executed by worker processes, whether the RNG state is inherited may depend on how the subprocesses are created. If it is by fork (e.g., on UNIX-like systems), then the child process inherits the seed naturally, but that won't happen on Windows, which uses spawn.
Therefore, when you execute the pipeline in your worker function, the seed should be either in the config or in the shared_dict in order to set the seed in the subprocesses. What I said in previous replies is that, since you need to let the subprocess/module be aware of the seed, the seed is better placed in the config/shared_dict rather than in BaseImage, as the config/shared_dict are the more relevant arguments when you call the worker function.
I asked chatgpt to write a simple draft to prove this (lol):
# run it in Windows
import multiprocessing
import random
import os

def generate_random_number():
    # uncomment this line for comparison
    # random.seed(42)
    print(f"Process ID: {os.getpid()}, Random Number: {random.random()}")

def main():
    # Set random seed in the main process
    random.seed(42)
    print(f"Main Process ID: {os.getpid()}, Random Seed Set: 42")

    # Ensure the 'spawn' context is used for creating new processes
    multiprocessing.set_start_method('spawn')

    # Create and start multiple subprocesses
    processes = []
    for _ in range(4):
        p = multiprocessing.Process(target=generate_random_number)
        p.start()
        processes.append(p)

    # Wait for all subprocesses to finish
    for p in processes:
        p.join()

if __name__ == "__main__":
    main()
PS: if you execute this on Windows without setting the random seed inside the worker function, you will most likely get a different printed random number in each subprocess (there is only a very small chance that the subprocesses end up with the same seed without explicit setting).
Hi @CielAl, @choosehappy and @jacksonjacobs1, I added '-s' / '--seed' as an optional command-line argument. Please review ./histoqc/_worker.py and ./histoqc/main.py and provide some feedback. Thanks.
Thanks @CielAl, I agree.
I was also able to confirm that the code provided produces different results on Windows when using spawn, meaning special actions will have to be taken here.
In line with what you said, I think this is the exact purpose of the mpm (multiprocessing.Manager()) created here:
https://github.com/choosehappy/HistoQC/blob/79943ccd8f07e9b9a75460866a1f43a10d616c34/histoqc/main.py#L69C11-L69C36
and that the shared_dict could hold the random seed, to be used to set each worker process internally to ensure reproducibility
@jacksonjacobs1 thoughts?
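A standalone sketch of that idea with hypothetical names (the real worker signature differs): the Manager-backed dict carries the seed, and each worker re-seeds from it before processing an image, so the result does not depend on fork vs spawn:

import multiprocessing
import random

def worker(file_name, shared_dict):
    # re-seed the process-local RNG from the shared state before each image
    seed = shared_dict.get('seed')
    if seed is not None:
        random.seed(seed)
    return file_name, random.random()

if __name__ == '__main__':
    mpm = multiprocessing.Manager()            # same mechanism as in histoqc/main.py
    shared_dict = mpm.dict()
    shared_dict['seed'] = 123                  # e.g. filled from a --seed CLI option
    with multiprocessing.Pool(2) as pool:
        jobs = [(f, shared_dict) for f in ('slide1.svs', 'slide2.svs')]
        print(pool.starmap(worker, jobs))      # both slides draw the same first number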
the shared_dict could hold the random seed, to be used to set each worker process internally to ensure reproducibility

I agree that this seems to be the best solution. I think this is essentially what Nan implemented, no?
https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/main.py#L165 ^ set seed in _shared_state
https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L40 ^ _shared_state passes the seed to the worker.
And the random seed is set at the command line.
Nan, did you get the chance to test this out?
I don't see where the seed actually gets set in the worker?
also, shouldn't it be set in the worker initializer?
https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L12
would that carry it into the processes indefinitely? -- this actually wouldn't work, because you can imagine wanting something deterministic at the slide level: whether the slide is in spot 1 or spot 10, if you set the seed only at time t=0, then you'll be pulling different numbers at the t=1 versus t=10 spot. One would actually want to set the seed before processing each individual image.
Okay, so we agree that the seed is being passed down far enough (each worker gets the same seed) but isn't actually set anywhere, correct?
I think the appropriate place to set the seed would then be here instead of passing it further into the modules: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L40
I would note though that doing so would only guarantee reproducibility if the pipeline is identical between runs. If modules are re-ordered in the pipeline, or new random calls are added to one module, then other modules may behave differently.
Is that acceptable?
I think how we define reproducibility itself answers that: for newly published code from a new piece of research, we expect to "reproduce" the results by setting the seed (assuming the same kind of hardware/software RNG, of course) and executing the exact scripts. It wouldn't be "reproducing" if you manually shuffled a few exchangeable procedures that involve random number generation. So my opinion is that it is fine.
As for where the seed goes, since shared_state in the current commit already has the seed, it is probably best to leave it there: if a user-defined module has anything specific to do with the seed (e.g., a third-party library invoking its own RNG), it offers the flexibility. Also, this is coherent with how other modules handle multiprocessing-related information (e.g., context and locks).
I think the appropriate place to set the seed would then be here instead of passing it further into the modules: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L40
do you mean putting it in between line 39 and line 40?
i would probably put it as the first line in the worker function, here on line 21: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L21
the reasoning is that ultimately it impacts the working of the entire module, and if someone eventually writes code between lines 21 and 40 and doesn't realize that a seed is being set on line 40, you end up with unintended consequences. Putting things which impact the entire function at the top makes code maintenance and long-term sustainability easier.
I would note though that doing so would only guarantee reproducibility if the pipeline is identical between runs. If modules are re-ordered in the pipeline, or new random calls are added to one module, then other modules may behave differently.
yes, that is correct
One can imagine this in, e.g., a clinical setting: they lock down the config file, and then the next 1M WSIs are subjected to that config over the year, but at different times (e.g., one run per day over all the WSIs). In this case, the results should be directly comparable between runs -- this is the type of reproducibility I'm thinking of.
Similarly, I think this concept trickles down into the unit tests, to ensure that the same results are produced every time for verification purposes?
Hi @choosehappy, @jacksonjacobs1 and @CielAl,
Currently, I added the seed as an optional command-line argument with a default value of None: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/main.py#L57
As you can see, I passed the seed into _shared_state: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/main.py#L165
and the worker gets the seed and passes it into BaseImage: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L19 https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/_worker.py#L40
In BaseImage, I save the seed into s["seed"] as a globally available value: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/BaseImage.py#L34
Two modules - ClassificationModule and LocalTextureEstimationModule - use the seed to set the RNG; I let each module handle setting the seed on its own: https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/ClassificationModule.py#L199 https://github.com/nanli-emory/HistoQC/blob/12dca26383b5b826e683027b91d80d880eaf7cee/histoqc/LocalTextureEstimationModule.py#L24
I have three questions. First, should we use a non-None seed as the default, such as seed = 123, which ensures that the entire system has the same seed and guarantees reproducibility?
Second, should we give users the ability to turn setting the RNG seed on and off? For example, seeding is turned off if the user passes the seed as 'None' or 'NaN'.
Third, should we set the seed in the worker or leave it for each module to handle on its own?
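Regarding the third question, a per-module version might look roughly like this; the function and the sampling step are purely illustrative, not the actual module code:

import numpy as np

def exampleTextureModule(s, params):
    # hypothetical: a stochastic module builds its own generator from the shared
    # seed, falling back to system entropy when no seed was provided
    seed = s.get("seed", None)
    rng = np.random.default_rng(seed)
    # e.g. sample patch indices reproducibly whenever a seed is set
    sample_idx = rng.choice(1000, size=10, replace=False)
    return sample_idx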
Hi @nanli-emory, @choosehappy, and @jacksonjacobs1,
I think what Andrew mentioned above, besides where the seed goes, is that the seed should be set at the worker level. Here we have two scopes to take care of: (1) the parent app that manages and invokes the pipeline of modules, and (2) the modules themselves.
We can take PyTorch's multiprocessing DataLoader as an example. For the DataLoader there are two places where random states are involved. One is the creation of the base iterator, where its own RNG (a torch.Generator) is initialized, which is straightforward. The second is in worker_init_fn, which is recommended but not enforced, mainly because the user can optionally destroy each worker after a round of iteration (persistent_workers=False), so the previously set random state may be lost.
Now if we go back to the HistoQC pipeline, there are three considerations:
(1) As Andrew suggested, the seed is an argument set at the parent-app level, meaning it should override whatever random seed settings exist in both the parent app and the modules. Therefore it makes sense to simply set the random state using the seed in shared_dict in the worker function, because the worker function is invoked when a worker is created and assigned a job - similar to how the PyTorch DataLoader sets the seed for the iterator of each worker in its multiprocessing iterator subclass.
(2) The majority of the default modules use the numpy RNG (perhaps some use the Python built-in one), so it is feasible to implement a fix_seed function that sets the random states globally at the worker level for all known RNGs used in the default modules.
(3) It is possible that future user-defined modules (or even our new modules) may use a different RNG (e.g., PyTorch itself). This means that worker-level seed fixing may not set the state for those modules, as we cannot anticipate which RNG they use. However, since the seed is already passed to the modules in the shared_dict, modules can directly set their seeds from the shared_dict, similar to how they invoke locks ---- this is analogous to how PyTorch lets users fix their seeds in their own worker_init_fn if they choose to destroy the workers, or if their dataset implementation uses a different RNG.
In short, it indeed feels reasonable to set the seed at the worker level, meaning the parent app mostly controls the random states, since the seed is a parent-app-level argument rather than module-level config (which is also why it should be in the shared_dict rather than in the configparser). If an individual user-defined module needs more control of the RNG (e.g., a third-party library with a different RNG), it can access the seed setting (e.g., whether None/system default or user defined) from the shared_dict.
I personally think that while, in terms of functionality, putting the seed as a global variable of BaseImage is no different from just leaving it in the shared_dict, the former might not be an ideal solution: essentially you could put everything into BaseImage since it is used by all modules, but we definitely don't want to do that because of the coupling and redundancy it would introduce. So why don't we just leave the seed in the shared_dict, as it is more worker-relevant and is used to pass the worker state to the modules.
Revisiting this issue now.
I agree that the seed should be set in the _shared_state dict.
@nanli-emory Is there anything that still needs to be done here or is this branch ready to be merged in?
@nanli-emory Is there anything that still needs to be done here or is this branch ready to be merged in?

It's done. Please merge it.