maestrowf icon indicating copy to clipboard operation
maestrowf copied to clipboard

WIP: Feature/links dev merge

Open crkrenn opened this issue 4 years ago • 30 comments

@FrankD412 & @jsemler,

This is a beta version of a user defined linking system.

It needs more documentation and tests, but I'd like your initial impressions. And, I also need to add some additional logging. There are also some minor differences (e.g. 'apply dag' vs. 'apply environment') that I will resolve before removing the "WIP" label.

A question: should I keep the new code in utils.py, or should I move it to a new module? Would it count as an "interface" (e.g. interfaces/linker.py)?

And, more jinja templates could be easily added (e.g. 'time', 'username', etc.).

Some miscellaneous thoughts:

  • I think I should add an automatic suffix if there are link conflict
  • I could consider a step index as well as a maestro execution index, but I don't really want to encourage that kind of naming.
  • And, I could write a tool that would process a set of maestro runs and create links after the simulations are run.

There are 3 new maestro run options:

  --make-links          Automatically make customizable, human-readable 
                        links to run directories. [Default: False]
  --link-directory LINK_DIRECTORY
                        Jinja template for path where links to run directories are made
                        [Default: {{output_root}}/links]
  --link-template LINK_TEMPLATE
                        Jinja template for links to run directories
                        [Default: {{link_directory}}/{{date}}/run-{{INDEX}}/{{instance}}/{{step}}]
                         
                        Currently supported Jinja variables:
                        {{output_root}} - Parent directory for this maestro study
                        {{link_directory}} - Link directory for this maestro study
                        {{date}} - Human-readable date (e.g. '2020_07_28')
                        {{instance}} - Maestro label for a set of parameters
                                       (e.g. 'X1.5.X2.5.X3.20')
                                       [maximum length: 255 characters]
                        {{step}} - Maestro label for a given step (e.g. 'run')
                        {{INDEX}} - Unique number for each maestro execution (e.g. '0001')

A simple example of links that are automatically created is below:

(venv) ➜  /tmp tree sample_output                                                                                                   
sample_output
├── hello_bye_world_20201015-124125
│   ├── batch.info
│   ├── bye_world
│   │   ├── bye.txt
│   │   ├── bye_world.13912.err
│   │   ├── bye_world.13912.out
│   │   └── bye_world.sh
│   ├── hello_bye_parameterized.yaml
│   ├── hello_bye_world.pkl
│   ├── hello_bye_world.study.pkl
│   ├── hello_bye_world.txt
│   ├── hello_world
│   │   ├── GREETING.Ciao.NAME.Jim
│   │   │   ├── Ciao_Jim.txt
│   │   │   ├── hello_world_GREETING.Ciao.NAME.Jim.13909.err
│   │   │   ├── hello_world_GREETING.Ciao.NAME.Jim.13909.out
│   │   │   └── hello_world_GREETING.Ciao.NAME.Jim.sh
│   │   └── GREETING.Hello.NAME.Pam
│   │       ├── Hello_Pam.txt
│   │       ├── hello_world_GREETING.Hello.NAME.Pam.13908.err
│   │       ├── hello_world_GREETING.Hello.NAME.Pam.13908.out
│   │       └── hello_world_GREETING.Hello.NAME.Pam.sh
│   ├── logs
│   │   └── hello_bye_world.log
│   ├── maestro_index_file
│   ├── meta
│   │   ├── environment.yaml
│   │   ├── metadata.yaml
│   │   ├── parameters.yaml
│   │   └── study
│   │       └── env.pkl
│   ├── status.csv
│   └── whats_up_world
│       ├── NAME.Jim
│       │   ├── whats_up.txt
│       │   ├── whats_up_world_NAME.Jim.13907.err
│       │   ├── whats_up_world_NAME.Jim.13907.out
│       │   └── whats_up_world_NAME.Jim.sh
│       └── NAME.Pam
│           ├── whats_up.txt
│           ├── whats_up_world_NAME.Pam.13906.err
│           ├── whats_up_world_NAME.Pam.13906.out
│           └── whats_up_world_NAME.Pam.sh
├── hello_bye_world_20201015-124137
│   ├── batch.info
│   ├── bye_world
│   │   ├── bye.txt
│   │   ├── bye_world.13978.err
│   │   ├── bye_world.13978.out
│   │   └── bye_world.sh
│   ├── hello_bye_parameterized.yaml
│   ├── hello_bye_world.pkl
│   ├── hello_bye_world.study.pkl
│   ├── hello_bye_world.txt
│   ├── hello_world
│   │   ├── GREETING.Ciao.NAME.Jim
│   │   │   ├── Ciao_Jim.txt
│   │   │   ├── hello_world_GREETING.Ciao.NAME.Jim.13966.err
│   │   │   ├── hello_world_GREETING.Ciao.NAME.Jim.13966.out
│   │   │   └── hello_world_GREETING.Ciao.NAME.Jim.sh
│   │   └── GREETING.Hello.NAME.Pam
│   │       ├── Hello_Pam.txt
│   │       ├── hello_world_GREETING.Hello.NAME.Pam.13965.err
│   │       ├── hello_world_GREETING.Hello.NAME.Pam.13965.out
│   │       └── hello_world_GREETING.Hello.NAME.Pam.sh
│   ├── logs
│   │   └── hello_bye_world.log
│   ├── maestro_index_file
│   ├── meta
│   │   ├── environment.yaml
│   │   ├── metadata.yaml
│   │   ├── parameters.yaml
│   │   └── study
│   │       └── env.pkl
│   ├── status.csv
│   └── whats_up_world
│       ├── NAME.Jim
│       │   ├── whats_up.txt
│       │   ├── whats_up_world_NAME.Jim.13964.err
│       │   ├── whats_up_world_NAME.Jim.13964.out
│       │   └── whats_up_world_NAME.Jim.sh
│       └── NAME.Pam
│           ├── whats_up.txt
│           ├── whats_up_world_NAME.Pam.13963.err
│           ├── whats_up_world_NAME.Pam.13963.out
│           └── whats_up_world_NAME.Pam.sh
└── links
    └── 2020-10-15
        ├── run-0001
        │   ├── GREETING.Ciao.NAME.Jim
        │   │   └── hello_world -> /private/tmp/sample_output/hello_bye_world_20201015-124125/hello_world/GREETING.Ciao.NAME.Jim
        │   ├── GREETING.Hello.NAME.Pam
        │   │   └── hello_world -> /private/tmp/sample_output/hello_bye_world_20201015-124125/hello_world/GREETING.Hello.NAME.Pam
        │   ├── NAME.Jim
        │   │   └── whats_up_world -> /private/tmp/sample_output/hello_bye_world_20201015-124125/whats_up_world/NAME.Jim
        │   ├── NAME.Pam
        │   │   └── whats_up_world -> /private/tmp/sample_output/hello_bye_world_20201015-124125/whats_up_world/NAME.Pam
        │   └── all_records
        │       └── bye_world -> /private/tmp/sample_output/hello_bye_world_20201015-124125/bye_world
        └── run-0002
            ├── GREETING.Ciao.NAME.Jim
            │   └── hello_world -> /private/tmp/sample_output/hello_bye_world_20201015-124137/hello_world/GREETING.Ciao.NAME.Jim
            ├── GREETING.Hello.NAME.Pam
            │   └── hello_world -> /private/tmp/sample_output/hello_bye_world_20201015-124137/hello_world/GREETING.Hello.NAME.Pam
            ├── NAME.Jim
            │   └── whats_up_world -> /private/tmp/sample_output/hello_bye_world_20201015-124137/whats_up_world/NAME.Jim
            ├── NAME.Pam
            │   └── whats_up_world -> /private/tmp/sample_output/hello_bye_world_20201015-124137/whats_up_world/NAME.Pam
            └── all_records
                └── bye_world -> /private/tmp/sample_output/hello_bye_world_20201015-124137/bye_world

crkrenn avatar Oct 15 '20 19:10 crkrenn

@crkrenn -- Sorry for taking so long to get to this PR. I'm looking at testing it this morning.

FrankD412 avatar Nov 09 '20 15:11 FrankD412

Alright -- after some review, this appears to be functioning on my end. I tested the LULESH example locally. One thing that I did notice is that parameters are repeated if they're in combination with other values. For examples:

(maestrowf-JqadW55J-py3.7)  ~/Documents/Code/Python/maestrowf/sample_output/lulesh   feature/links_dev_merge  ll links/2020-11-09/run-0001
total 0
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.10.SIZE.10
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.10.SIZE.20
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.10.SIZE.30
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.20.SIZE.10
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.20.SIZE.20
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.20.SIZE.30
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.30.SIZE.10
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.30.SIZE.20
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:08 ITER.30.SIZE.30
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 SIZE.10
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 SIZE.20
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 SIZE.30
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.1
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.2
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.3
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.4
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.5
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.6
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.7
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.8
drwxr-xr-x@ 3 frank  staff    96B Nov  9 11:09 TRIAL.9
drwxr-xr-x@ 4 frank  staff   128B Nov  9 11:09 all_records

This structure isn't a tree so you see that SIZE appears multiple times. Is that intended? It's perfectly fine if this is an iterative step too. I just want to verify that this is working as you intended, @crkrenn.

FrankD412 avatar Nov 09 '20 19:11 FrankD412

Hello Frank,

Thanks for taking a look!

Could you post the commands you used to generate the commands above? Your output may or may not be what I intended.

Thanks!

-Chris

thecivicscenter avatar Nov 10 '20 19:11 thecivicscenter

@crkrenn @thecivicscenter -- I generated the above links from the root of the Maestro repository by running the following:

maestro run --make-links ./samples/lulesh/lulesh_sample1_macosx.yaml

That creates timestamped directories in the ./sample_output/lulesh/lulesh_sample1* with a links directory at ./sample_output/lulesh/links

You should be able to substitute the macosx spec with the unix version.

FrankD412 avatar Nov 10 '20 20:11 FrankD412

Hello Frank,

Thanks for giving me a challenging example! And, this does work the way I intended.

I could implement tree links if there was interest, but I think that this meets our current needs. Personally, I prefer the flat directory naming.

-Chris

sample_output/lulesh/links/2020-11-11/run-0001 ├── ITER.10.SIZE.10 │   └── run-lulesh -> maestro/run-lulesh/ITER.10.SIZE.10 ├── ITER.10.SIZE.20 │   └── run-lulesh -> maestro/run-lulesh/ITER.10.SIZE.20 ├── ITER.10.SIZE.30 │   └── run-lulesh -> maestro/run-lulesh/ITER.10.SIZE.30 ├── ITER.20.SIZE.10 │   └── run-lulesh -> maestro/run-lulesh/ITER.20.SIZE.10 ├── ITER.20.SIZE.20 │   └── run-lulesh -> maestro/run-lulesh/ITER.20.SIZE.20 ├── ITER.20.SIZE.30 │   └── run-lulesh -> maestro/run-lulesh/ITER.20.SIZE.30 ├── ITER.30.SIZE.10 │   └── run-lulesh -> maestro/run-lulesh/ITER.30.SIZE.10 ├── ITER.30.SIZE.20 │   └── run-lulesh -> maestro/run-lulesh/ITER.30.SIZE.20 ├── ITER.30.SIZE.30 │   └── run-lulesh -> maestro/run-lulesh/ITER.30.SIZE.30 ├── SIZE.10 │   └── post-process-lulesh-size -> maestro/post-process-lulesh-size/SIZE.10 ├── SIZE.20 │   └── post-process-lulesh-size -> maestro/post-process-lulesh-size/SIZE.20 ├── SIZE.30 │   └── post-process-lulesh-size -> maestro/post-process-lulesh-size/SIZE.30 ├── TRIAL.1 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.1 ├── TRIAL.2 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.2 ├── TRIAL.3 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.3 ├── TRIAL.4 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.4 ├── TRIAL.5 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.5 ├── TRIAL.6 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.6 ├── TRIAL.7 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.7 ├── TRIAL.8 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.8 ├── TRIAL.9 │   └── post-process-lulesh-trials -> maestro/post-process-lulesh-trials/TRIAL.9 └── all_records ├── make-lulesh -> maestro/make-lulesh └── post-process-lulesh -> maestro/post-process-lulesh

45 directories, 0 files

crkrenn avatar Nov 12 '20 03:11 crkrenn

Not a problem! Glad you were able to verify that the feature is working as intended. I think this PR is great for a first pass and we can revisit to add trees at a later time.

FrankD412 avatar Nov 12 '20 03:11 FrankD412

@crkrenn -- Added jinja2 to setup.py -- we currently maintain that so that Maestro can be installed in editable mode, something that the toml file doesn't currently support.

FrankD412 avatar Nov 12 '20 16:11 FrankD412

Hello again Frank,

Reviewing, it looks like there were at least one question and suggestion:

A question: should I keep the new code in utils.py, or should I move it to a new module? Would it count as an "interface" (e.g. interfaces/linker.py)?

It needs more documentation and tests, but I'd like your initial impressions. And, I also need to add some additional logging. There are also some minor differences (e.g. 'apply dag' vs. 'apply environment') that I will resolve before removing the "WIP" label.

-Chris

crkrenn avatar Dec 01 '20 22:12 crkrenn

Hello again Frank,

Reviewing, it looks like there were at least one question and suggestion:

A question: should I keep the new code in utils.py, or should I move it to a new module? Would it count as an "interface" (e.g. interfaces/linker.py)?

It needs more documentation and tests, but I'd like your initial impressions. And, I also need to add some additional logging. There are also some minor differences (e.g. 'apply dag' vs. 'apply environment') that I will resolve before removing the "WIP" label.

-Chris

@crkrenn -- I removed the WIP title, but feel free to add it back. I went based off of our prior discussion.

The documentation would be appreciated. For now, let's leave it in the utilities; I don't see folks deriving from it and it's very niche to Maestro. We can move it later if we find use for it. Does that sound alright to you?

FrankD412 avatar Dec 01 '20 23:12 FrankD412

@FrankD412, (cc: @jsemler),

I think that this is finally ready to merge.

Please let me know if you have any questions.

Thanks!

-Chris

crkrenn avatar Mar 25 '21 17:03 crkrenn

@crkrenn -- Thanks for the heads up, sorry it's taken me so long to get back. Do you mind rebasing this branch so that the CI can run? I recently fixed it so we can actually run it.

FrankD412 avatar Mar 30 '21 17:03 FrankD412

Hello @FrankD412,

I fixed a flake8 error. I think it should be finally ready to merge now.

-Chris

crkrenn avatar Apr 09 '21 00:04 crkrenn

@FrankD412, (cc: @daub1)

Thanks again for the careful review.

Regarding:

  1. I ran with the defaults maestro run --make-links ./samples/lulesh/lulesh_sample1_unix.yaml. This appears to work just fine, but it may run into issues if someone is linking to the same directory and running multiprocessing or something unexpected. As a user feature it behaves as expected.

I was relying on maestro to create a unique run directory. Is maestro robust for multiprocessing?

I noticed that run_study seems to rely on time.strftime("%Y%m%d-%H%M%S") to make a unique directory. Will that be good enough when we start pounding on maestro during a forensics exercise?

Very respectfully,

-Chris

PS. The generate_filename method should work for multiprocessing cases, but I don't see that you use that anywhere currently.

crkrenn avatar Apr 29 '21 01:04 crkrenn

Hello Frank,

I think that I've addressed point 2. Thanks for finding the bug!

Please let me know if you have any questions.

-Chris

crkrenn avatar Apr 29 '21 04:04 crkrenn

Hello Frank, Can you make some time to review this next week? Thanks! -Chris

crkrenn avatar May 06 '21 20:05 crkrenn

@crkrenn -- I can do my best to look over this tomorrow; if not then, probably over the weekend.

FrankD412 avatar May 07 '21 04:05 FrankD412

@FrankD412, (cc: @daub1)

Thanks again for the careful review.

Regarding:

  1. I ran with the defaults maestro run --make-links ./samples/lulesh/lulesh_sample1_unix.yaml. This appears to work just fine, but it may run into issues if someone is linking to the same directory and running multiprocessing or something unexpected. As a user feature it behaves as expected.

I was relying on maestro to create a unique run directory. Is maestro robust for multiprocessing?

It is, I was thinking more from the library case, or if a user was generating studies via a script or something that could possibly generate things fast enough to get the same timestamp (which is unlikely, but still possible).

I noticed that run_study seems to rely on time.strftime("%Y%m%d-%H%M%S") to make a unique directory. Will that be good enough when we start pounding on maestro during a forensics exercise?

I expect it'll be fine -- I just wanted to be thorough in the thought process.

PS. The generate_filename method should work for multiprocessing cases, but I don't see that you use that anywhere currently.

You know... it seems that this was intended for use and never got used. Good catch.

I'm looking over this now, I ended up not getting to it over the weekend, so hopefully I haven't set you behind.

FrankD412 avatar May 10 '21 15:05 FrankD412

@crkrenn -- I test this this morning and it appears to work. There we a number of newlines added to the help messages, so I pruned a good number of them but made some tweaks. Otherwise I think this PR is good to go.

FrankD412 avatar May 10 '21 18:05 FrankD412

After some digging, I think you might be interested in some of what @jwhite242 is adding to his PR. He add the parameters to the _StepRecord class, which you're gathering by chopping the step name off of the full name. It might be worth waiting for that PR to get merged and then revisit this.

FrankD412 avatar May 11 '21 05:05 FrankD412

@FrankD412,

I have no objection to waiting for @jwhite242's changes. I'm looking forward to the new status code!

-Chris

crkrenn avatar May 21 '21 20:05 crkrenn

@crkrenn -- Great! I appreciate your flexibility. The new code includes the nodes in the ExecutionGraph having the parameters used to generate them, etc. That should make it much easier to generate the link tree without needing to rely on scanning the filesystem structure.

FrankD412 avatar May 21 '21 20:05 FrankD412

Sounds good. And, I don't think that the link code currently scans the filesystem tree. It makes parent directories if needed, and it uses binary search if the user has chosen integer indexing.

crkrenn avatar May 21 '21 22:05 crkrenn

@crkrenn -- Revisiting this; @jwhite242's changes were merged in so it should be okay to revisit this.

FrankD412 avatar Nov 02 '21 17:11 FrankD412

@crkrenn -- just wanted to follow up on this, considering your other PR.

FrankD412 avatar Feb 22 '22 04:02 FrankD412

@FrankD412, I merged the current dev branch into this pull request.

It still passes all tests, but it was not clear to me if there is redundant code that I should now merge.

Can you take a look when you can?

Makelinks was also a frequent request during our exercise.

Chris

@antimatterhorn @jsemler

crkrenn avatar Aug 13 '22 01:08 crkrenn

Ok, I'll take a look at _StepRecord and ExecutionGraph when I can.

crkrenn avatar Aug 13 '22 01:08 crkrenn

Can you expand on the use case for this a little more? I'd like to better understand where Maestro is creating pain points for users, and it sounds like there were some that came up last week?

Just reading through the PR it seems like this is aimed mostly at creating some names that are more readable/faster for tab completion for things, particularly for the studies when multiple are created in the same output_path? On this use case I've a few questions/comments:

  • A minor request on the default names of things: what about using 'study' instead of 'run'? 'study' seems more generic/appropriate for being baked into maestro whereas 'run' has perhaps a bit too domain specific meaning.

  • I do have some concerns on the run-time/process safety of this. These links are being created outside individual study work-spaces, and those work-spaces can be created by multiple maestro/conductor processes. The time stamp format that's being renamed here is more robust (not perfect of course) to this usage in that it can be generated fully independently and is relatively unique. The ~uniqueness of that time stamp and the independent generation significantly reduces the chances of collisions from multiple conductors trying to grab the same path -> i.e. seems less likely to collide since it doesn't require indexing of the output_path first which could be invalidated by another conductor doing the same thing at the same time. It does seem like your idea of another util to do that outside of maestro might be safer in that it has no chance of clobbering/interrupting the actual running of the study which this one could do given the execution graph is the one that might choke on those collisions. We do have an iterator over the meta data that could facilitate this, and I'd be happy to merge that in this week if you want to play with that.

jwhite242 avatar Aug 15 '22 18:08 jwhite242

Hi Jeremy, (cc: @FrankD412)

Thanks very much for reviewing this request. I'll try to respond point by point...

Can you expand on the use case for this a little more? I'd like to better understand where Maestro is creating pain points for users, and it sounds like there were some that came up last week?

  • Our users want a way to quickly identify (and remember) a small number of runs out of hundreds or thousands. Remembering an index is much easier than remembering a time or a combination of randomly selected variables.
  • These short labels would also be very valuable when plotting many hundreds of runs.

Just reading through the PR it seems like this is aimed mostly at creating some names that are more readable/faster for tab completion for things, particularly for the studies when multiple are created in the same output_path? On this use case I've a few questions/comments:

A minor request on the default names of things: what about using 'study' instead of 'run'? 'study' seems more generic/appropriate for being baked into maestro whereas 'run' has perhaps a bit too domain specific meaning.

  • No objections here. In the example, study is more appropriate, where it essentially is a shorthand abbreviation for the submission time.
    • [Default: {{link_directory}}/{{date}}/study-{{INDEX}}/{{instance}}/{{step}}]
  • After last week, I can also see value in indexing "instances" ("X1.5.X2.5.X3.20"). In maestro class naming conventions, this would be a "combination" .
    • [Default: {{link_directory}}/{{date}}/study-{{INDEX}}/combo-{{INDEX}}/{{step}}], or
    • [Default: {{link_directory}}/{{date}}/study-{{INDEX}}/combo-{{INDEX}}-{instance}/{{step}}]

I do have some concerns on the run-time/process safety of this. These links are being created outside individual study work-spaces, and those work-spaces can be created by multiple maestro/conductor processes. The time stamp format that's being renamed here is more robust (not perfect of course) to this usage in that it can be generated fully independently and is relatively unique. The ~uniqueness of that time stamp and the independent generation significantly reduces the chances of collisions from multiple conductors trying to grab the same path -> i.e. seems less likely to collide since it doesn't require indexing of the output_path first which could be invalidated by another conductor doing the same thing at the same time. It does seem like your idea of another util to do that outside of maestro might be safer in that it has no chance of clobbering/interrupting the actual running of the study which this one could do given the execution graph is the one that might choke on those collisions. We do have an iterator over the meta data that could facilitate this, and I'd be happy to merge that in this week if you want to play with that.

  • I can see pros and cons to having "linking" being embedded in maestro vs. a post processing tool.

    • Embedded:
      • Pro: within maestro, the linker will have access to all necessary data
      • Con: as you say, it could affect the stability of maestro
    • Not embedded:
      • Pro: keeps maestro simple
      • Con: the linker should be run after each maestro step, import information on the step, and avoid collisions with other maestro steps. This feature is important for real time emergency response applications like forensics. We need to be able to analyze studies while they are stil in process.
  • I think that the pros of including linking in maestro outweigh the cons, particularly because you minimize collisions (running once per study vs. once per step).

    • I've tried to write the linking code to be robust, and I understand that it needs better review and testing. The reason that I decided on links was that we can accept infrequent errors in linking, and these can be fixed after the simulation is run. But, errors in linking will not affect the directories where simulations are being run or post-processing is being done.
  • I still have more work to do on this pull request:

    • Required: implement link collision treatments
    • Required/desired: improved testing
    • Desired: support hashed directories
    • Desired: support "combo" indexing

Please let me know what you think of these answers and arguments and if you have more questions.

-Chris

crkrenn avatar Aug 16 '22 17:08 crkrenn

@FrankD412, @FrankD412, @jsemler, @daub1,

Can you share your thoughts and plans about whether a conductor running a single maestro study will ever be parallelized?

The simplest solution for storing indices for links is to store them in a Linker object, but this would require interprocess communication if a conductor was run in parallel.

Thanks!

-Chris

crkrenn avatar Aug 26 '22 13:08 crkrenn

All: I'm only working one day per month now, so it is unlikely that I will be able to push this over the finish line by myself. I did refactor so that the linking takes advantage of _StepRecord and ExecutionGraph. The only current conflict is poetry.lock, which should be easy to resolve. V/r, -Chris

crkrenn avatar Feb 22 '23 01:02 crkrenn