sacred
Strange run._id Iteration Behavior With Multiple Observers
I've noticed that `run._id`s exhibit what is (at least to me) somewhat strange behavior when you have multiple observers that hold different numbers of prior runs for a given experiment: namely, the run takes whichever observer was appended first, generates `run._id` as that observer's max prior ID + 1, and then uses that ID for both of the added observers. If you have an observer with multiple runs in it and decide you'd like to add another observer, say a database observer, and you append the new observer before the old one, your code will break: the new observer generates an ID of 1, which raises a `FileExistsError` inside the `FileStorageObserver`.
At first glance, a possible fix (at this line: https://github.com/IDSIA/sacred/blob/master/sacred/run.py#L318) would be, instead of adopting the `run._id` from the first observer in the list, to take the `_id`s generated by all observers and set `run._id` to the max over all of them. However, this has the possibly undesirable property of breaking the coupling between `run._id` and each observer's notion of a run ID. This could maybe be fixed by having a separate observer method for "return max ID": you could call it on each observer, take the max, and then call `started_event` on each observer with the ID set to that max. This has the nice property that IDs stay consistent across the set of all observers: if observer 1 is already 15 runs in and you add observer 2, it will always start at 16, regardless of the order in which the observers were appended.
If the maintainers are up for having a PR submitted that implements something like this, I'd be up for taking on writing one. That said, I recognize this would be a decently involved bit of code surgery to fix what might be a niche issue, and that y'all might not prefer to go ahead with it. In that case, I'd advocate some warning box on the Observer page calling out the fact that you can see strange behavior when there are multiple run-offset observers.
(I've replicated the above-mentioned behavior in the code below with two `FileStorageObserver`s, but from what I can tell from digging into the mechanics of the code, I'd expect this kind of issue for any observer that generates auto-incrementing IDs. To reproduce, first run the code as-is, then run it a second time with the commented lines uncommented, so that the two observers are offset from one another in how many runs they hold.)
```python
from sacred import Experiment
from sacred.observers import FileStorageObserver

first_obs = FileStorageObserver.create("original_observer")
# Second run only: uncomment to add an observer with no prior runs.
# second_obs = FileStorageObserver.create("added_observer")

test_ing = Experiment("cat_noise")
# Second run only: the new observer is appended before the old one.
# test_ing.observers.append(second_obs)
test_ing.observers.append(first_obs)


@test_ing.config
def config_ing():
    noise = "meow"
    cat = "aegon"


@test_ing.automain
def make_a_noise(cat, noise):
    print(f"{cat} says {noise}")
```
Thanks for reporting this issue! I believe this would be worth a fix, but I also see that it might require extending the Observer interface. Right now, finding the next ID and creating the experiment are mangled together in `started_event`. We would have to pull these apart and then take the max as you proposed. However, that would probably make the code a bit cleaner anyway. Something along the lines of
```python
next_id = max(observer.next_id() for observer in self.observers)
for observer in self.observers:
    observer.started_event(next_id, ...)
```
We would also have to consider special cases, like overriding an experiment with the `MongoObserver`. Please go ahead if you would like to tackle this issue.
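As a rough, self-contained illustration of the proposal above, here is a minimal sketch using a toy observer class. `ToyObserver`, `start_run`, and the `next_id()` method are hypothetical names for this example only; none of them exist in sacred, and the real `started_event` takes more arguments.

```python
class ToyObserver:
    """Hypothetical stand-in for an auto-incrementing observer (not sacred's API)."""

    def __init__(self, name, prior_runs=0):
        self.name = name
        self.run_ids = list(range(1, prior_runs + 1))

    def next_id(self):
        # Hypothetical method from the proposal: report the next free ID only.
        return max(self.run_ids, default=0) + 1

    def started_event(self, _id):
        # Mimics the FileExistsError from the report when an ID is reused.
        if _id in self.run_ids:
            raise FileExistsError(f"{self.name}: run {_id} already exists")
        self.run_ids.append(_id)
        return _id


def start_run(observers):
    """Agree on one ID across all observers, then notify each of them."""
    next_id = max(observer.next_id() for observer in observers)
    for observer in observers:
        observer.started_event(next_id)
    return next_id


old = ToyObserver("original_observer", prior_runs=15)
new = ToyObserver("added_observer")

# The new observer is listed first, yet the shared ID is still 16.
run_id = start_run([new, old])
print(run_id)
```

Because the ID is agreed on before any observer records the run, the order of the list no longer matters in this toy model; the trade-off is that the newly added observer's IDs start at 16 rather than 1.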
Observers are sorted by priority, and the observer with the highest priority gets to choose the `_id`. The problem in your case is that you are adding a `MongoObserver` later on, which by default has priority over the `FileStorageObserver`.
If you are fine with the `_id` chosen by the `FileStorageObserver`, you could just reduce the priority of the `MongoObserver` to less than 20 (using the `!15` suffix on the command line or by passing a `priority=15` argument to the constructor).
Otherwise, you could implement a custom observer with high priority that chooses an appropriate `_id` in its `started_event` and does nothing else.
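To make the two options above concrete, here is a small toy model of priority-ordered `_id` selection. It is only a sketch: `ToyObserver`, `IdChooser`, and `emit_started` are hypothetical names, the real `started_event` takes more arguments, and the priority of 30 used for the database observer is an assumed value chosen only so that it outranks the file storage observer's 20.

```python
class ToyObserver:
    """Hypothetical stand-in for a sacred observer; only models _id selection."""

    def __init__(self, name, priority, prior_runs=0):
        self.name = name
        self.priority = priority
        self.prior_runs = prior_runs
        self.seen_id = None

    def started_event(self, _id):
        # Generate an ID only if no higher-priority observer has chosen one.
        if _id is None:
            _id = self.prior_runs + 1
        self.seen_id = _id
        return _id


class IdChooser(ToyObserver):
    """Hypothetical high-priority observer that picks the _id and does nothing else."""

    def __init__(self, fixed_id, priority=100):
        super().__init__("id_chooser", priority)
        self.fixed_id = fixed_id

    def started_event(self, _id):
        return self.fixed_id if _id is None else _id


def emit_started(observers):
    """Visit observers from highest to lowest priority; the first ID wins."""
    run_id = None
    for observer in sorted(observers, key=lambda o: -o.priority):
        returned = observer.started_event(run_id)
        if run_id is None:
            run_id = returned
    return run_id


files = ToyObserver("file_storage", priority=20, prior_runs=15)

# Default ordering: the database observer (assumed priority 30) picks ID 1.
default_id = emit_started([files, ToyObserver("mongo", priority=30)])

# Option 1: lower the database observer's priority below 20.
lowered_id = emit_started([files, ToyObserver("mongo", priority=15)])

# Option 2: let a dedicated high-priority observer choose the ID.
custom_id = emit_started(
    [IdChooser(fixed_id=42), files, ToyObserver("mongo", priority=30)]
)

print(default_id, lowered_id, custom_id)
```

In this model the order in which observers are appended is irrelevant; only the priorities decide who chooses the `_id`.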
I was actually unaware of these priority semantics. Sorry for the misinformation.
@JarnoRFB Documentation on this could be a bit better, I guess. Questions about the behavior of `_id` are rather common. Maybe we should add a section detailing the `_id` system and how to customize it?
@Qwlouse sounds like a good idea.