Optimizer issue: Using a pre-compiled module as a teacher causes an assertion crash in bootstrap.py
Assertion: Student and teacher must have the same number of predictors.
Code sample to repro:
optimizer = BootstrapFewShotWithRandomSearch(
    metric=MyModule.metric,
    teacher_settings=dict({"lm": teacher_lm}),
)
teacher_module, _ = MyModule.load_compiled_model()
for _ in range(2):
    student_module = MyModule().activate_assertions()
    teacher_module = optimizer.compile(
        student=student_module,
        teacher=teacher_module,
        trainset=train_set,
    )
return teacher_module
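For context, the failing check is the predictor-count assertion in the teleprompter. A paraphrase based on the error message above (the exact line in bootstrap.py may differ by version):

assert len(self.student.predictors()) == len(self.teacher.predictors()), (
    "Student and teacher must have the same number of predictors."
)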
Hi @drawal1, could you print the number of predictors for the teacher_module and student_module at each step via ..._module.predictors()?
I'm wondering if this is triggered after activate_assertions() is called for the student_module (although that should only map over the module's predictors, not change how many there are), or potentially by the loading of the pre-compiled module (I'm assuming MyModule.load_compiled_model() translates to just .load(), since there is no .load_compiled_model() in DSPy).
I was also unable to replicate the error with a sample run, testing with RAG() from the intro notebook as MyModule(). Feel free to share more details about the run if possible so we can take a closer look!
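For example, a minimal check (assuming both are ordinary dspy.Module instances) would be:

teacher_module, _ = MyModule.load_compiled_model()
student_module = MyModule().activate_assertions()
# the assertion fires whenever these two counts differ
print(len(student_module.predictors()), len(teacher_module.predictors()))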
"I'm assuming MyModule.load_compiled_model() translates to just .load()" - correct
Does this help?
@arnavsinghvi11 - I think I have a clue! Is teacher.predictors() aggregating across all candidate programs? If I generate 13 candidate programs in the 1st iteration and each program has 3 predictors, the teacher would end up with 39 predictors, as you can see from the screenshot in the prior post.
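If so, a minimal illustration of the inflation (hypothetical counts matching the numbers above; load_compiled_model() is the helper from the repro):

teacher_module, _ = MyModule.load_compiled_model()
print(len(teacher_module.candidate_programs))  # 13 candidates kept from the earlier compile
print(len(teacher_module.predictors()))        # 39, if predictor discovery recurses into
                                               # each candidate (13 candidates x 3 predictors)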
Looking at the code... Below is the comment for named_sub_modules() where it picks up the number of predictors:
def named_sub_modules(self, type_=None, skip_compiled=False) -> Generator[tuple[str, "BaseModule"], None, None]:
    """Find all sub-modules in the module, as well as their names.

    Say self.children[4]['key'].sub_module is a sub-module. Then the name will be
    'children[4][key].sub_module'. But if the sub-module is accessible at different
    paths, only one of the paths will be returned.
    """
Should it be picking up the sub-modules in this case?
I think that's on the right track, but the teacher module already has 39 predictors before the 3rd step in BootstrapFewShotWithRandomSearch, which is past the zero-shot and LabeledFewShot steps and is doing the 1st BootstrapFewShot step.
Just to confirm, was this loaded module compiled on 10+3 candidate programs? Could you check the number of predictors of the teacher module right after loading the compiled model, before the for loop? If that is indeed 39, this lines up with the teacher module including the sub-modules for all 13 named_parameters.
Passing skip_compiled may fix this, based on some existing [test logic](https://github.com/stanfordnlp/dspy/blob/46df3a06ed3450285bdc55e1bc5a4bf061c37b4e/tests/functional/test_functional.py#L236), but it will require changing internal code for saving/loading modules.
This is a bit deep, but if this is indeed the issue, feel free to open a PR for making the change!
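For reference, a sketch of that kind of check using the signature quoted above (assuming skip_compiled prunes compiled sub-modules during the walk):

import dspy

# count only the predictors that live outside compiled sub-modules
uncompiled = list(teacher_module.named_sub_modules(type_=dspy.Predict, skip_compiled=True))
print(len(uncompiled))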
Setting skip_compiled=True works a bit too well :) It gets rid of ALL the predictors. But at least we are on the right track. Notice how it found 5 candidate programs in the first iteration, and the teacher has 5x3 = 15 predictors.
The question now is figuring out the intended behavior. Which set of 3 teacher predictors do we pick in this case? How are they used? Should the code loop over all 5 of these candidate programs during the bootstrap process if the teacher is pre-compiled?
@arnavsinghvi11 - I think I found the issue and the solution. There is no need to mess with skip_compiled.
A compiled program has candidate programs, so to use it as a teacher, we should just keep the best candidate program (teacher.candidate_programs[0]) and remove the rest.
Edge case: if a deserialized EnsembledPrograms is used as a teacher, we keep teacher.programs[0] and remove the rest.
I will submit a PR accordingly.
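Roughly, the intent would be (a sketch of the idea, not the actual patch):

# Sketch: collapse a pre-compiled teacher to its best program before bootstrapping,
# so the predictor walk no longer aggregates across all candidates
if hasattr(teacher, "candidate_programs"):
    teacher.candidate_programs = teacher.candidate_programs[:1]  # keep the best, drop the rest
if hasattr(teacher, "programs"):  # deserialized EnsembledPrograms edge case
    teacher.programs = teacher.programs[:1]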
The fix is in PR #843, in teleprompt/random_search.py.
You can now write code like this:
for i in range(3):
    if i == 0:
        teacher_module, _ = MyModule.load_compiled_model()
    optimizer = BootstrapFewShotWithRandomSearch(
        metric=my_metric,
        teacher_settings=dict({"lm": teacher_lm}),
    )
    student_module = MyModule().activate_assertions()
    teacher_module = optimizer.compile(
        student=student_module,
        teacher=teacher_module,
        trainset=train_set,
    )
return teacher_module
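In other words, the first pass bootstraps from the pre-compiled teacher, and each later pass uses the program compiled in the previous iteration as the new teacher.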