Optimizer issue: Using a pre-compiled module as a teacher causes an assertion crash in bootstrap.py
Assertion: Student and teacher must have the same number of predictors.
Code sample to repro:
optimizer = BootstrapFewShotWithRandomSearch(
    metric=MyModule.metric,
    teacher_settings=dict({"lm": teacher_lm}),
)
teacher_module, _ = MyModule.load_compiled_model()
for _ in range(2):
    student_module = MyModule().activate_assertions()
    teacher_module = optimizer.compile(
        student=student_module,
        teacher=teacher_module,
        trainset=train_set,
    )
return teacher_module
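For context, the failing check is the predictor-count assertion in the teleprompter. A paraphrase based on the error message above (the exact line in bootstrap.py may differ by version):

assert len(self.student.predictors()) == len(self.teacher.predictors()), (
    "Student and teacher must have the same number of predictors."
)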
Hi @drawal1, could you print the number of predictors for the teacher_module and student_module at each step via ..._module.predictors()?
I'm wondering if this is triggered after activate_assertions() is called for the student_module (although that should only map over the module's predictors, not change how many there are), or potentially by the loading of the pre-compiled module (I'm assuming MyModule.load_compiled_model() translates to just .load(), since there is no .load_compiled_model() in DSPy).
I was also unable to replicate the error with a sample run, testing with RAG() from the intro notebook as MyModule(). Feel free to share more details about the run if possible so we can take a closer look!
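For example, a minimal check (assuming both are ordinary dspy.Module instances) would be:

teacher_module, _ = MyModule.load_compiled_model()
student_module = MyModule().activate_assertions()
# the assertion fires whenever these two counts differ
print(len(student_module.predictors()), len(teacher_module.predictors()))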
"I'm assuming MyModule.load_compiled_model() translates to just .load()" - correct
Does this help?
@arnavsinghvi11 - I think I have a clue! Is teacher.predictors() aggregating across all candidate programs? If I generate 13 candidate programs in the 1st iteration and each program has 3 predictors, the teacher would end up with 39 predictors, as you can see from the screenshot in the prior post.
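If so, a minimal illustration of the inflation (hypothetical counts matching the numbers above; load_compiled_model() is the helper from the repro):

teacher_module, _ = MyModule.load_compiled_model()
print(len(teacher_module.candidate_programs))  # 13 candidates kept from the earlier compile
print(len(teacher_module.predictors()))        # 39, if predictor discovery recurses into
                                               # each candidate (13 candidates x 3 predictors)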
Looking at the code... Below is the comment for named_sub_modules() where it picks up the number of predictors:
def named_sub_modules(self, type_=None, skip_compiled=False) -> Generator[tuple[str, "BaseModule"], None, None]:
    """Find all sub-modules in the module, as well as their names.

    Say self.children[4]['key'].sub_module is a sub-module. Then the name will be
    'children[4][key].sub_module'. But if the sub-module is accessible at different
    paths, only one of the paths will be returned.
    """
Should it be picking up the sub-modules in this case?
I think that's on the right track, but the teacher module already has 39 predictors before the 3rd step in BootstrapFewShotWithRandomSearch, which is past the zero-shot and LabeledFewShot steps and is doing the 1st BootstrapFewShot step.
Just to confirm, was this loaded module compiled on 10+3 candidate programs? Could you check the number of predictors of the teacher module right after loading the compiled model, before the for loop? If that is indeed 39, this lines up with the teacher module including the sub-modules for all 13 named_parameters.
Passing skip_compiled may fix this, based on some existing [test logic](https://github.com/stanfordnlp/dspy/blob/46df3a06ed3450285bdc55e1bc5a4bf061c37b4e/tests/functional/test_functional.py#L236), but it will require changing internal code for saving/loading modules.
This is a bit deep, but if this is indeed the issue, feel free to open a PR for making the change!
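For reference, a sketch of that kind of check using the signature quoted above (assuming skip_compiled prunes compiled sub-modules during the walk):

import dspy

# count only the predictors that live outside compiled sub-modules
uncompiled = list(teacher_module.named_sub_modules(type_=dspy.Predict, skip_compiled=True))
print(len(uncompiled))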
Setting skip_compiled=True works a bit too well :) It gets rid of ALL the predictors. But at least we are on the right track. Notice how it found 5 candidate programs in the first iteration, and the teacher has 5x3 = 15 predictors.
The question now is figuring out the intended behavior. Which set of 3 teacher predictors do we pick in this case? How are they used? Should the code loop over all 5 of these candidate programs during the bootstrap process if the teacher is pre-compiled?
@arnavsinghvi11 - I think I found the issue and the solution. There is no need to mess with skip_compiled.
A compiled program has candidate programs, so to use it as a teacher, we should just keep the best candidate program (teacher.candidate_programs[0]) and remove the rest.
Edge case: if a deserialized EnsembledPrograms is used as a teacher, we keep teacher.programs[0] and remove the rest.
I will submit a PR accordingly.
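Roughly, the intent would be (a sketch of the idea, not the actual patch):

# Sketch: collapse a pre-compiled teacher to its best program before bootstrapping,
# so the predictor walk no longer aggregates across all candidates
if hasattr(teacher, "candidate_programs"):
    teacher.candidate_programs = teacher.candidate_programs[:1]  # keep the best, drop the rest
if hasattr(teacher, "programs"):  # deserialized EnsembledPrograms edge case
    teacher.programs = teacher.programs[:1]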
The fix is in PR #843, in teleprompt/random_search.py.
You can now write code like this:
for i in range(3):
    if i == 0:
        teacher_module, _ = MyModule.load_compiled_model()
    optimizer = BootstrapFewShotWithRandomSearch(
        metric=my_metric,
        teacher_settings=dict({"lm": teacher_lm}),
    )
    student_module = MyModule().activate_assertions()
    teacher_module = optimizer.compile(
        student=student_module,
        teacher=teacher_module,
        trainset=train_set,
    )
return teacher_module
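In other words, the first pass bootstraps from the pre-compiled teacher, and each later pass uses the program compiled in the previous iteration as the new teacher.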