Improve the observability and readability of BootstrapFewShot optimizer
The PR has two parts:
- Better progbar. The current progbar is quite confusing because it never finishes. Instead of setting the length of the progbar to the training set size, it should reflect the actual steps. See below for the logging change in this PR:
Before the PR:
40%|█████████████████████████████████████████████▏ | 4/10 [00:00<00:00, 800.94it/s]
0%| | 0/10 [00:00<?, ?it/s]
0%| | 0/10 [00:00<?, ?it/s]
0%| | 0/10 [00:00<?, ?it/s]
0%| | 0/10 [00:00<?, ?it/s]
Bootstrapped 4 full traces after 1 examples in round 4.
After the PR:
Bootstrapping 4 examples: 100%|████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 721.88it/s]
2024-10-02T00:33:26.452003Z [info ] Bootstrapped 4 full traces after 5 examples in round 1. [dspy.teleprompt.bootstrap] filename=bootstrap.py lineno=155
- Improved readability, including renaming vague variables and methods, adding comments to code that is not self-explanatory, and deleting unused code.
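A minimal sketch of the progbar idea described above, not the actual diff. The function `bootstrap_demos` and the callable `try_bootstrap` are hypothetical names used only for illustration; the only real API here is `tqdm`. The bar length is the number of demos to bootstrap, so it finishes once enough traces are collected:

```python
from tqdm import tqdm


def bootstrap_demos(trainset, try_bootstrap, max_bootstrapped_demos):
    """Collect up to `max_bootstrapped_demos` successful examples.

    `try_bootstrap` is a hypothetical callable that returns True when an
    example yields a full trace. Returns (demos, attempts).
    """
    demos = []
    attempts = 0
    # The bar total is the number of demos wanted, not len(trainset),
    # so the bar reaches 100% when bootstrapping has collected enough.
    with tqdm(
        total=max_bootstrapped_demos,
        desc=f"Bootstrapping {max_bootstrapped_demos} examples",
    ) as bar:
        for example in trainset:
            if len(demos) >= max_bootstrapped_demos:
                break
            attempts += 1
            if try_bootstrap(example):
                demos.append(example)
                bar.update(1)  # advance only on a successful bootstrap
    return demos, attempts


if __name__ == "__main__":
    demos, attempts = bootstrap_demos(range(10), lambda x: x != 1, 4)
    print(f"Bootstrapped {len(demos)} full traces after {attempts} examples.")
```

One consequence of this design is that failed attempts do not move the bar, so a run in which every example fails would stay at 0%.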
Thanks @chenmoneygithub! Overall, I like the goals of this PR, but I don't think we should merge it yet.
I think some functionality may have changed accidentally(?), see above. We will re-design this optimizer so it's parallel, etc., so maybe we should jump straight to that.
On logging, I'm not sure I like the proposed approach. It makes it really hard to see if the bootstrapping keeps failing or how much progress it's making. Keep in mind that in the general case, bootstrapping can fail 100s of times and may even never succeed. Right now, the proposal here would make the user see zero progress.
Maybe the better thing to do is to just bump the progress bar to full after bootstrapping exits :D
But as I said, this all will change with parallelism. We should just jump right into that.
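For concreteness, a rough sketch of the "bump to full" alternative, assuming a sequential loop. The names `bootstrap_with_visible_attempts` and `try_bootstrap` are hypothetical and not part of the optimizer; only the `tqdm` calls are real API. The bar tracks every attempt, shows running success and failure counts, and is filled in when bootstrapping exits early:

```python
from tqdm import tqdm


def bootstrap_with_visible_attempts(trainset, try_bootstrap, max_bootstrapped_demos):
    """Try examples in order; return (demos, failures).

    `try_bootstrap` is a hypothetical callable that returns True when an
    example yields a full trace.
    """
    demos = []
    failures = 0
    # The bar total is the training set size, so each attempt (success or
    # failure) is visible as progress.
    with tqdm(total=len(trainset), desc="Bootstrapping") as bar:
        for example in trainset:
            if len(demos) >= max_bootstrapped_demos:
                break
            if try_bootstrap(example):
                demos.append(example)
            else:
                failures += 1
            # Show running counts next to the bar.
            bar.set_postfix(bootstrapped=len(demos), failed=failures)
            bar.update(1)
        # On early exit, bump the bar to full so it does not look stuck.
        bar.update(bar.total - bar.n)
    return demos, failures
```

This keeps repeated failures visible, at the cost of a bar that jumps to 100% at the end rather than advancing smoothly.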
@okhat Thanks for reviewing! Yeah, this progbar will not be valid if we enable parallelism.
> It makes it really hard to see if the bootstrapping keeps failing or how much progress it's making.
Good point! That's actually an area I am uncertain about. I doubt our users can understand what "bootstrap failure" means; I couldn't understand what bootstrap means until I read through the source code, which our users are unlikely to do. For us, seeing bootstrapping failures helps with debugging, but users may find the logging pure noise and have no idea what it means. However, as you mentioned, if bootstrapping keeps failing, users will see zero progress, which is bad as well.
Agreed, we should not merge this PR! I will open a separate one that contains only the trivial readability fixes from this PR.