Making readme (and other docs parts) more approachable
Collection of feedback we got on the readme (and other parts of the documentation), focused on making it more approachable:
Referring to API overview diagram from Kathrin:
- [ ] I found the description of parameters as "individual degrees of freedom" a bit too technical. I am not sure this term is familiar to all users and would suggest naming it "variables that can be adjusted during an experiment" or something similar.
- [ ] I would use sharp edges for the boxes in the legend as the corresponding boxes in the diagram have sharp edges, too, and round edges correspond to the three stages "1. Configuration, 2. BO loop, 3. Post-analysis".
- [ ] Very minor point: I found it a bit inconsistent that most boxes are descriptive while some use an active phrasing like "Objective - How to combine multiple targets". Maybe it's overkill, but one could change them to:
- Objective: Combination of multiple targets
- Lab Outcomes / Oracle: Measurements for recommendations
- Insights: Analysis of experimental results
From my colleague with a bio background and some coding experience:
- [x] In general the terminology is often too technical and thus either not understandable or requiring additional effort to decipher. Would suggest writing more verbose explanations with less jargon in some places.
Examples:
- In the readme the list of features is much harder to follow than the code example below. There is a lot of technical jargon as well as BayBE (or dependency packages) specific terms (e.g. "space", "hypothesis-tested")
- In the readme the sentence "focusing on additions that enable real-world experimental campaigns." is less clear than "BayBE offers a range of built-in features crucial for real-world use cases."
- [x] Missing non-expert explanation of why one would even use BayBE (i.e., if unfamiliar with the term of BO) [Note from Karin: may be possible to provide brief explanation/visualisation somewhere? (at start of readme or linked in readme); Ax is also a good example of more approachable documentation: https://ax.dev/docs/why-ax]
- [x] In the readme the list of features needs motivation for why something is important (as given for, e.g., "Custom parameter encodings")
- [ ] In the code examples, additional inline comments would help readers follow along in some places
- [ ] Unclear what is meant by "administers the current state of your experimental operation" besides the API overview diagram
From Jonathan (when he started) on the readme:
- [x] Visual feedback? Perhaps a plot to see how the optimization is progressing.
    - On top of printing the dataframe, maybe show a scatter plot of Pressure vs. Yield across iterations [he gave as an example the plot that is currently at the bottom of the Advanced Example]
- [ ] Real-world connection? The example is about maximizing "Yield" with three control knobs ("Granularity", "Pressure", "Solvent"), but it lacks context imo; maybe add a high-level explanation of what the experiment is for? This would allow "Yield" to be represented in a relatable way.
- [x] The ending was a little too abrupt. It runs campaign.recommend(batch_size=3), but then what happens next? I was a little lost. Overall I would appreciate some explanation of the results → it doesn't explain what to expect in the output (e.g., what the recommended parameters mean, why they were chosen, how the algorithm works at a high level).
- What does campaign.recommend(batch_size=3) return?
- How do users interpret the dataframe?
- What should a beginner do next?
- [ ] Additional thought: in real-world optimization, you usually update the campaign with new results and refine the recommendations; it would be great if this were showcased somehow
    - Show the update cycle? How BayBE refines recommendations over time, good for beginners:

```python
# feed new lab results back into the campaign
campaign.add_measurements(df)
# get refined recommendations based on all data so far
df_new = campaign.recommend(batch_size=3)
print(df_new)
```
- [ ] Would be nice to have something like a google colab notebook one can run
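A minimal, self-contained sketch of such an update cycle; note that `MockCampaign` here is a hypothetical stand-in written purely for illustration, mirroring only the `recommend`/`add_measurements` method names from the snippet above, not BayBE's actual implementation:

```python
import random

class MockCampaign:
    """Illustrative stand-in, NOT BayBE's Campaign class."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.measurements = []                    # collected (setting, outcome) pairs

    def recommend(self, batch_size):
        # the real library would consult a surrogate model here;
        # this mock simply samples candidates at random
        return random.sample(self.candidates, batch_size)

    def add_measurements(self, results):
        self.measurements.extend(results)

random.seed(1)
campaign = MockCampaign(candidates=range(20))
for _ in range(3):                                # three optimization iterations
    batch = campaign.recommend(batch_size=3)
    results = [(x, -(x - 12) ** 2) for x in batch]  # pretend lab outcome
    campaign.add_measurements(results)

print(len(campaign.measurements))                 # 9: three batches of three
```

In a real campaign, `recommend` would query a fitted model instead of sampling at random, and the outcomes would come from the lab rather than a formula; the recommend → measure → add_measurements loop shape is the point being shown.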
Some additional points Jonathan needed to look up/get info on before starting (based on his notes). These may be things worth explaining.
- [x] What the modelling process actually is - [Note from Karin: may be a good thing to add as a diagram at the top of the code example?]
He summarised it as:
    1. Start with initial data (run a few initial experiments).
    2. Fit the surrogate model (train a Gaussian process, or another model, on the observed data).
    3. Predict new points (the model estimates how good unexplored parameters might be).
    4. Select the next experiment (use an acquisition function, like expected improvement, to pick the next best experiment).
    5. Update the model (after running the experiment, add the new data and repeat from step 2).
- What a surrogate model is and how it relates to the different recommenders
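The steps he summarised can be sketched end-to-end on a toy 1-D problem. This is a deliberately simplified illustration, not how BayBE works internally: the "surrogate" below is just a distance-weighted mean with a distance-based uncertainty term rather than a real Gaussian process, the acquisition rule is a basic upper confidence bound rather than expected improvement, and all names are made up for the sketch:

```python
import random

def objective(x):
    """The (in practice unknown) outcome of running an 'experiment' at x."""
    return -(x - 0.7) ** 2

def surrogate(x, data):
    """Steps 2/3: predict mean and uncertainty at x from observed (x, y) pairs."""
    weights = [1.0 / (abs(x - xi) + 1e-6) for xi, _ in data]
    mean = sum(w * yi for w, (_, yi) in zip(weights, data)) / sum(weights)
    uncertainty = min(abs(x - xi) for xi, _ in data)  # far from data = uncertain
    return mean, uncertainty

def acquisition(x, data, beta=2.0):
    """Step 4: upper confidence bound, trading off exploitation vs exploration."""
    mean, unc = surrogate(x, data)
    return mean + beta * unc

random.seed(0)
data = [(x, objective(x)) for x in (0.1, 0.5, 0.9)]    # step 1: initial data
for _ in range(10):
    candidates = [random.random() for _ in range(100)]
    x_next = max(candidates, key=lambda x: acquisition(x, data))
    data.append((x_next, objective(x_next)))           # step 5: measure, update, repeat

best_x, best_y = max(data, key=lambda p: p[1])
print(round(best_x, 2))   # should land near the true optimum at 0.7
```

Even with this crude surrogate, the loop homes in on the optimum: the uncertainty term first drives it into the unexplored gap around 0.7, and subsequent rounds exploit the good region found there.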
Other ideas (Karin):
- [x] Decision help (e.g. a diagram) to guide the user through campaign design (e.g. do you have continuous/categorical/embedded parameters, do they have constraints, etc.)
I suggest we triage these points and create sub-issues for the ones that are deemed actionable. These are already too many points to handle effectively in this issue here, so perhaps in a meeting.
Personally, I don't see many of them as required. E.g. changing the language in the readme so that it contains no technical language at all completely defies its purpose of being concise. Imagine if mathematicians talked in language accessible to everyone... the science would grind to a halt. The readme is no lecture. It is totally OK if someone does not understand all terms immediately; making that a goal is imo impractical, comes with more downsides, and thus should not be pursued. This holds for many of the points above.
Most points are also on the Readme, but Readme != documentation.
I definitely agree that 1.) the readme should be sufficiently concise and not turned into a lecture and 2.) that there must be some technical terms. However, I think many points could be made significantly clearer by adding 2-3 words, small rewordings, and a few links to high-quality, reliable information sources for some of the terms.
I think the topic is too important to be ignored. - Readme is the first point of contact and it should not demotivate newcomers. (+ I must admit that even I was at points unsure when reading through it with my friend yesterday.)
If everyone agrees and would be prepared to accept some changes I would happily take on the task of making the readme more approachable. This would involve some clarifications (as noted above - mainly rewording) and likely 1-2 new diagrams.
First of all - thanks for collecting all of that feedback. I did not go through it in detail, but I already appreciate you taking the time for collecting. Also, note that we have a (probably extremely outdated) issue for collecting minor issues in the documentation, see #195 - depending on what is in there, might make sense to have a look and see if some of those can also easily be fixed (but don't feel forced to!).
Also, it feels to me like we should maybe finally have the big documentation clean-up once the target transformation monster branch is merged, what do you think?
Summary from discussion today:
- [x] Replace the first sentence with something more intuitive (just a sentence, optionally a plot, though that may make it too full, and since we want it neither too generic nor too specialised it would be hard to find a good balance for a plot). The first sentence could be more in the style of "Find a good parameter setting within a complex parameter space".
- [x] Potentially add examples on how to use BayBE for different use cases, as in BayChem
- [x] Add diagram about design being core part
- [x] Add FAQ section with design checklist (see this example)
Wrt the above comments:
- [x] I can try to simplify the readme language once we have a version with above additions
- [x] Consider adding a bit nicer ending about the recommendation loop as suggested by Jonathan above