Test if content creation with AI could stimulate the creation of reviewed content
Referencing #1882
I took the empty chapter https://6.docs.plone.org/backend/subscribers.html as an example.
A first prompt: Write a documentation chapter about subscribers (event handler) in Plone for Version Plone 6. Take the two chapters of Plone 5 documentation https://5.docs.plone.org/external/plone.app.dexterity/docs/advanced/event-handlers.html and https://5.docs.plone.org/develop/addons/components/events.html into account.
tool: ChatGPT https://chatgpt.com
The response to the above-mentioned prompt listed the following sources:
- https://5.docs.plone.org/develop/addons/components/events.html
- https://6.docs.plone.org/backend/subscribers.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/behaviors/intro.html
- https://6.docs.plone.org/backend/sending-email.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/advanced/index.html
- https://6.docs.plone.org/backend/index.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/index.html
- https://6.docs.plone.org/backend/behaviors.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/advanced/custom-content-classes.html
- https://6.docs.plone.org/backend/relations.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/behaviors/behavior-basics.html
- https://6.docs.plone.org/backend/traversal-acquisition.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/testing/mock-testing.html
- https://6.docs.plone.org/volto/index.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/testing/unit-tests.html
- https://6.docs.plone.org/plone.api/index.html
- https://5.docs.plone.org/develop/styleguide/javascript.html
- https://6.docs.plone.org/volto/configuration/settings-reference.html
- https://5.docs.plone.org/external/plone.app.dexterity/docs/behaviors/providing-marker-interfaces.html
- https://6.docs.plone.org/deployment/caching/etags.html
Preview of this chapter: https://plone6--1887.org.readthedocs.build/backend/subscribers.html
📚 Documentation preview 📚: https://plone6--1887.org.readthedocs.build/
[EDIT] I added the used tool and the sources information of the response.
@MrTango Is there a deeper reason the topic was renamed from Event Handlers to Subscribers by the documentation team? I could not find anything related (a #LADR could help).
The current Subscribers (event handlers) chapter was inserted as a todo by @MrTango; I found no further notes in the commit…
I would suggest putting this example explicitly in draft mode.
In this particular case, at first sight, the content was changed substantially and the focus has shifted. This can become too misleading for it to be an easy task (my opinion) – LLMs are chatterboxes (like me).
A longer response is needed to dive into the technical background. It could explain some general issues with the approach "as is". On the other hand, we can achieve a lot with some adjustments to the workflow, which adds complexity but is worth the effort.
First of all: AI can do a lot of things very well. We use it according to actual needs.
To create valuable results, you need to craft proper system prompts, or better, meta-prompts in advance to set up the constraints. Without them, the results are too vague and always need review.
The amount of review is directly dependent on the quality of the prompts and of the workflow pipeline you use.
Beyond the creation of the basic system prompts and the setup of the boundaries and the sources to be used, you can also create workflows and QA chains that iterate over the content many times and do automated QA. It is also a clever idea to use another system/model in the pipeline to review the output of the first.
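To make the idea concrete, here is a minimal sketch of such a draft-and-review chain. Everything in it is a hypothetical stand-in: `draft_model` and `review_model` are placeholder functions, not real LLM client calls, and the review logic is a toy check rather than an actual model.

```python
# Sketch of a two-stage QA chain: one model drafts, a second one reviews.
# Both model callables are hypothetical stand-ins for real LLM clients.

def draft_model(prompt: str) -> str:
    # Placeholder for the drafting LLM; returns a fake draft.
    return f"DRAFT for: {prompt}"

def review_model(draft: str) -> list[str]:
    # Placeholder for the reviewing LLM; returns a list of issues it found.
    if "synchronous" in draft:
        return []
    return ["missing note: Zope's event model is synchronous"]

def qa_chain(prompt: str, max_rounds: int = 3) -> tuple[str, list[str]]:
    # Draft, then iterate: review, feed the findings back, redraft.
    draft = draft_model(prompt)
    issues: list[str] = []
    for _ in range(max_rounds):
        issues = review_model(draft)
        if not issues:
            break
        # A real pipeline would pass the issues back to the drafting model
        # as structured feedback; here we just append them to the prompt.
        draft = draft_model(prompt + " | fix: " + "; ".join(issues))
    return draft, issues
```

The point of the structure is that the reviewer is a separate component with its own instructions, so it can be swapped for a different model than the one that wrote the draft.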
In my opinion your attempt is a nice try, but as you may see in the result, there is a longer way to go than a one-shot prompt – in particular for longer pieces of content that are more intertwined with other context (that is a real challenge for LLMs).
Just playing around with overly simple LLM requests (aka prompts) ends up creating more rework and spoils the benefits in the end.
Misleading interpretation of focus
The main issue is that the new content puts focus on some parts and omits others, resulting in an opinionated version that was not requested by the prompt.
One simple example:
- The simple and effective statement "Zope's event model is synchronous." is completely omitted in the new output; it would help a lot to keep it up front.
- Instead, synchronous behavior is mentioned, without introduction, only in the performance context at the end, and asynchronous processing is mentioned as a contradiction.
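To illustrate why that omitted statement matters up front: in Zope/Plone, notifying an event means every registered subscriber runs one after another in the same request before control returns. Here is a plain-Python sketch of that synchronous dispatch pattern; it deliberately uses a toy registry and a stand-in event class, not the real `zope.event`/`zope.component` API.

```python
# Minimal sketch of synchronous event dispatch, as in Zope's event model:
# notify() calls every subscriber in order, in the same thread, and only
# returns when all of them have run. (In Plone the real machinery is
# zope.event.notify plus zope.component subscriber lookup.)

subscribers = []  # toy registry of handler callables

def notify(event):
    for handler in subscribers:
        handler(event)  # runs synchronously; a slow handler blocks the request

class ObjectModifiedEvent:  # stand-in for the zope.lifecycleevent class
    def __init__(self, obj):
        self.object = obj

log = []
subscribers.append(lambda event: log.append(("reindex", event.object)))
subscribers.append(lambda event: log.append(("mail", event.object)))

notify(ObjectModifiedEvent("my-page"))
# By the time notify() returns, every handler has already run.
```

This is exactly the property a reader needs first: anything expensive in a subscriber directly slows down the request that triggered the event.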
One-shot prompts are not the best approach for our needs
As a one-shot prompt the approach is a nice try, but the result can be improved a lot by setting up an AI toolchain based on current, serious experience with this fast-evolving technology.
Topics are:
- System Prompts (Meta-Prompting)
- Adjusting temperature to zero etc.
- Existing Content as RAG
- Feature Store and Feedback Management
- Multistep Workflows (aka AI Agents)
- Quality Assurance
- This should be split into a pipe of actions (not properly ordered):
  - Add a link to the meta-documentation of the Plone Documentation, the requirements for new docs for Plone 6, as a reference for the AI meta-prompt prerequisites
  - List all the code parts involved in the Plone 5 documentation mentioned
  - Compare the relevant related parts with the code in Plone 6
  - Was anything changed in the underlying code except the presumed renaming from `Event Handlers` to `Subscribers`?
  - Is something available from the roadmap of Plone 6/7?
  - Is something relevant available in the forum? (Examples for subscribers are available.) I stop here for now.
  - Using a different AI to QA the output of another AI
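Such a pipe of actions can be sketched as a chain of independent check functions run over a draft chapter. The two checks below are simplified, hypothetical stand-ins for the real steps (the actual ones would verify the docs-requirements link and diff code samples against the Plone 6 code base):

```python
# Sketch of a QA pipe: each step inspects the draft and reports findings.
# Both checks are simplified stand-ins for the real actions listed above.

def check_meta_doc_link(draft: str) -> list[str]:
    # Real step: verify the draft references the Plone 6 docs requirements.
    if "6.docs.plone.org" in draft:
        return []
    return ["no link to the Plone 6 docs requirements"]

def check_code_blocks(draft: str) -> list[str]:
    # Real step: diff the code samples against the Plone 6 code base.
    if "```" in draft:
        return []
    return ["no code examples found"]

PIPELINE = [check_meta_doc_link, check_code_blocks]

def run_pipeline(draft: str) -> list[str]:
    # Run every check and collect all findings for the reviewer.
    findings: list[str] = []
    for step in PIPELINE:
        findings.extend(step(draft))
    return findings
```

Because each step is an independent function, new checks (roadmap lookup, forum search, a second-model review) can be appended to `PIPELINE` without touching the others.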
Don't make this a blocker
On the other hand, if the content is properly reviewed and corrected, one can accept the final PR.
Long-term discussion on AI-generated docs, based on existing docs
My concern is that we end up with quickly created content and people who lack the knowledge to fix its flaws. For example, @stevepiercy can volunteer a lot, but has to stop when the technical knowledge of the topic is missing.
For those who are interested: during the Beethoven Sprint I created a meta-documentation for the creation of more informal content, like translated success stories based on talk transcripts, using LLMs, on Plone.de (it is hidden and not public). I can drop you the link to a GDocs version for discussions. Since that point in time, a lot of water has gone down the river Rhine, as we say it in the AI business.
AI can be a useful tool, but ultimately content experts need to review the content for accuracy, and documentarians need to review it for broader contexts than AI can handle.
In other words, our volunteer jobs for which we are generously compensated are not threatened by AI. 😏
But seriously, there have been multiple incidents with AI being misused, primarily in Volto, once resulting in a copyright infringement issue that I escalated to the Board to manage, but also with shitty first-timer contributions that waste our time to review.
I recommended an AI policy to the Board, and I hope it publishes its AI policy soon. IMO, anytime AI is used in a contribution, the contributor must disclose that fact, as @ksuess has responsibly done, giving us the prompts used so that we can reproduce the output. However, I would add that the AI tool used must also be disclosed.
I am amazed at the wealth of information and opinions on this topic.
For the sake of completeness, I added the tool used and the sources information of the response I posted with the first commit of this PR.
Copyright: I think this is a sensitive topic. But the case in Volto is somehow an example that is simple to avoid. The person lazily posted Swedish translations including the author's name, which was not their own. I mean, really? The first rule is to read the response of the AI tool. The second is to check the sources. Before these checks it is lazy to post something. And it's lazy not to even mention that an AI tool has been used for the commits.
Actually a stunningly good starting point, but it's not correct in all places.
Thank you, @jensens, for the review. I will wait with applying your suggestions, because the spontaneous idea of this PR is more a starting point to collect some insights, experiences, and opinions.
One-shot prompts are not the best approach for our needs
As a one-shot prompt the approach is a nice try, but the result can be improved a lot by setting up an AI toolchain based on current, serious experience with this fast-evolving technology.
Topics are:
* System Prompts (Meta-Prompting)
* Adjusting temperature to zero etc.
* Existing Content as RAG
* Feature Store and Feedback Management
* Multistep Workflows (aka AI Agents)
* Quality Assurance
* This should be split into a pipe of actions (not properly ordered):
  * Add a link to the meta-documentation of the Plone Documentation as requirements for new docs for Plone 6, as a reference for the AI meta-prompt prerequisites
  * List all the code parts involved in the Plone 5 documentation mentioned
  * Compare the relevant related parts with the code in Plone 6
  * Was anything changed in the underlying code except the estimated renaming from `Event Handlers` to `Subscribers`
  * Is something available from the Roadmap of Plone 6/7
  * Is something relevant in the forum available (examples for subscribers are available) I stop here for now
  * Using a different AI to QA AI output
You obviously have deeper insights and experience with AI tools. What is your estimate of the time commitment needed for, let's say, one chapter that is by now only a placeholder: setting up a toolchain with constraints and requirements for Plone 6 documentation (focus on developers, not marketing), copyright, example code, etc., and also generating metadata, keywords, and a description for SEO?
I was working on a toolchain based on PrivateGPT with RAG and custom system prompts over one year ago as a test balloon. Open-source models from Mistral, and later Llama 3.1 8B and 70B, were used on a local MacBook Pro M1 Max with 64 GB RAM. The 70B model was at that time at the limit of the hardware. For us, the use of the public APIs was not an option.
Currently my next steps are waiting for customer requirements to update to the latest toolchains. There has been very much progress in the AI world during the last months. I would now take a look at something like AnythingLLM as a frontend framework. But to create a reliable toolchain usable by a wider audience, the complexity is high if you want to rely on it constantly. This is why I am focusing on replacing the workflow and result-management solutions out there with a Plone system, used similarly to the eea.daviz concepts.
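For readers unfamiliar with the "existing content as RAG" part of such a toolchain, the core retrieval step can be illustrated with a toy bag-of-words similarity search over doc snippets. Everything here is a simplified stand-in: a real setup (PrivateGPT, AnythingLLM, etc.) would use an embedding model and a vector store instead of word counts.

```python
# Toy illustration of the retrieval step in a RAG setup: pick the existing
# documentation snippets most similar to the question, then hand them to
# the LLM as context. Real systems use embeddings plus a vector store.
from collections import Counter
import math

def similarity(a: str, b: str) -> float:
    # Cosine similarity over lowercase word counts (bag of words).
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

def retrieve(question: str, snippets: list[str], k: int = 1) -> list[str]:
    # Return the k snippets most similar to the question.
    return sorted(snippets, key=lambda s: similarity(question, s), reverse=True)[:k]

docs = [
    "Subscribers handle events such as object modified in Plone.",
    "Volto is the React based frontend for Plone.",
]
best = retrieve("how do event subscribers work", docs)
```

The retrieved snippets would then be injected into the system prompt, which is what keeps the model grounded in the existing documentation instead of inventing content.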
I am sorry to have skipped the estimate question. We are also in R&D on that. I touched on the topic in my talk during the Plone Tagung 2024 in Gießen.
The first rule is to read the response of the AI tool. The second is to check the sources. Before these checks it is lazy to post something. And it's lazy not to even mention that an AI tool has been used for the commits.
I included these points in my recommendation to the Board about an AI policy. It touches on risks of violating copyright law and ethical concerns of being a member of the Plone Foundation or GitHub organization. I'll follow up with the Board to get the status of this item.