HackGPT
Implement Content Moderation
OpenAI's Moderation endpoint checks text for content that violates OpenAI's content policy, which makes it a useful tool for catching non-compliant requests and responses. To let HackGPT users take advantage of it, we propose adding a content moderation option. This feature will allow users to check requests and responses against the Moderation endpoint and take appropriate action based on the results.
Proposed Feature
1. Content Moderation Integration
We will add an option to HackGPT that enables content moderation checks for both incoming requests and outgoing responses. Users can enable this feature by specifying a configuration flag.
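A minimal sketch of where the two checks could sit, assuming the moderation call and the model call are passed in as plain functions. The names `ModerationConfig`, `run_exchange`, `generate`, and `moderate` are hypothetical and not part of HackGPT's existing code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModerationConfig:
    # Hypothetical configuration flag; moderation is off unless the user enables it.
    enabled: bool = False


def run_exchange(
    prompt: str,
    generate: Callable[[str], str],
    moderate: Callable[[str], dict],
    config: ModerationConfig,
) -> dict:
    """Produce a reply, moderating the request and the response when enabled."""
    request_check: Optional[dict] = None
    response_check: Optional[dict] = None

    if config.enabled:
        request_check = moderate(prompt)  # check the incoming request
    reply = generate(prompt)
    if config.enabled:
        response_check = moderate(reply)  # check the outgoing response

    return {
        "reply": reply,
        "request_check": request_check,
        "response_check": response_check,
    }
```

In a real integration, `moderate` would wrap a call to the Moderation endpoint and `generate` would wrap the chat completion call; injecting them keeps the sketch runnable without network access.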
2. Simple Mode vs. Advanced Mode
The content moderation feature will offer two modes of operation: Simple Mode and Advanced Mode.
Simple Mode:
In this mode, users can enable content moderation with a single on/off switch. When enabled, HackGPT will perform a standard content moderation check for all categories available in the moderation endpoint.
Advanced Mode:
Advanced Mode provides users with more flexibility. Users can select specific categories from the moderation endpoint for which they want to perform content moderation. This allows users to tailor the moderation process according to their specific needs.
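The difference between the two modes can be sketched as a single check over one moderation result. This assumes the result is a dictionary with the endpoint's `flagged` and `categories` fields; the function name `is_flagged` and the `selected` parameter are hypothetical.

```python
from typing import Iterable, Optional


def is_flagged(result: dict, selected: Optional[Iterable[str]] = None) -> bool:
    """Decide whether a moderation result counts as a violation.

    Simple Mode (selected is None): rely on the endpoint's overall verdict.
    Advanced Mode: only the user-selected categories are considered.
    """
    if selected is None:
        return bool(result["flagged"])
    categories = result["categories"]
    return any(categories.get(name, False) for name in selected)


# Illustrative result, shaped like one entry of the endpoint's "results" list.
sample = {
    "flagged": True,
    "categories": {"sexual": False, "harassment": True, "violence": False},
}

simple = is_flagged(sample)                # all categories
advanced = is_flagged(sample, ["sexual"])  # only the selected category
```

Here Simple Mode reports a violation because the endpoint flagged the text, while Advanced Mode restricted to "sexual" does not, since the only violated category was not selected.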
3. Threshold Configuration for Advanced Mode
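When Advanced Mode is active, users will have the option to set a custom threshold for each selected category, adjusted with a slider. A category is treated as violated when the score the moderation endpoint reports for it reaches the threshold, so a lower threshold makes the check stricter.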
For example, a user might set a lower threshold for the "sexual" category if they wish to be more cautious about potential sexual content, whereas they could set a higher threshold for "harassment" if they want to allow a certain degree of leeway in that category.
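The threshold comparison might look like the sketch below. It assumes the endpoint's per-category scores (values between 0 and 1) are available as a dictionary; the function name `categories_over_threshold` and the example numbers are illustrative only.

```python
from typing import Dict, List


def categories_over_threshold(
    category_scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the selected categories whose score reaches the user's threshold.

    Only categories present in `thresholds` are checked, which matches the
    idea of selecting specific categories in Advanced Mode.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if category_scores.get(name, 0.0) >= limit
    )


# Made-up scores: a strict threshold for "sexual", a lenient one for "harassment".
scores = {"sexual": 0.30, "harassment": 0.55, "violence": 0.01}
thresholds = {"sexual": 0.2, "harassment": 0.8}

violations = categories_over_threshold(scores, thresholds)  # ["sexual"]
```

With these values the "sexual" score of 0.30 exceeds its strict threshold of 0.2, while the "harassment" score of 0.55 stays under its lenient threshold of 0.8.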
4. Handling of Moderation Results
The results from the content moderation checks will be used to determine whether the incoming requests and outgoing responses comply with OpenAI's Content Policy.
A) Blocking
In Option A, if the moderation endpoint returns "flagged: true", meaning at least one category was violated, HackGPT will automatically prevent the response from being sent to the user. It will also notify the user that the content has been blocked by content moderation.
B) Flagging
In Option B, HackGPT still performs the moderation check, but instead of automatically blocking the response, it delivers the response and flags it as potentially violating OpenAI's Content Policy. A notification will be sent to the user, indicating that the content has been flagged.
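The two options can be sketched as one function that decides what reaches the user. All names here (`Action`, `Outcome`, `apply_moderation`) and the notification wording are hypothetical and only illustrate the behavior described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    BLOCK = "block"  # Option A
    FLAG = "flag"    # Option B


@dataclass
class Outcome:
    text: Optional[str]    # what the user sees; None when blocked
    notice: Optional[str]  # notification shown to the user, if any


def apply_moderation(reply: str, flagged: bool, action: Action) -> Outcome:
    """Apply the configured handling to a reply that has been moderated."""
    if not flagged:
        return Outcome(text=reply, notice=None)
    if action is Action.BLOCK:
        # Option A: withhold the content and tell the user why.
        return Outcome(text=None, notice="This content was blocked by content moderation.")
    # Option B: deliver the content, but mark it as flagged.
    return Outcome(text=reply, notice="This content was flagged by content moderation.")
```

Keeping the decision in one place means the same logic could be reused for incoming requests, with `reply` replaced by the user's prompt.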
Conclusion
With the content moderation feature in place, HackGPT users can help keep conversations in compliance with OpenAI's Content Policy. Simple Mode and Advanced Mode let users adjust the moderation settings to their specific requirements and preferences.