OpenAI content-policy false positive triggers an error
Have you searched existing issues? 🔎
- [x] I have searched and found no existing issues
Describe the bug
When a prompt triggers OpenAI's content policy, BERTopic crashes.
One solution could be to detect the 'code': 'content_filter' pair in the error response and create a "flagged" topic?
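A minimal sketch of the proposed check follows. It is not BERTopic code. It assumes the exception raised by the OpenAI Python client exposes the error code either as a `code` attribute or inside a `body` dict, which is what the 400 response in the logs below suggests. `is_content_filter_error`, `label_topic`, and the `"flagged"` label are hypothetical names used only for illustration.

```python
from typing import Callable

FLAGGED_LABEL = "flagged"  # hypothetical label for topics the provider refused to describe


def is_content_filter_error(exc: BaseException) -> bool:
    """Return True if `exc` looks like an (Azure) OpenAI content-filter rejection.

    Assumption: the error code is exposed either as `exc.code` or under a
    "code" key in a dict stored on `exc.body`, optionally wrapped in "error".
    """
    code = getattr(exc, "code", None)
    if code is None:
        body = getattr(exc, "body", None)
        if isinstance(body, dict):
            inner = body.get("error", body)  # unwrap {"error": {...}} if present
            if isinstance(inner, dict):
                code = inner.get("code")
    return code == "content_filter"


def label_topic(prompt: str, generate: Callable[[str], str]) -> str:
    """Ask the LLM for a topic label; return FLAGGED_LABEL if the prompt is filtered."""
    try:
        return generate(prompt)
    except Exception as exc:
        if is_content_filter_error(exc):
            return FLAGGED_LABEL
        raise  # every other error still propagates unchanged


if __name__ == "__main__":
    class FakeFilterError(Exception):
        """Stand-in for the client's BadRequestError, carrying only a `code`."""
        code = "content_filter"

    def refusing_llm(prompt: str) -> str:
        raise FakeFilterError("The response was filtered")

    print(label_topic("Describe these documents", refusing_llm))  # flagged
```

Only the content-filter case is swallowed here. Rate limits, authentication failures and other errors still surface, so a real outage is not silently turned into a column of "flagged" topics.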
logs:
2025-04-16 15:51:41,573 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2025-04-16 15:51:58,486 - BERTopic - Dimensionality - Completed ✓
2025-04-16 15:51:58,486 - BERTopic - Cluster - Start clustering the reduced embeddings
2025-04-16 15:51:58,687 - BERTopic - Cluster - Completed ✓
2025-04-16 15:51:58,703 - BERTopic - Representation - Fine-tuning topics using representation models.
100%|██████████| 286/286 [08:10<00:00, 1.71s/it]
3%|▎ | 8/286 [00:12<07:07, 1.54s/it]
BadRequestError Traceback (most recent call last)
Cell In[27], line 18
3 topic_model = BERTopic(
4
5 # Pipeline models
(...)
14 verbose=True
15 )
17 # Train model
---> 18 topics, probs = topic_model.fit_transform(docs, embeddings)
20 # Reduce outliers with pre-calculate embeddings instead
21 new_topics = topic_model.reduce_outliers(docs, topics, probabilities=probs, strategy="embeddings", embeddings=embeddings)

File c:\Users\damien.bukudjian\AppData\Local\miniconda3\envs\orionenv\Lib\site-packages\bertopic\_bertopic.py:515, in BERTopic.fit_transform(self, documents, embeddings, images, y)
511 self._save_representative_docs(custom_documents)
513 else:
514 # Extract topics by calculating c-TF-IDF, reduce topics if needed, and get representations.
--> 515 self._extract_topics(
516 documents, embeddings=embeddings, verbose=self.verbose, fine_tune_representation=not self.nr_topics
517 )
518 if self.nr_topics:
519 documents = self._reduce_topics(documents)

File c:\Users\damien.bukudjian\AppData\Local\miniconda3\envs\orionenv\Lib\site-packages\bertopic\_bertopic.py:4031, in BERTopic._extract_topics(self, documents, embeddings, mappings, verbose, fine_tune_representation)
...
(...)
1066 retries_taken=retries_taken,
1067 )
BadRequestError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': True, 'detected': True}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}
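The error body above also records *which* filter fired: here `jailbreak` is `filtered: True` while every other category is `safe`. Storing that reason next to a flagged topic would make false positives like this one easier to spot. The sketch below assumes the payload is available as a plain dict shaped like the one printed in the log. `triggered_filters` is a hypothetical helper, not part of BERTopic or the OpenAI client.

```python
from typing import Any, Dict, List


def triggered_filters(error_payload: Dict[str, Any]) -> List[str]:
    """Return the content-filter categories marked as filtered in an Azure OpenAI error payload."""
    error = error_payload.get("error", error_payload)  # accept wrapped or unwrapped payloads
    if not isinstance(error, dict):
        return []
    results = error.get("innererror", {}).get("content_filter_result", {})
    return sorted(
        name
        for name, result in results.items()
        if isinstance(result, dict) and result.get("filtered")
    )


# Trimmed copy of the payload from the traceback above.
payload = {
    "error": {
        "code": "content_filter",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "jailbreak": {"filtered": True, "detected": True},
                "self_harm": {"filtered": False, "severity": "safe"},
                "sexual": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": False, "severity": "safe"},
            },
        },
    }
}

print(triggered_filters(payload))  # ['jailbreak']
```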
Reproduction
No response
BERTopic Version
0.17.0
Hmmm, I remember that there were fixes for this a while back in BERTopic, but it might be that OpenAI has since updated their API.
A solution could be to detect the 'code': 'content_filter' pair and create a "flagged" topic?
That makes sense. It should also be feasible to handle any other filters or errors the API returns in the same way. Or perhaps simply catch errors broadly and flag all affected topics.
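Here is a rough sketch of the broader variant. It assumes topic labels are generated one topic at a time through a callable. `label_all_topics` and its `(labels, errors)` return value are hypothetical and do not reflect how BERTopic's representation models are structured internally. Catching `Exception` broadly keeps one rejected prompt from aborting the whole fit. The cost is that genuine bugs get hidden too, so each error message is kept instead of being dropped silently.

```python
from typing import Callable, Dict, Tuple


def label_all_topics(
    prompts: Dict[int, str],
    generate: Callable[[str], str],
    flagged_label: str = "flagged",
) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Label every topic; topics whose request fails get `flagged_label`.

    Returns (labels, errors), where `errors` maps each flagged topic id to
    the exception type and message so it can be reviewed later.
    """
    labels: Dict[int, str] = {}
    errors: Dict[int, str] = {}
    for topic_id, prompt in prompts.items():
        try:
            labels[topic_id] = generate(prompt)
        except Exception as exc:  # deliberately broad: any API error flags the topic
            labels[topic_id] = flagged_label
            errors[topic_id] = f"{type(exc).__name__}: {exc}"
    return labels, errors


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Simulates a provider that rejects one particular prompt.
        if "blocked" in prompt:
            raise RuntimeError("content_filter")
        return prompt.upper()

    labels, errors = label_all_topics({0: "cats", 1: "blocked text", 2: "dogs"}, fake_llm)
    print(labels)  # {0: 'CATS', 1: 'flagged', 2: 'DOGS'}
    print(errors)  # {1: 'RuntimeError: content_filter'}
```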
Either way, I definitely agree that we should tackle this somehow. However, my schedule is quite packed at the moment, so if you or anyone else would like to take this on, that would be great.