
OpenAI policy false positive triggers an error

Open DamienBukudjian opened this issue 8 months ago • 1 comments

Have you searched existing issues? 🔎

  • [x] I have searched and found no existing issues

Describe the bug

When OpenAI's content policy is triggered, BERTopic crashes. A solution could be to detect the 'code': 'content_filter' pair and create a "flagged" topic?

logs:

2025-04-16 15:51:41,573 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2025-04-16 15:51:58,486 - BERTopic - Dimensionality - Completed ✓
2025-04-16 15:51:58,486 - BERTopic - Cluster - Start clustering the reduced embeddings
2025-04-16 15:51:58,687 - BERTopic - Cluster - Completed ✓
2025-04-16 15:51:58,703 - BERTopic - Representation - Fine-tuning topics using representation models.
100%|██████████| 286/286 [08:10<00:00, 1.71s/it]
3%|▎ | 8/286 [00:12<07:07, 1.54s/it]

BadRequestError                           Traceback (most recent call last)
Cell In[27], line 18
      3 topic_model = BERTopic(
      4
      5     # Pipeline models
   (...)
     14     verbose=True
     15 )
     17 # Train model
---> 18 topics, probs = topic_model.fit_transform(docs, embeddings)
     20 # Reduce outliers with pre-calculate embeddings instead
     21 new_topics = topic_model.reduce_outliers(docs, topics, probabilities=probs, strategy="embeddings", embeddings=embeddings)

File c:\Users\damien.bukudjian\AppData\Local\miniconda3\envs\orionenv\Lib\site-packages\bertopic\_bertopic.py:515, in BERTopic.fit_transform(self, documents, embeddings, images, y)
    511     self._save_representative_docs(custom_documents)
    513 else:
    514     # Extract topics by calculating c-TF-IDF, reduce topics if needed, and get representations.
--> 515     self._extract_topics(
    516         documents, embeddings=embeddings, verbose=self.verbose, fine_tune_representation=not self.nr_topics
    517     )
    518 if self.nr_topics:
    519     documents = self._reduce_topics(documents)

File c:\Users\damien.bukudjian\AppData\Local\miniconda3\envs\orionenv\Lib\site-packages\bertopic\_bertopic.py:4031, in BERTopic._extract_topics(self, documents, embeddings, mappings, verbose, fine_tune_representation)
...
(...)
   1066     retries_taken=retries_taken,
   1067 )

BadRequestError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': True, 'detected': True}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}
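To see which filter category actually fired, the error payload above can be inspected programmatically. The snippet below is a minimal sketch that assumes the payload keeps the shape shown in this log; `filtered_categories` is a hypothetical helper name, not part of BERTopic or the OpenAI client.

```python
from typing import Any, Dict, List


def filtered_categories(error_body: Dict[str, Any]) -> List[str]:
    """List the content-filter categories marked as filtered.

    Assumes a payload shaped like the one in the log above:
    {'error': {'innererror': {'content_filter_result': {...}}}}.
    """
    # Accept either the full payload or just the inner 'error' dict.
    error = error_body.get("error", error_body)
    results = error.get("innererror", {}).get("content_filter_result", {})
    return sorted(
        name
        for name, result in results.items()
        if isinstance(result, dict) and result.get("filtered")
    )


# Payload abbreviated from the log above (message text omitted).
payload = {
    "error": {
        "code": "content_filter",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "jailbreak": {"filtered": True, "detected": True},
                "self_harm": {"filtered": False, "severity": "safe"},
                "sexual": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": False, "severity": "safe"},
            },
        },
    }
}

print(filtered_categories(payload))  # ['jailbreak']
```

In this log only the `jailbreak` category is marked as filtered; every severity-based category is reported as `safe`.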

Reproduction

No response

BERTopic Version

0.17.0

DamienBukudjian, Apr 16 '25 14:04

Hmmm, I remember that there were fixes for this a while back in BERTopic but it might be that OpenAI updated their API.

A solution could be to detect the 'code': 'content_filter' pair and create a "flagged" topic?

That makes sense. I would also think that it would be feasible to then add other filters/errors that might exist there. Or perhaps simply use a wide exception for errors and flag all related topics.

Either way, definitely agree with tackling this somehow. However, my schedule is quite packed at the moment. If you, or anyone else, would want to take this on, then that would be great.

MaartenGr, Apr 24 '25 11:04