fix(week6): Add comprehensive moderation filtering to resolve fine-tuning eval errors
Summary
Fixes a critical issue where OpenAI fine-tuning jobs fail during the post-training safety evaluation (refusals_v3) because of sensitive content in product descriptions.
Problem
When running the Week 6 Day 5 fine-tuning exercise, users encounter this error:
Error while running moderation eval refusals_v3 for snapshot ft:gpt-4o-mini-2024-07-18:personal:pricer:CQxxxx
Error while running eval for category hate/threatening
Root Cause: The Amazon product dataset contains items with sensitive keywords (weapon, knife, tactical, combat, etc.) that trigger OpenAI's post-training safety checks.
Solution
1. Updated Notebook (day5.ipynb)
Added a check_moderation() function that:
- Implements two-stage filtering (keyword pre-filter + OpenAI Moderation API)
- Provides detailed reporting of flagged items
- Returns clean items ready for fine-tuning
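The two-stage idea above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual check_moderation() code: the keyword list contains only the four keywords named in this description (the real script is said to use 25+), and the helper names `keyword_hit`, `openai_flagged`, and the `api_check` parameter are assumptions made for the example. The second stage assumes the OpenAI Python SDK v1 `client.moderations.create` call and an `OPENAI_API_KEY` in the environment.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Assumption: only the keywords named in the PR text; the real list is longer.
SENSITIVE_KEYWORDS = ("weapon", "knife", "tactical", "combat")
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(SENSITIVE_KEYWORDS) + r")\b", re.IGNORECASE
)


def keyword_hit(text: str) -> Optional[str]:
    """Stage 1: return the first whole-word sensitive keyword in text, or None."""
    match = _KEYWORD_RE.search(text)
    return match.group(1).lower() if match else None


def openai_flagged(text: str) -> bool:
    """Stage 2: ask the OpenAI Moderation API whether text is flagged."""
    from openai import OpenAI  # imported lazily so stage 1 works offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.moderations.create(input=text)
    return response.results[0].flagged


def check_moderation(
    texts: Iterable[str],
    api_check: Optional[Callable[[str], bool]] = None,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split texts into (clean, flagged); flagged entries carry a reason."""
    clean: List[str] = []
    flagged: List[Tuple[str, str]] = []
    for text in texts:
        keyword = keyword_hit(text)
        if keyword is not None:
            # Cheap local check first, so the API is only called when needed.
            flagged.append((text, f"keyword:{keyword}"))
        elif api_check is not None and api_check(text):
            flagged.append((text, "moderation_api"))
        else:
            clean.append(text)
    return clean, flagged
```

Calling `check_moderation(descriptions)` runs only the keyword pre-filter; passing `api_check=openai_flagged` adds the Moderation API stage for items that pass the keyword check. Running the keyword stage first keeps API calls to a minimum, at the cost of some false positives (for example, a kitchen knife set).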
2. Standalone Scripts
- fix_moderation.py: Batch filtering script with 25+ sensitive keywords
- test_moderation.py: JSONL verification utility
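As a rough idea of what a JSONL verification pass might look like, here is a hedged sketch. It is not the contents of test_moderation.py: the function name `verify_jsonl`, the assumption that each line follows the chat fine-tuning layout (`{"messages": [...]}`), and the short keyword list are all illustrative choices.

```python
import json
from typing import List, Sequence


def verify_jsonl(
    path: str,
    keywords: Sequence[str] = ("weapon", "knife", "tactical", "combat"),
) -> List[str]:
    """Return a list of human-readable problems found in a JSONL file."""
    problems: List[str] = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append(f"line {number}: invalid JSON ({error.msg})")
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {number}: missing 'messages' list")
                continue
            # Join all message contents and look for leftover keywords.
            # This is a plain substring match, so it is deliberately strict.
            text = " ".join(
                str(m.get("content", "")) for m in messages if isinstance(m, dict)
            ).lower()
            for keyword in keywords:
                if keyword in text:
                    problems.append(f"line {number}: contains keyword '{keyword}'")
    return problems
```

An empty return value means every line parsed, had a non-empty `messages` list, and contained none of the listed keywords. The file names to check are up to the caller; the description above does not state what the filtered training and validation files are called.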
3. Documentation
- DAY5_MODERATION_FIX_README.md: PR-focused documentation
- MODERATION_FIX_README.md: Technical deep-dive
Results
Before Fix
- Training: 200 examples
- Validation: 50 examples
- Status: ❌ Failed during post-training moderation
After Fix
- Training: 190 examples (10 filtered)
- Validation: 48 examples (2 filtered)
- Status: ✅ Successfully completed and deployed
Testing
Verified with successful fine-tuning job:
- Job ID: ftjob-moQGns3ajsS5UWIxxxxx
- Model: ft:gpt-4o-mini-2024-07-18:personal:pricer:CQUNxxxx
- Confirmed via OpenAI completion email
Benefits
- Prevents users from wasting time/money on failed fine-tuning jobs
- Demonstrates best practices for OpenAI safety evaluations
- Provides reusable tools for content filtering
- No breaking changes to original notebook structure
Files Changed
- ✏️ week6/day5.ipynb - Added moderation function
- ✨ week6/fix_moderation.py - New filtering script
- ✨ week6/test_moderation.py - New verification utility
- 📚 week6/DAY5_MODERATION_FIX_README.md - New documentation
- 📚 week6/MODERATION_FIX_README.md - New technical docs
Compatibility
- ✅ Python 3.8+
- ✅ OpenAI Python SDK v1.0+
- ✅ No breaking changes
- ✅ Works with W&B integration
Community Contribution - Week 6 Day 5 Moderation Fix by @bilallamal07
Oh gosh - would you be OK to move this to community-contributions folder? I'm grateful to have this change, and I will make this update to the main repo at some point, but in the meantime it's best not to affect the main repo where possible..
Hi Ed, thanks for pointing this out. Submitting this PR outside the community contribution process was unintentional. I'll make sure all future updates follow the community contribution guidelines.
I appreciate your support and guidance! Best regards,