Reorganize predefined recognizers into logical subfolders
- Move country-specific recognizers to country_specific/{country}/ folders
- Move generic recognizers to generic/ folder
- Move NER recognizers to ner/ folder
- Maintain full backward compatibility for all existing imports
- Update test imports to use new file locations
- Add comprehensive documentation for new structure
Fixes #1638
Change Description
This PR reorganizes the predefined recognizers in presidio-analyzer into logical subfolders to improve code maintainability and make it easier for contributors to add new recognizers.
What Changed:
Directory Structure:
- `country_specific/` - Organizes recognizers by country (9 countries: US, UK, India, Italy, Australia, Spain, Finland, Poland, Singapore)
- `generic/` - Contains globally applicable recognizers (Credit Card, Crypto, Date, Email, IBAN, IP, Medical License, Phone, URL)
- `ner/` - Contains Named Entity Recognition based recognizers (SpaCy, Stanza, Transformers, GLiNER, Azure AI Language)
Files Moved:
- 28 country-specific recognizers moved to appropriate country folders
- 10 generic recognizers moved to `generic/` folder
- 5 NER recognizers moved to `ner/` folder
Backward Compatibility:
- All existing imports continue to work unchanged (e.g., `from presidio_analyzer.predefined_recognizers import CreditCardRecognizer`)
- Main `__init__.py` updated to import from new locations and re-export all classes
- No breaking changes for existing users
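The re-export approach described above can be illustrated with a small self-contained sketch. It builds a throwaway package named `demo_recognizers` in a temporary directory; that package name, the module file name `credit_card_recognizer.py`, and the empty class body are assumptions made for illustration and are not the actual presidio-analyzer files. The point is only to show that a class moved into a subfolder stays importable from the top-level package when `__init__.py` re-exports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in package: the layout mirrors the structure described in
# this PR, but none of these files are the real presidio-analyzer sources.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_recognizers"
(pkg / "generic").mkdir(parents=True)

# New location: the class lives in a module inside the generic/ subfolder.
(pkg / "generic" / "credit_card_recognizer.py").write_text(
    "class CreditCardRecognizer:\n    pass\n"
)
# The subfolder's __init__.py exposes the class at the subpackage level.
(pkg / "generic" / "__init__.py").write_text(
    "from .credit_card_recognizer import CreditCardRecognizer\n"
)
# The top-level __init__.py re-exports it, so the old import path keeps working.
(pkg / "__init__.py").write_text(
    "from .generic import CreditCardRecognizer\n"
    "__all__ = ['CreditCardRecognizer']\n"
)

# Make the freshly written package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

old_style = importlib.import_module("demo_recognizers")
new_style = importlib.import_module(
    "demo_recognizers.generic.credit_card_recognizer"
)

# Both import paths resolve to the very same class object.
same_class = old_style.CreditCardRecognizer is new_style.CreditCardRecognizer
print(same_class)
```

Because the top-level name is bound to the same class object as the one in the subfolder module, `isinstance` checks and subclassing behave identically regardless of which import path a caller uses.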
Documentation:
- Added comprehensive `README.md` explaining new structure and contribution guidelines
- Updated `CHANGELOG.md` under Unreleased section as required
- Added `__init__.py` files to all new directories
Test Updates:
- Fixed 3 test files that had direct imports to recognizer files
- All tests now use correct import paths for new structure
Benefits:
- Better Organization: Clear separation makes codebase more maintainable
- Easier Contributions: Contributors can easily find where to add new recognizers
- Scalable: Simple to add new countries or recognizer types
- Well Documented: Clear guidelines for future development
Issue reference
This PR fixes issue #1638
Checklist
- [x] I have reviewed the contribution guidelines
- [ ] I have signed the CLA (if required)
- [x] My code includes unit tests (updated existing tests affected by reorganization)
- [ ] All unit tests and lint checks pass locally
- [x] My PR contains documentation updates / additions if required
@microsoft-github-policy-service agree
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Thanks! There are some linting errors here and there. Please check the CI output.
Hi @krikera, we left a few comments. Would you be interested in continuing the work on this? Can we help in any way?
Hi @krikera would you mind allowing me to push to your branch? I can make the changes there to update the PR.
closing to continue the work on #1670