[Enhancement] Improve lex_attrs for Spanish & Portuguese

Open weezymatt opened this issue 3 weeks ago • 0 comments

This PR enhances support for Spanish (es) and Portuguese (pt) by updating the lex_attrs.py files in their respective spacy/lang modules. Each change is accompanied by regression tests in the corresponding test_text.py files.

Description

Spanish (es):

  • Add feminine and apocopated ordinals (e.g., primera, primer)

  • Add abbreviation (e.g., 1.º) and plural rules for ordinals in the like_num function

  • Refactor test_issue3803 to follow spaCy code conventions by using fixtures

  • Add regression test test_es_lex_attrs_like_number
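
To illustrate the Spanish rules listed above, here is a minimal sketch of how a like_num-style check could handle feminine and apocopated ordinals, abbreviations such as 1.º, and plurals. It is not the code from this PR: the function name like_num_es, the shortened word lists, and the regular expression are assumptions made for illustration only.

```python
import re

# Hypothetical, shortened word lists; the real lex_attrs.py lists are longer.
_es_num_words = {"cero", "uno", "una", "dos", "tres", "diez", "cien", "mil"}
_es_ordinal_words = {
    "primero", "primera", "primer",  # masculine, feminine, apocopated
    "segundo", "segunda",
    "tercero", "tercera", "tercer",
}
# Assumed pattern for abbreviated ordinals such as 1.º, 2.ª, 1.er
_es_ordinal_abbrev = re.compile(r"^\d+\.?(º|ª|er)$")


def like_num_es(text: str) -> bool:
    """Return True if the token text looks like a Spanish number or ordinal."""
    text = text.lstrip("+-±~")
    if _es_ordinal_abbrev.match(text):
        return True
    # Plain digits, allowing thousands/decimal separators.
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    # Simple fractions such as 3/4.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    word = text.lower()
    if word in _es_num_words or word in _es_ordinal_words:
        return True
    # Plural rule: "primeros" -> "primero", "segundas" -> "segunda".
    if word.endswith("s") and word[:-1] in _es_ordinal_words:
        return True
    return False
```

Stripping a trailing "s" keeps the ordinal list short, at the cost of accepting a few non-words (for instance "primers"); listing plural forms explicitly would be the stricter alternative.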

Portuguese (pt):

  • Add number variations (i.e., uma, duas)

  • Fix typo "seicentos" -> "seiscentos"

  • Add gender rules to the hundreds [200-900]

  • Add feminine ordinals

  • Add plural rule for ordinals in like_num

  • Add tests test_pt_lex_attrs_like_number and test_pt_lex_attrs_like_number_for_ordinal to roughly maintain language coverage
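
The Portuguese changes above can be sketched in the same spirit. The example below derives feminine hundreds (200-900) and feminine ordinals from the masculine forms and applies a plural rule to ordinals. The name like_num_pt, the shortened word lists, and the derivation-by-suffix approach are assumptions for illustration, not the PR's actual implementation.

```python
# Hypothetical, shortened lists; the real lex_attrs.py lists are longer.
_pt_hundreds_masc = [
    "duzentos", "trezentos", "quatrocentos", "quinhentos",
    "seiscentos", "setecentos", "oitocentos", "novecentos",
]
# Gender rule for the hundreds: duzentos -> duzentas, and so on.
_pt_hundreds = set(_pt_hundreds_masc) | {w[:-2] + "as" for w in _pt_hundreds_masc}

_pt_num_words = {"um", "uma", "dois", "duas", "três", "cem", "mil"} | _pt_hundreds

_pt_ordinal_masc = ["primeiro", "segundo", "terceiro", "quarto", "quinto"]
# Feminine ordinals: primeiro -> primeira, and so on.
_pt_ordinal_words = set(_pt_ordinal_masc) | {w[:-1] + "a" for w in _pt_ordinal_masc}


def like_num_pt(text: str) -> bool:
    """Return True if the token text looks like a Portuguese number or ordinal."""
    text = text.lstrip("+-±~")
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    word = text.lower()
    if word in _pt_num_words or word in _pt_ordinal_words:
        return True
    # Plural rule for ordinals: "primeiros" -> "primeiro", "primeiras" -> "primeira".
    if word.endswith("s") and word[:-1] in _pt_ordinal_words:
        return True
    return False
```

With these lists, the corrected spelling "seiscentos" is recognized while the former typo "seicentos" is not.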

Additional:

  • Add weezymatt.md in .github/contributors

Last bits:

  • Code conventions are enforced with flake8 and black 25.11

Types of change

My PR covers an enhancement to the existing code.

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [x] I ran the tests, and all new and existing tests passed.
  • [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

weezymatt, Dec 01 '25 22:12