dateparser

[WIP] Updating CLDR data

Open gavishpoddar opened this issue 3 years ago • 5 comments

  • Updating CLDR data to 39.0.0.
  • Fixing CLDR download error.
  • Updating the CLDR data URL: https://github.com/unicode-cldr/cldr-dates-full (archived) -> https://github.com/unicode-org/cldr-json.

TODO :

  • Fixing tests

Fixes issue #940

gavishpoddar avatar Jul 03 '21 21:07 gavishpoddar

Many tests seem to be wrong. For example, see tests/test_languages.py:805 for Zulu.

The current data translates "son 23 umasingana 1996" to "sunday 23 january 1996", but according to Google Translate, "sunday 23 january 1996" in Zulu is "isonto 23 Januwari 1996".

Additionally, languages such as "as" (Assamese) are poorly translated.

This PR fixes those issues, but the tests have not been updated yet.

@noviluni, please advise whether I should update the tests accordingly.

A review will be helpful.

Thanks

Note: This PR breaks 39 tests.

gavishpoddar avatar Jul 03 '21 22:07 gavishpoddar

Hi @gavishpoddar, I created a "guide" to handle this (CLDR updates), but we never started doing it. It would be nice if you read it to see if you missed anything: https://github.com/scrapinghub/dateparser/issues/826

My initial idea was to update version by version, but it's OK if we update directly to the last version as you did. After that we will need to check file by file to see if we are removing things that could generate "breaking changes" (and possibly adding them to our own data), but before starting the review I would like to understand why you removed the "version".

It is really important to point to a specific version and not directly to master, so that we can easily tell which version we are pointing to and can update easily in the future (master could be "incomplete" or "wrong"). In the past we had no way to know which version we were using or how outdated we were, so I would like you to reconsider adding back the cldr_version and the repo.git.co(cldr_version) statements. We need to keep this. If it doesn't work because they are tags instead of branches, etc., maybe you need to change the step, but as I mentioned, we need to point to a specific version.
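A rough sketch of what pinning could look like, in case it helps. It is not the existing script: the helper names build_clone_command and fetch_cldr are hypothetical, and it assumes the release is published under a tag named exactly "39.0.0". It relies on `git clone --branch` accepting tag names as well as branch names, so the step would not depend on how upstream publishes versions.

```python
import subprocess
from pathlib import Path

CLDR_JSON_URL = "https://github.com/unicode-org/cldr-json"
# Assumption: the release is available as a tag (or branch) with this exact name.
CLDR_VERSION = "39.0.0"


def build_clone_command(repo_url, cldr_version, dest):
    """Return a git command that clones only the requested tag or branch."""
    # `git clone --branch` accepts tags as well as branches; with a tag,
    # git leaves the checkout in detached-HEAD state at that release.
    return [
        "git", "clone",
        "--depth", "1",            # shallow clone: history is not needed
        "--branch", cldr_version,  # pin to an explicit version, never master
        repo_url,
        str(dest),
    ]


def fetch_cldr(dest, cldr_version=CLDR_VERSION, repo_url=CLDR_JSON_URL):
    """Clone the pinned CLDR release into `dest` (needs git and network access)."""
    subprocess.run(build_clone_command(repo_url, cldr_version, dest), check=True)
    return Path(dest)
```

Keeping the version in a single constant means a future update is a one-line change, and the data in the repository can always be traced back to a known CLDR release.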

thanks! :)

noviluni avatar Jul 04 '21 07:07 noviluni

At this point, 7 tests are failing in tests/test_freshness_date_parser.py.

I am unable to fix them. Please help.

@noviluni

gavishpoddar avatar Jul 07 '21 11:07 gavishpoddar

@gavishpoddar the builds for this PR were not enabled (it's a newish GitHub feature), sorry about that. I have just enabled them.

lopuhin avatar Jul 08 '21 11:07 lopuhin

Codecov Report

Merging #941 (4580337) into master (507dc6d) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #941   +/-   ##
=======================================
  Coverage   98.29%   98.29%           
=======================================
  Files         234      234           
  Lines        2694     2700    +6     
=======================================
+ Hits         2648     2654    +6     
  Misses         46       46           
Impacted Files Coverage Δ
dateparser/languages/locale.py 98.71% <100.00%> (+0.02%) :arrow_up:

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 507dc6d...4580337.

codecov[bot] avatar Oct 09 '21 21:10 codecov[bot]