rust-unic

Initial implementation for unic-ucd-unihan

Open eyeplum opened this issue 6 years ago • 11 comments

This is a partial implementation of ucd-unihan #224.

Changed areas

  • gen
  • New subcrate ucd/unihan

Notes

One other thing to consider: since Unihan is a CJK-centric module of the Unicode Standard, maybe we could make this crate an optional subcrate of the rust-unic super crate, so that users need to opt in explicitly to use it.


eyeplum, Jun 07 '18 08:06

Failures are unrelated; I'll fix those as soon as I get a chance this weekend.

At a glance-over, this looks good; I'll do a more detailed pass this weekend.

CAD97, Jun 07 '18 17:06

Thanks for building this, @eyeplum!

I'm in the process of moving source data files out of the repository, to be imported as submodules. (The download+unzip scripts will be in Python, I guess.)

With the data kept externally, we won't have to deal with downloading and unzipping it ourselves, which would be much better for this repo. It would also make it easier to add other sources and models.

If you like, we can rebase and try to land this work, and drop the data-retrieval parts later. Or we can just wait for the external sources and drop the data source work from this PR. What do you think?

behnam, Aug 20 '18 06:08

Sure, waiting for the external sources sounds like the better option 👍

eyeplum, Aug 20 '18 07:08

Ping?

LuoZijun, Oct 04 '18 10:10

In #247, I have added the complete Unicode UCD data package, under the new address: /external/unicode/ucd/data/, which also contains a Unihan/ directory with all of its txt files. (See https://github.com/open-i18n/data-unicode-ucd/tree/master/data/Unihan)

So, there's no data package anymore, and you just need to write the gen rules for the tables, and implement the algorithm in a new component.
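As a rough sketch of where such a gen rule would start (this is not the code in this PR): the Unihan txt files hold one record per line, made of a code point written as U+XXXX, a field name, and a value, separated by tabs, with comment lines starting with #. The type and function names below are hypothetical and exist only for this illustration.

```rust
/// One record from a Unihan data file.
/// Hypothetical type for this sketch; not part of rust-unic.
#[derive(Debug, PartialEq)]
pub struct UnihanRecord<'a> {
    /// The character the record describes.
    pub codepoint: char,
    /// The field name, which starts with a lowercase `k`.
    pub field: &'a str,
    /// The raw field value, left unparsed.
    pub value: &'a str,
}

/// Parse a single line of a Unihan data file.
/// Returns `None` for blank lines, `#` comments, and malformed lines.
pub fn parse_unihan_line(line: &str) -> Option<UnihanRecord<'_>> {
    // Strip only the line terminator, so tabs inside the record survive.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    if line.is_empty() || line.starts_with('#') {
        return None;
    }

    // Split into at most three tab-separated columns.
    let mut parts = line.splitn(3, '\t');
    let hex = parts.next()?.strip_prefix("U+")?;
    let field = parts.next()?;
    let value = parts.next()?;

    // Convert the hexadecimal scalar value into a `char`.
    let codepoint = char::from_u32(u32::from_str_radix(hex, 16).ok()?)?;

    Some(UnihanRecord { codepoint, field, value })
}
```

A generator could feed every line of a file through parse_unihan_line, keep the records whose field matches the table being built, and emit them sorted by code point. How the values are then interpreted depends on the field, which this sketch deliberately leaves alone.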

Also, as a reminder, since the new source data files are imported as submodules, you need to do git submodule update --init to get the files, before running /etc/generate-tables.sh.

behnam, Jan 06 '19 15:01

Hey @behnam, thanks for the review! I have updated the pull request to read Unihan data from ./external/unicode/ucd/data/Unihan/ and migrated the related code changes to Rust 2018.

eyeplum, Jan 07 '19 08:01

Hey @behnam, I'm planning to merge this in this weekend. Although the functionality is very limited at the moment, I imagine it might be somewhat useful for users who need Unihan. I'm planning to map most of the Unihan tables later this year; hopefully I will have enough time to make it happen.

For now, before I start mapping more Unihan content, I'm thinking about tackling the Unicode 11.0 upgrade for rust-unic first, mainly because Unicode 12.0 is coming and I think it would be nice for us to at least update to Unicode 11.0 so that future updates are more manageable. I may need some help planning the work, as there seems to be a lot involved. I will probably create a new issue so we can have more detailed discussions there.

What do you think?

eyeplum, Mar 01 '19 06:03

Hey @behnam and @CAD97, are you happy for me to merge this in?

eyeplum, Mar 06 '19 08:03

Is there any news?

mozillazg, May 25 '19 04:05

Is this still being worked on?

asg0451, Jul 01 '21 17:07

@asg0451, I'm not planning to merge this anytime soon. If you are interested in Unihan, you could try it out in my fork: https://github.com/eyeplum/rust-unic

eyeplum, Jul 02 '21 08:07