phobos icon indicating copy to clipboard operation
phobos copied to clipboard

Unicode 14.0.0 support (tables only)

Open DmitryOlshansky opened this issue 4 years ago • 24 comments

Let's see what's missing and then decide on the new home for the code generator.

The tables were generated by running ./run.d from: https://github.com/DmitryOlshansky/gen-uni-dlang

DmitryOlshansky avatar May 06 '20 00:05 DmitryOlshansky

Thanks for your pull request, @DmitryOlshansky!

Bugzilla references

Auto-close Bugzilla Severity Description
16416 enhancement Phobos std.uni out of date (should be updated to latest Unicode standard)

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#7469"

dlang-bot avatar May 06 '20 00:05 dlang-bot

Oh it's just 42k LoC of diff, LGTM! (/s)

CI says no:

std/uni/package.d(10621): Error: undefined identifier `toLowerSimpleIndexTrieEntries`
std/uni/package.d(10622): Error: undefined identifier `toTitleSimpleIndexTrieEntries`
std/uni/package.d(10620): Error: undefined identifier `toUpperSimpleIndexTrieEntries`

Geod24 avatar May 06 '20 03:05 Geod24

I think it would be nice if we could have a link to your gen.d file inside the source code as well, like at the top of the module in a comment explaining that this file is auto generated. Maybe in std.internal.unicode_tables where there is already a header where this could fit or maybe just in all of them a short note could be added

WebFreak001 avatar May 06 '20 06:05 WebFreak001

CI says no:

Yeah, and thankfully it’s simple casefolding stuff that’s missing I thought it was something more complicated.

Will get back to it today.

DmitryOlshansky avatar May 06 '20 07:05 DmitryOlshansky

I think it would be nice if we could have a link to your gen.d file inside the source code as well, like at the top of the module in a comment

I think it shall go inside of Phobos repo honestly, as all of the build helpers do. And we should generate the tables IMHO, dropping them from git and add the source *.txt data files from Unicode instead.

DmitryOlshansky avatar May 06 '20 07:05 DmitryOlshansky

The bugzilla issue is: https://issues.dlang.org/show_bug.cgi?id=16416

Robert-M-Muench avatar May 06 '20 07:05 Robert-M-Muench

Not that I'm an expert, but out of curiosity why does the code pass "auto-tester" tests for "Darwin_64_32" but not for the other configurations? Is there a machine dependence in Unicode?

Robert-M-Muench avatar May 10 '20 15:05 Robert-M-Muench

Not that I'm an expert, but out of curiosity why does the code pass "auto-tester" tests for "Darwin_64_32" but not for the other configurations? Is there a machine dependence in Unicode?

As stated in the log file:

echo "Darwin_64_32_disabled"
Darwin_64_32_disabled

MoonlightSentinel avatar May 10 '20 17:05 MoonlightSentinel

Okay, almost there. Will do a few finishing touches and it'll be ready to go.

Speaking of code generator - it'll be tricky to integrate into the Makefile at this point as it needs both 32 and 64 bit tables and by default these use machine word as the last level item. The way out I think is one of two:

  1. Just use 32-bit uint as the last level table item, it shouldn't matter much in practice.
  2. Fix things up a bit to be able to use uint or ulong as the last level table item type explicitly.

Whatever it is I think it's out of scope and is not required to get this merged. I'll put the link to the repo in the header of every file but I think it should be transffered to DLang org first.

DmitryOlshansky avatar May 11 '20 10:05 DmitryOlshansky

it'll be tricky to integrate into the Makefile

It'd be preferable to utilise the dmd build.d build scripts of possible.

Also note that the doctester currently fails for all PRs and is unrelated. it is unfortunately required.

thewilsonator avatar May 11 '20 10:05 thewilsonator

Did we switch to build.d script? Coool. I've come out of a fridge after time-traveling for a little while ;)

DmitryOlshansky avatar May 11 '20 10:05 DmitryOlshansky

Spoiler is : D is 2nd most popular language for system-level stuff in 2030. I did not tell anyone that, just a thought )

DmitryOlshansky avatar May 11 '20 10:05 DmitryOlshansky

stated in the log file:

echo "Darwin_64_32_disabled"
Darwin_64_32_disabled

Ah, ok... makes it a bit hard to understand/asses the results, if there are tests listed, which fail because they can't run anyway.

Robert-M-Muench avatar May 12 '20 19:05 Robert-M-Muench

So, any idea how to move forward with this PR?

Robert-M-Muench avatar May 25 '20 09:05 Robert-M-Muench

@DmitryOlshansky I cross-checked with Atila how to get this PR into Phobos.

Since this PR is quite extensive and requires a lot of know-how to be reviewed, which Atila doesn't have, we agreed on, that if the PR passes the tests, it will be merged.

I'm confused by the results of the checks and if this needs further action or not. Does anyone know or can find out?

@atilaneves The PR states that merging can be done automatically if it's approved. Not sure if all the tests need to pass first.

Robert-M-Muench avatar May 25 '20 17:05 Robert-M-Muench

Not sure if all the tests need to pass first.

They do.

thewilsonator avatar May 26 '20 01:05 thewilsonator

@thewilsonator Can you enlighten me on how you recognize this? I see 6 failing and 2 successful checks. Where two marked "require" failed.

Robert-M-Muench avatar May 26 '20 10:05 Robert-M-Muench

The ones marked "required" must pass. Also any breakage on buildkite, where community projects are tested, should also be dealt with. I presume the azure failures will get fixed as part of the rest.

thewilsonator avatar May 26 '20 12:05 thewilsonator

To clarify: in general all CIs must pass, but some have the tendency to regularly break due to a various reasons (e.g. buildkite when someone updates a package), hence they aren't marked as required so that in such an event or spurious failure a reviewer can still merge the PR. So the required CIs are just the ones which generally don't have spurious failures.

wilzbach avatar May 26 '20 13:05 wilzbach

Ok, thanks.

The auto-tester seems to run every day not only on-demand after a change. I don't understand why sometimes a Linux round is in the list and it's missing on newer rounds... All very confusing or at least not straight forward to understand.

And I understand that only the Darwin_64_64 case fails with this error:

std/uni/package.d(6709): Error: static assert:  "No unicode set by name Currency_Symbol was found."
std/uni/package.d(757):        instantiated from here: `opDispatch!"Currency_Symbol"`

@DmitryOlshansky Is there a file (for a table) missing? It would be great if you could have a look so this PR is not hanging around for too long.

Robert-M-Muench avatar May 31 '20 12:05 Robert-M-Muench

Okay, I’ve been out for a while, will fix missing aliases shortly.

DmitryOlshansky avatar Jul 12 '20 14:07 DmitryOlshansky

@DmitryOlshansky What is the status of this PR?

RazvanN7 avatar Apr 21 '21 08:04 RazvanN7

@DmitryOlshansky What is the status of this PR?

@RazvanN7 I'm trying to revive it. I think the codegen is fixed, now what's left is unittests depending on particular sets of characters that probably got extended.

DmitryOlshansky avatar Oct 21 '21 18:10 DmitryOlshansky

Ping @DmitryOlshansky

dkorpel avatar Mar 09 '22 16:03 dkorpel

Taken over by Rikki https://github.com/dlang/phobos/pull/8617

DmitryOlshansky avatar Nov 22 '22 18:11 DmitryOlshansky