Mingye Wang

Results 554 comments of Mingye Wang

> you would like to have a lot of clusters which are binary identical between many ZIM versions over time of the same content

Exactly. (Rsync wants the same too.)...

I put two versions of zh.wikipedia all maxi onto ipfs as:

```
# --chunker=buzhash --hash=blake2b-256
bafykbzacea5d2c2lxcdbgja2vnswe5aembmwvzqh6an4wzix3wbxkbkkkevdq
bafykbzaceb3mzfxjiuizmh7yoz5sfajx7zil6xhck2onrl5sr7aolq65iwmhm
```

The two sizes are 17843989064 and 17633450554. My shell script thinks 17628509273...

* The zh files should be `wikipedia_zh_all_maxi_2020-0{7,6}.zim`. I think the first one is the newer file since it's larger, but I can't be sure since I effed up my terminal...

Looks like my daemon just crashed :/

Now that I've checked my script again, my is-block-seen test is broken. Of course it is! I forgot a bit of `${}`, so instead of checking whether the variable is...
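For illustration only, here is a minimal Python sketch of what an is-block-seen test is meant to do when counting deduplicated bytes across two block lists. It is not the shell script from the comment. The function name `unique_bytes` and the toy `(block_id, size)` pairs are hypothetical, and real block IDs and sizes would come from the IPFS DAG.

```python
def unique_bytes(*block_lists):
    """Sum block sizes, counting each block ID only once across all lists."""
    seen = set()   # block IDs already counted
    total = 0
    for blocks in block_lists:
        for block_id, size in blocks:
            # The membership test must use the actual ID value; testing a
            # constant instead would make every block look seen (or unseen).
            if block_id in seen:
                continue
            seen.add(block_id)
            total += size
    return total


# Toy data (hypothetical): two "versions" sharing blocks "a" and "c".
old_version = [("a", 100), ("b", 200), ("c", 50)]
new_version = [("a", 100), ("d", 300), ("c", 50)]

combined = unique_bytes(old_version, new_version)
shared = unique_bytes(old_version) + unique_bytes(new_version) - combined
```

With this toy input the two versions together need 650 bytes of unique blocks, of which 150 bytes are shared.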

WRT writing things that compilers (usually) autovectorize, the [simd-everywhere/simde](https://github.com/simd-everywhere/simde) project might be interesting to look at. Well, mainly how it annotates the loops and uses builtins.

Note: the problematic file is Greeting.strings.

```
$ hexdump -C Greeting.strings.txt
00000000  ff fe 0a 00 2f 00 2a 00  20 00 4a 00 75 00 73 00  |..../.*. .J.u.s.|
...
```
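The dump starts with `ff fe`, which is the UTF-16 little-endian byte order mark. The visible part of the comment does not say whether the encoding is the actual problem, so the following Python sketch only shows how those first 16 bytes can be sniffed and decoded. The helper name `sniff_bom` is made up for this example.

```python
import codecs

# First 16 bytes copied from the hexdump above; the rest of the file is not shown.
HEAD = bytes.fromhex("fffe0a002f002a0020004a0075007300")

# Longest BOMs first, so UTF-32 LE (ff fe 00 00) is not mistaken for UTF-16 LE (ff fe).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


encoding, skip = sniff_bom(HEAD)
# Decode only when a BOM was recognized; otherwise leave the text undecided.
text = HEAD[skip:].decode(encoding) if encoding else None
```

A tool that expects UTF-8 would see a NUL byte after almost every character in such a file, which matches the `.J.u.s.` pattern in the right-hand column of the dump.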

You might think 简 正 繁 is enough for cn tw hk, but people in sg and my also use the simplified script. Plus every sensible translation of ll_CC, including...

Again, the distinction between zh-tw, -cn, -hk, and -sg/my is not limited to the hans/hant script difference. Saying x體中文 does not capture the differences encapsulated in local phrases...