International content

Open mor10 opened this issue 7 years ago • 33 comments

After conversations with various theme developers who work with languages other than English, it occurred to me that the Theme Unit Test Data could benefit greatly from having proper international content. This would allow testing of things like long words (Finnish, Icelandic, Inuktitut, etc.), non-English letters in their natural context (e.g. æ, ø, å, the German ß, various accents, etc.), RTL languages, and the like.

Right now, the only way to test a theme for support of non-English letters is through the character map, but this doesn't provide a good testing scenario. Likewise, RTL testing can be done using a plugin, but it'll be applied to English content, not actual RTL languages.

Having a couple of posts in representative languages - Finnish, German, French, Spanish, Simplified Chinese, Farsi, Hebrew, etc - would make it easier for theme developers to test their themes not only against English, but other real language examples.

mor10 avatar Jul 12 '16 20:07 mor10

We already have that, under: https://github.com/WPTRT/theme-unit-test/blob/master/themeunittestdata.wordpress.xml#L1097 ;)

`Հայերեն

Lorem Ipsum-ը տպագրության և տպագրական արդյունաբերության համար նախատեսված մոդելային տեքստ է: Սկսած 1500-ականներիցLorem Ipsum-ը հանդիսացել է տպագրական արդյունաբերության ստանդարտ մոդելային տեքստ, ինչը մի անհայտ տպագրիչի կողմից տարբեր տառատեսակների օրինակների գիրք ստեղծելու ջանքերի արդյունք է: Այս տեքստը ոչ միայն կարողացել է գոյատևել հինգ դարաշրջան, այլև ներառվել է էլեկտրոնային տպագրության մեջ մնալով էապես անփոփոխ: Այն հայտնի է դարձել 1960-ականներին Lorem Ipsum բովանդակող Letraset էջերի թողարկման արդյունքում, իսկ ավելի ուշ համակարգչային տպագրության այնպիսի ծրագրերի թողարկման հետևանքով, ինչպիսին է Aldus PageMaker-ը, որը ներառում է Lorem Ipsum-ի տարատեսակներ:

Български

Lorem Ipsum е елементарен примерен текст, използван в печатарската и типографската индустрия. Lorem Ipsum е индустриален стандарт от около 1500 година, когато неизвестен печатар взема няколко печатарски букви и ги разбърква, за да напечата с тях книга с примерни шрифтове. Този начин не само е оцелял повече от 5 века, но е навлязъл и в публикуването на електронни издания като е запазен почти без промяна. Популяризиран е през 60те години на 20ти век със издаването на Letraset листи, съдържащи Lorem Ipsum пасажи, популярен е и в наши дни във софтуер за печатни издания като Aldus PageMaker, който включва различни версии на Lorem Ipsum.

Català

Lorem Ipsum és un text de farciment usat per la indústria de la tipografia i la impremta. Lorem Ipsum ha estat el text estàndard de la indústria des de l’any 1500, quan un impressor desconegut va fer servir una galerada de text i la va mesclar per crear un llibre de mostres tipogràfiques. No només ha sobreviscut cinc segles, sinó que ha fet el salt cap a la creació de tipus de lletra electrònics, romanent essencialment sense canvis. Es va popularitzar l’any 1960 amb el llançament de fulls Letraset que contenien passatges de Lorem Ipsum, i més recentment amb programari d’autoedició com Aldus Pagemaker que inclou versions de Lorem Ipsum.

Hrvatski

Lorem Ipsum je jednostavno probni tekst koji se koristi u tiskarskoj i slovoslagarskoj industriji. Lorem Ipsum postoji kao industrijski standard još od 16-og stoljeća, kada je nepoznati tiskar uzeo tiskarsku galiju slova i posložio ih da bi napravio knjigu s uzorkom tiska. Taj je tekst ne samo preživio pet stoljeća, već se i vinuo u svijet elektronskog slovoslagarstva, ostajući u suštini nepromijenjen. Postao je popularan tijekom 1960-ih s pojavom Letraset listova s odlomcima Lorem Ipsum-a, a u skorije vrijeme sa software-om za stolno izdavaštvo kao što je Aldus PageMaker koji također sadrži varijante Lorem Ipsum-a.

Česky

Lorem Ipsum je demonstrativní výplňový text používaný v tiskařském a knihařském průmyslu. Lorem Ipsum je považováno za standard v této oblasti už od začátku 16. století, kdy dnes neznámý tiskař vzal kusy textu a na jejich základě vytvořil speciální vzorovou knihu. Jeho odkaz nevydržel pouze pět století, on přežil i nástup elektronické sazby v podstatě beze změny. Nejvíce popularizováno bylo Lorem Ipsum v šedesátých letech 20. století, kdy byly vydávány speciální vzorníky s jeho pasážemi a později pak díky počítačovým DTP programům jako Aldus PageMaker.

Româna

Lorem Ipsum este pur şi simplu o machetă pentru text a industriei tipografice. Lorem Ipsum a fost macheta standard a industriei încă din secolul al XVI-lea, când un tipograf anonim a luat o planşetă de litere şi le-a amestecat pentru a crea o carte demonstrativă pentru literele respective. Nu doar că a supravieţuit timp de cinci secole, dar şi a facut saltul în tipografia electronică practic neschimbată. A fost popularizată în anii ’60 odată cu ieşirea colilor Letraset care conţineau pasaje Lorem Ipsum, iar mai recent, prin programele de publicare pentru calculator, ca Aldus PageMaker care includeau versiuni de Lorem Ipsum.

Српски

Lorem Ipsum је једноставно модел текста који се користи у штампарској и словослагачкој индустрији. Lorem ipsum је био стандард за модел текста још од 1500. године, када је непознати штампар узео кутију са словима и сложио их како би направио узорак књиге. Не само што је овај модел опстао пет векова, него је чак почео да се користи и у електронским медијима, непроменивши се. Популаризован је шездесетих година двадесетог века заједно са листовима летерсета који су садржали Lorem Ipsum пасусе, а данас са софтверским пакетом за прелом као што је Aldus PageMaker који је садржао Lorem Ipsum верзије.`

emiluzelac avatar Jul 12 '16 20:07 emiluzelac

Those are mostly Slavic languages. No Scandinavian, no Asian, no Central European. There's a lot of stuff missing.

Ideally the unit test would have separate posts for each language with representative titles etc. Rather than having a post with a made-up "extralongtitlejusttoseewhatthatwouldlooklike", you could use an actual Finnish or Welsh title that contains real words that are absurdly long.

By breaking them into individual posts, people can test their themes against specific scenarios and languages. It's possible this would work better as a separate "International" Theme Unit Test.

mor10 avatar Jul 12 '16 20:07 mor10

Also, while Lorem Ipsum is easy, it is a poor substitute for modern languages.

mor10 avatar Jul 12 '16 20:07 mor10

Yeah, we prefer not to go Scandinavian, LOL just kidding. I am all for more languages and to replace lorem with something more meaningful.

emiluzelac avatar Jul 12 '16 21:07 emiluzelac

The only reason I'm mentioning Scandinavian languages is that we have 29-character alphabets and long compound words. Icelandic has even more letters, and Finnish has outlandishly long words.

mor10 avatar Jul 12 '16 21:07 mor10

Cool, if all agree, let's make it happen and I was joking before :)

emiluzelac avatar Jul 12 '16 22:07 emiluzelac

Definitely agree. I've been running tests for these on my own, but it'd be great to get some real examples from these other languages.

justintadlock avatar Jul 12 '16 23:07 justintadlock

Looping in @samikeijonen

mor10 avatar Jul 12 '16 23:07 mor10

and @glueckpress

mor10 avatar Jul 12 '16 23:07 mor10

Oh totally agree! I can help with Finnish. I'll get back to this when I'm back on my computer.

samikeijonen avatar Jul 13 '16 06:07 samikeijonen

Have pinged the Japanese Community on this :)

Kenshino avatar Jul 13 '16 07:07 Kenshino

For now, the Japanese community uses this file: https://raw.githubusercontent.com/jawordpressorg/theme-test-data-ja/master/wordpress-theme-test-date-ja.xml

hideokamoto avatar Jul 13 '16 08:07 hideokamoto

Cool guys, let's do a few more: @benlumia007, @sakinshrestha 🎱

emiluzelac avatar Jul 13 '16 18:07 emiluzelac

One post per language? How much do we write?

carolinan avatar Jul 13 '16 18:07 carolinan

I can write a post in Finnish with real words.

  1. The title should have a long word, probably using Scandinavian letters. We use ä, ö, and å.
  2. The content should also have long words and Scandinavian letters.

Am I missing something else?

P.S. As a "solution" we tend to use a smaller font size for headings on smaller screens, plus this basic CSS:

/* Allow long unbroken words to wrap instead of overflowing their container. */
body,
button,
input,
select,
textarea {
    word-wrap: break-word;
}
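
A slightly fuller sketch of the same idea, under a few assumptions: `.entry-title` and `.entry-content` are hypothetical selectors (use whatever your theme's markup provides), and the page declares its language in a `lang` attribute, which WordPress themes normally output through `language_attributes()`. `overflow-wrap` is the standard name for what `word-wrap` does, and `hyphens: auto` only has an effect in browsers that ship hyphenation rules for the declared language.

```css
/* Hypothetical selectors: adjust to the theme's own markup. */
.entry-title,
.entry-content {
    /* Legacy property name, kept for older browsers. */
    word-wrap: break-word;
    /* Standard property name for the same behavior. */
    overflow-wrap: break-word;
    /* Hyphenate at proper break points when the browser supports the
       language declared in the lang attribute (e.g. <html lang="fi">). */
    -webkit-hyphens: auto;
    hyphens: auto;
}
```

With hyphenation available, a long Finnish or Norwegian compound breaks at a syllable boundary with a hyphen; without it, the `overflow-wrap` rule still keeps the word inside its container by breaking it at an arbitrary point.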

samikeijonen avatar Jul 13 '16 18:07 samikeijonen

Then we probably don't need both Swedish and Finnish.

carolinan avatar Jul 13 '16 18:07 carolinan

I can try working with Chinese (Simplified and Traditional)

benlumia007 avatar Jul 13 '16 18:07 benlumia007

@carolinan: Do you have very long words as we do in Finnish? But yeah, we probably don't need both if I include ä, ö, and å.

samikeijonen avatar Jul 13 '16 18:07 samikeijonen

That would be awesome. How about if someone @justintadlock :) starts this in English and we will translate.

I'll do Serbian (Cyrillic) and Croatian (Latin), cool?

emiluzelac avatar Jul 13 '16 18:07 emiluzelac

my grammar sucks so not me :)

benlumia007 avatar Jul 13 '16 18:07 benlumia007

Yes, if you have an English version, I can translate it into Nepali.

sakinshrestha avatar Jul 13 '16 18:07 sakinshrestha

@emiluzelac: I don't think the post in Finnish should be translated from English, because the point of the post is that it should be something unique to this language, using words that would likely never be used in English.

Or did I just misunderstand you?

samikeijonen avatar Jul 13 '16 18:07 samikeijonen

I had no idea, cool, Finnish can have its own. I was thinking that we'd keep the same content, that's all :)

emiluzelac avatar Jul 13 '16 18:07 emiluzelac

yeah same content

benlumia007 avatar Jul 13 '16 18:07 benlumia007

Locale stats https://wordpress.org/about/stats/

We probably have a few that can do German.

carolinan avatar Jul 13 '16 18:07 carolinan

For example, a word that is long in English could be really short in Finnish. And vice versa: there are much longer words in Finnish than in English. And as I said, I need to include Scandinavian letters anyway, so I don't mind writing different content.

samikeijonen avatar Jul 13 '16 18:07 samikeijonen

Super cool @samikeijonen. Yep @carolinan, we can ask @grappler :)

emiluzelac avatar Jul 13 '16 18:07 emiluzelac

Oh for the Spanish: @jocastaneda FTW!

emiluzelac avatar Jul 13 '16 19:07 emiluzelac

Norwegian and Danish use æ, ø, and å. Icelandic has some extra interesting letters that stem from Old Norse: ð and þ. These are stress cases, which is why they matter.

Norwegian and Danish tend to concatenate words like this: Næringsmiddeloppbevaringsgjenstander.

In Icelandic, a word in actual use today is Alþjóðaflutningaverkamannasambandsins.

The Wikipedia page on long words is quite interesting: https://en.wikipedia.org/wiki/Longest_words

mor10 avatar Jul 13 '16 19:07 mor10

I'm definitely gonna use epäjärjestelmällistyttämättömyydellänsäkäänköhän :)

samikeijonen avatar Jul 13 '16 19:07 samikeijonen