
docs(lorem): define allowed words

Open xDivisionByZerox opened this issue 1 year ago • 21 comments

Description

Document the expected words for a locale in the lorem module.

Related to

This is the first part of #2884.

  • #2884

xDivisionByZerox avatar May 08 '24 10:05 xDivisionByZerox

Deploy Preview for fakerjs ready!

Name Link
Latest commit 79e8529ed2dadceaab9b87db80fd5e95b6806676
Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/665995697eeb600007b0a2fd
Deploy Preview https://deploy-preview-2885.fakerjs.dev

netlify[bot] avatar May 08 '24 10:05 netlify[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.96%. Comparing base (a082ed2) to head (79e8529).

Additional details and impacted files
@@           Coverage Diff            @@
##             next    #2885    +/-   ##
========================================
  Coverage   99.95%   99.96%            
========================================
  Files        2986     2986            
  Lines      215926   215929     +3     
  Branches      598      950   +352     
========================================
+ Hits       215839   215855    +16     
+ Misses         87       74    -13     
Files Coverage Δ
src/modules/lorem/index.ts 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

codecov[bot] avatar May 08 '24 10:05 codecov[bot]

I thought about using normal words if they don't use Latin. 🤔

Oh wow, then I completely misunderstood what we discussed in the team meeting when we made the decision. Discuss again in tomorrow's meeting?

xDivisionByZerox avatar May 08 '24 11:05 xDivisionByZerox

We should survey what the current non-Latin locales actually do and document that (e.g. are they nonsense words, real words, or transliterations of Latin lorem?).

matthewmayer avatar May 08 '24 11:05 matthewmayer

25 locales currently have a lorem/words.ts file. For each, I generated 5 sample words.

Latin

| code | script | words(5) | notes |
| --- | --- | --- | --- |
| cs_CZ | Latn | quasi neque quasi delectus minima | standard Latin lorem |
| de | Latn | excepturi inventore nihil eveniet velit | standard Latin lorem |
| en | Latn | crur capillus denique veritas audacia | standard Latin lorem |
| fr | Latn | aliquid vitae accusamus suscipit est | standard Latin lorem |
| fr_CH | Latn | iure ratione dicta voluptas illo | standard Latin lorem |
| nl | Latn | veritatis quibusdam maxime magnam possimus | standard Latin lorem |
| pl | Latn | accusamus eaque deleniti quam distinctio | standard Latin lorem |
| pt_BR | Latn | vitae aliquid temporibus laudantium nam | standard Latin lorem |
| sk | Latn | quidem possimus corrupti odio voluptate | standard Latin lorem |
| tr | Latn | optio natus quis aspernatur molestias | standard Latin lorem |
| uz_UZ_latin | Latn | tutamen ullam magni auctor delectatio | standard Latin lorem |
| en_BORK | Latn | thees lebureeuoos gesh ooccoor injuy | real English words with spelling modifications |
| lv | Latn | māxīmē vulnēro xīphīās soļ āēgrotātīo | standard Latin lorem with extra diacritics |
| es_MX | Latn | Fichero Incorpóreo Basurear Engarbarse Gendarme | random real and nonsense Spanish words |
| vi | Latn | yêu bè vàng ngọt độc | random real Vietnamese words |

Non-Latin

(I'm not really enough of an expert in non-Latin languages to tell whether these are nonsense or real words.)

| code | script | words(5) | notes |
| --- | --- | --- | --- |
| ar | Arab | الذات اصرخ ليونة أتذكر فشيأ | |
| dv | Thaa | އިންގިލާބެއް އެންމެ ތަރައްގީ މުޅިން ގެއްލުންނުވާ | |
| el | Grek | nihil similique laudantium aliquid qui | standard Latin lorem (perhaps should be changed) |
| fa | Arab | تمام به پایان بلکه ستون داشت | |
| he | Hebr | דולור תוק לפתיעם רוגצה קלאצי | Transliteration mimicking lorem ipsum in Hebrew |
| hy | Armn | աշխարհում բոլորն մեկ հասած իրենց | |
| ja | Jpan | 色々 独裁 錠 めいがら たて | |
| ko | Kore | 형에 확정될 자유를 정한다. 범하고 | |
| ru | Cyrl | направлений модели внедрения профессионального играет | Random real words |
| ur | Arab | چاسدسد چسد ساسدبھ اسدفگبطاسدفد اسداسدھدسبابگ ابنسد | |
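
One part of this survey can be automated: checking whether a locale's sample words are written in Latin script at all (as the el row is, despite being a Greek locale). The following is only a rough sketch; `classifyScript` and `ScriptClass` are hypothetical names, not part of Faker, and the check says nothing about whether the words are real or nonsense.

```typescript
// Hypothetical helper (not part of Faker): classify sample words by script.
type ScriptClass = 'latin' | 'non-latin' | 'mixed' | 'unknown';

// Built with the RegExp constructor so the Unicode property escapes are
// resolved by the JavaScript engine at runtime.
const letterRe = new RegExp('\\p{L}', 'gu');
const latinRe = new RegExp('\\p{Script=Latin}', 'u');

function classifyScript(words: readonly string[]): ScriptClass {
  // Collect every letter; digits, punctuation and combining marks are ignored.
  const letters: string[] = words.join('').match(letterRe) || [];
  if (letters.length === 0) {
    return 'unknown';
  }

  const latinCount = letters.filter((ch) => latinRe.test(ch)).length;
  if (latinCount === letters.length) {
    return 'latin';
  }
  if (latinCount === 0) {
    return 'non-latin';
  }
  return 'mixed';
}

// Samples taken from the tables above.
console.log(classifyScript(['nihil', 'similique', 'laudantium'])); // el sample
console.log(classifyScript(['направлений', 'модели', 'внедрения'])); // ru sample
```

Telling real words apart from nonsense or transliterations would still need a native speaker or a dictionary.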

matthewmayer avatar May 09 '24 15:05 matthewmayer

Team Proposal

  • The lorem module will always return the Latin lorem words/sentences.
  • The lorem word definitions will be inlined into the module and removed from the locale data
  • We will add replacement methods to the word module that generate sentences/paragraphs with words of that locale

ST-DDT avatar May 23 '24 16:05 ST-DDT

Would it make more sense to move the lorem definitions to the base locale?

matthewmayer avatar May 24 '24 00:05 matthewmayer

Team Proposal

  • The lorem module will always return the Latin lorem words/sentences.
  • The lorem word definitions will be inlined into the module and removed from the locale data
  • We will add replacement methods to the word module that generate sentences/paragraphs with words of that locale

I was not available at the last team meeting, and I'm missing the reasoning 👀 Why will lorem always return Latin?

Shinigami92 avatar May 26 '24 17:05 Shinigami92

Why will lorem always return Latin?

Because all locales that use Latin characters use Latin lorem anyway. And those that don't use Latin characters use normal words and are sometimes even incompatible with the Latin sentence structure.

ST-DDT avatar May 26 '24 17:05 ST-DDT

Team Task

Everybody should make their suggestions for the lorem module definitions and expectations. We will discuss the proposals/expectations in the next team meeting.

ST-DDT avatar May 30 '24 15:05 ST-DDT

Team Task

Everybody should make their suggestions for the lorem module definitions and expectations. We will discuss the proposals/expectations in the next team meeting.

My expectation:

If we do not change the behavior of lorem, there should at least be something like word.blindtext or lorem.blindtext that generates randomized but localized placeholder texts (wiki:de:Blindtext, wiki:en:Filler_text). The name and module are up for discussion.

Shinigami92 avatar May 30 '24 19:05 Shinigami92

@Shinigami92 Could you please elaborate on what you would consider the defining difference between the word module and the lorem module?

ST-DDT avatar May 31 '24 09:05 ST-DDT

@Shinigami92 Could you please elaborate on what you would consider the defining difference between the word module and the lorem module?

Personally, I would say word is for generating words like nouns, verbs and so on, while lorem is more for placeholder texts, for example to test responsive table cells in a frontend.

However, I have not read our docs yet, and both modules date from long before I joined the project, so I wouldn't like to be held responsible for any historical decisions. Instead, I would rather find a good way forward than look into the past. So I don't care whether it is called lorem, word or anything else, as long as the functionality is provided.

Shinigami92 avatar May 31 '24 09:05 Shinigami92

I wouldn't like to be held responsible for any historical decisions.

That was not my intention. I'm sorry.


I spent quite some time thinking about this. The following represents my personal opinion. I'll split the answer into multiple comments to make it easier to react to them using emojis. I hope the reactions (and other answers) help us all determine where we are on the same page and where we are not.

ST-DDT avatar Jun 01 '24 07:06 ST-DDT

For me, lorem (module) is a specific type of blind text that you are not supposed to read/be able to understand.

ST-DDT avatar Jun 01 '24 07:06 ST-DDT

If we define lorem to be Latin, then those locales that currently return locale-specific words would be impacted. If we define lorem to be locale-specific, then the locales that currently use Latin would lose their original intent of creating un-understandable text. If we do not define lorem, then the locales that use Latin are unable to generate (pseudo-)understandable blind texts unless we add new methods for that.

ST-DDT avatar Jun 01 '24 07:06 ST-DDT

If we add the explicit concept of (pseudo-)understandable blind text, then it will likely require the same or at least similar methods to the current lorem module, so that you are able to generate a string matching your length requirements. These would conflict with the existing methods and would need some form of disambiguation, either by prefixing them with blind (e.g. blindText, blindSentence) or by moving them to a new module.

ST-DDT avatar Jun 01 '24 07:06 ST-DDT

I think the real question is: does "lorem" mean general blind text or specifically the Latin lorem ipsum text?

Does the idea of "Chinese lorem" or "Thai lorem" or "Hebrew lorem" make sense?

matthewmayer avatar Jun 01 '24 07:06 matthewmayer

Does the idea of "Chinese lorem" or "Thai lorem" or "Hebrew lorem" make sense?

Important question. For me, Chinese lorem is just a blind text. If it doesn't use Latin(-like) characters, including translations, it is a blind text but not lorem. If it is a "phonetic translation", then it is just gibberish using the locale's characters, no better than faker.string.fromCharacters. The closest non-Latin-character-based equivalent I can think of is the Japanese DoReMi spelling training thing(?), because I assume their brain turns off as soon as it recognizes the "intro". Which kind of leads me to "we should probably start our lorem text with lorem ipsum", but that is a different feature request altogether.

@matthewmayer Could you please share your opinion on any of these questions? Ultimately we need answers/shared opinions to form any kind of consensus.

ST-DDT avatar Jun 01 '24 07:06 ST-DDT

I don't really know. I think we need to try and involve some native speakers of non-Latin languages.

matthewmayer avatar Jun 01 '24 08:06 matthewmayer

Team Proposal

  • We want the lorem module to consist only of lorem ipsum (Latin)
  • We want to make a new text (or similar) module that generates blind texts in the current locale
    • We are not sure yet whether these will be hard-coded pseudo-realistic sentences or wild combinations of words that may or may not follow the normal syntax of the language (e.g. firstName drives through city vs noun verb adjective vs word word word)
    • The module should roughly mimic the methods in the lorem module
    • The lorem module and the blind text module should have links (at each method) between each other to help with discovery
  • The actual redefinition of the lorem module is planned for v10, not now (after we have the blind-text module)
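
To make the "word word word" variant of this proposal more concrete, here is a minimal sketch of what a blind-text module mirroring a few lorem method names (word, words, sentence) could look like. Everything here is an assumption for illustration: `createBlindText`, `BlindTextModule`, `createRng` and the word list are hypothetical and are not part of Faker, and the real module may look entirely different.

```typescript
// Hypothetical sketch: none of these names exist in Faker.
type Rng = () => number; // returns a float in [0, 1)

// Tiny linear congruential generator so the example is reproducible.
function createRng(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Mirrors a subset of the lorem method names, but draws from locale words.
interface BlindTextModule {
  word(): string;
  words(count: number): string;
  sentence(wordCount: number): string;
}

function createBlindText(
  localeWords: readonly string[],
  rng: Rng
): BlindTextModule {
  if (localeWords.length === 0) {
    throw new Error('localeWords must not be empty');
  }

  const word = (): string =>
    localeWords[Math.floor(rng() * localeWords.length)];

  const words = (count: number): string => {
    const picked: string[] = [];
    for (let i = 0; i < count; i++) {
      picked.push(word());
    }
    // Assumption: words are separated by spaces, which does not hold for
    // every language.
    return picked.join(' ');
  };

  const sentence = (wordCount: number): string => {
    const text = words(wordCount);
    // Assumption: Latin-style capitalization and a trailing period.
    return text.charAt(0).toUpperCase() + text.slice(1) + '.';
  };

  return { word, words, sentence };
}

// Usage with the ru sample words from the survey earlier in this thread.
const ruText = createBlindText(
  ['направлений', 'модели', 'внедрения', 'профессионального', 'играет'],
  createRng(42)
);
console.log(ruText.sentence(4));
```

The space separator and trailing period are exactly the kind of Latin sentence structure that some locales may be incompatible with, so a real implementation would probably need per-locale sentence patterns.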

ST-DDT avatar Jun 06 '24 16:06 ST-DDT