
Improve specification of lorem module and definitions

Open ST-DDT opened this issue 1 year ago • 7 comments

Languages that use Latin characters should use the Latin lorem text (e.g. copy it from en). We will create a separate PR that documents this behavior of the lorem feature in the definitions and the module. After the definitions change, we should create another PR that actually changes the locale data to comply with the new spec. Additional care should be taken regarding tree-shakeability if the locale data is imported from other locales.

Especially the latter part should be done by the team to ensure there are no side effects.

Originally posted by @ST-DDT in https://github.com/faker-js/faker/issues/2824#issuecomment-2077678007

ST-DDT avatar May 08 '24 08:05 ST-DDT

Suggestion by @matthewmayer

If it is identical to en, could we not just have words.ts look like this to avoid duplication:

import words from '../../en/lorem/words';
export default words;

ST-DDT avatar May 08 '24 09:05 ST-DDT

I'll create a PR for the documentation part. Unless you are already working on it.

xDivisionByZerox avatar May 08 '24 10:05 xDivisionByZerox

https://github.com/faker-js/faker/issues/2907#issuecomment-2119197303

This reinforces my impression that lorem should be strictly Latin-only, and that we should probably add gibberishSentence/Paragraph/Text methods to the word module instead.

ST-DDT avatar May 19 '24 12:05 ST-DDT

Honestly, as long as we have a clear definition I'm fine with it. It's not like we can't change it in the future, if required.

xDivisionByZerox avatar May 19 '24 12:05 xDivisionByZerox

If we do decide to keep lorem for non-Latin languages, we should definitely allow for languages that don't use a space as a word separator (e.g. Chinese, Thai). For example, here you'd probably want to use a definition like faker.definitions.lorem.word_seperator instead of a hardcoded ' '.
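
A minimal sketch of that idea, not faker's actual implementation: the `LoremDefinitionSketch` type, its optional `wordSeparator` field and the `buildSentence` helper are hypothetical names used only for illustration. Words are picked by cycling through the list rather than randomly so the result is easy to follow.

```typescript
// Hypothetical definition shape: `wordSeparator` is NOT an existing faker definition.
interface LoremDefinitionSketch {
  words: readonly string[];
  // Optional per-locale separator; locales that omit it fall back to a space.
  wordSeparator?: string;
}

// Joins `wordCount` words using the locale's separator instead of a hardcoded ' '.
// Words are taken in order (cycling) in place of random selection.
function buildSentence(definition: LoremDefinitionSketch, wordCount: number): string {
  if (definition.words.length === 0) {
    throw new Error('definition.words must not be empty');
  }
  const separator = definition.wordSeparator ?? ' ';
  const picked: string[] = [];
  for (let i = 0; i < wordCount; i++) {
    picked.push(definition.words[i % definition.words.length]);
  }
  return picked.join(separator);
}

// A Latin-script locale relies on the default separator.
const latinSketch: LoremDefinitionSketch = { words: ['lorem', 'ipsum', 'dolor'] };
// A locale without spaces between words sets an empty separator.
const chineseSketch: LoremDefinitionSketch = { words: ['你好', '世界'], wordSeparator: '' };

console.log(buildSentence(latinSketch, 3));
console.log(buildSentence(chineseSketch, 3));
```

Making the separator optional would keep existing locale data valid, since only locales that need something other than a space would have to define it.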

matthewmayer avatar May 22 '24 05:05 matthewmayer

Team Proposal

  • We want the lorem module to consist only of lorem ipsum (Latin)
  • We want to make a new text (or similar) module that generates blind texts in the current locale
    • We are not sure yet whether these will be hard-coded pseudo realistic sentences or wild combinations of words that may or may not follow the normal syntax of the language (e.g. firstName drives through city vs noun verb adjective vs word word word)
    • The module should roughly mimic the methods in the lorem module
    • The lorem module and the blind text module should have links (at each method) between each other to help with discovery
  • The actual redefinition of the lorem module is planned for v10, not now (after we have the blind-text module)
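
To make the "word word word" variant of the proposal more concrete, here is a rough sketch of what such a blind-text module could look like. It is an assumption for discussion, not an agreed design: the `BlindTextSketch` class, its constructor parameters and the injected `pickIndex` function are all hypothetical, and only the method names `word`, `words` and `sentence` are borrowed from the existing lorem module.

```typescript
// Hypothetical blind-text module that roughly mirrors a few lorem method names.
// Nothing here is part of faker's API; it only illustrates the proposal.
class BlindTextSketch {
  private readonly wordList: readonly string[];
  // Returns an integer in [0, max); injected so the sketch stays deterministic in tests.
  private readonly pickIndex: (max: number) => number;

  constructor(wordList: readonly string[], pickIndex: (max: number) => number) {
    if (wordList.length === 0) {
      throw new Error('wordList must not be empty');
    }
    this.wordList = wordList;
    this.pickIndex = pickIndex;
  }

  // A single word from the current locale's word list.
  word(): string {
    return this.wordList[this.pickIndex(this.wordList.length)];
  }

  // `count` words joined by spaces (a locale-specific separator could be used instead).
  words(count: number): string {
    return Array.from({ length: count }, () => this.word()).join(' ');
  }

  // Words with a capitalized first letter and a trailing period, with no grammar applied.
  sentence(wordCount: number): string {
    const text = this.words(wordCount);
    return text.charAt(0).toUpperCase() + text.slice(1) + '.';
  }
}

// Usage with a cycling picker in place of a real random source.
let cursor = 0;
const cyclePick = (max: number): number => cursor++ % max;
const blindText = new BlindTextSketch(['alpha', 'beta', 'gamma'], cyclePick);
console.log(blindText.sentence(4));
```

The pseudo-realistic alternative (e.g. firstName drives through city) would need sentence templates per locale rather than a flat word list, so it would not fit this shape without further definitions.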

ST-DDT avatar Jun 06 '24 16:06 ST-DDT

FFR: We now use it for domain words as well, so having it use Latin letters would help with that:

  • https://github.com/faker-js/faker/pull/3253

ST-DDT avatar Nov 08 '24 08:11 ST-DDT