Long words disappearing

Open bminde opened this issue 10 years ago • 11 comments

Long words are not shown, it jumps right to the next word.

E.g., on this page http://www.nrk.no/nordland/en-av-fire-disponert-for-narkolepsi-1.11585097 the words "forskningsartikkel", "svineinfluensaviruset", "Pandemrix-vaksinen" and "årsakssammenhengen" are not shown.

bminde avatar Mar 06 '14 13:03 bminde

spritz.js:

173        var tail = 22 - (word.length + 7);
174        word = '.......' + word + ('.'.repeat(tail));
310        String.prototype.repeat = function( num ){
311            return new Array( num + 1 ).join( this );
312        }

forskningsartikkel length is 18: 22-(18+7) = -3

Uncaught RangeError: Invalid array length (on line 311)
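
The arithmetic above can be reproduced with a minimal sketch. The names repeatDots, padWord and padWordClamped are hypothetical and not part of spritz.js; repeatDots copies the polyfill's approach without touching String.prototype, and the clamp is only a stopgap that avoids the exception, not a fix for the layout of long words.

```javascript
// Same technique as the polyfill on lines 310-312, as a plain function.
function repeatDots(num) {
    // new Array(n) throws a RangeError when n is negative.
    return new Array(num + 1).join('.');
}

// Hypothetical helper mirroring lines 173-174 of spritz.js.
function padWord(word) {
    var tail = 22 - (word.length + 7);
    return '.......' + word + repeatDots(tail);
}

// Hypothetical stopgap: never ask for a negative number of dots.
function padWordClamped(word) {
    var tail = Math.max(0, 22 - (word.length + 7));
    return '.......' + word + repeatDots(tail);
}
```

A 16-character word gives tail = -1, so the array length is 0 and nothing throws; from 17 characters on, the array length goes negative and the RangeError appears.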

Max. word size is 16. I think we need to split long words. The question is where.

Any suggestions?

0xE282B0 avatar Mar 06 '14 14:03 0xE282B0

Ah! Great catch. I guess it's not going to work all that well for German, Norwegian, etc. right now.

22 is an arbitrary number. We could just raise that up. Would that solve your problem?

Miserlou avatar Mar 06 '14 16:03 Miserlou

We can use soft hyphens (U+00AD, &shy;) to split long words. For German and English there is, for example, hyphenator.js https://code.google.com/p/hyphenator/ which can also be used as a bookmarklet. But I have no solution for Norwegian or other languages.
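
A rough sketch of how the split could work once a word carries soft hyphens, assuming some hyphenation step has already inserted them. The function name splitAtSoftHyphens and the maxLen parameter are hypothetical, and the break points in the usage note are illustrative only, not verified Norwegian hyphenation.

```javascript
var SOFT_HYPHEN = '\u00AD';

// Hypothetical helper: pack the syllables between soft hyphens into frames
// of at most maxLen characters, counting the trailing dash.
function splitAtSoftHyphens(word, maxLen) {
    var syllables = word.split(SOFT_HYPHEN);
    var frames = [];
    var current = '';
    syllables.forEach(function (syllable) {
        // Reserve one character for the dash that marks a continued word.
        if (current !== '' && current.length + syllable.length > maxLen - 1) {
            frames.push(current + '-');
            current = syllable;
        } else {
            current += syllable;
        }
    });
    frames.push(current);
    return frames;
}
```

For example, splitAtSoftHyphens('forsk\u00ADnings\u00ADar\u00ADtik\u00ADkel', 10) gives ['forsk-', 'ningsar-', 'tikkel']. A single syllable longer than the limit is left whole, so it would still need separate handling.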

0xE282B0 avatar Mar 06 '14 17:03 0xE282B0

On the official spritz example, I think they actually split words in the middle, and use dashes to show parts of the word over multiple frames.
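
Something like the following is what I mean, as a sketch only: it is a guess at the behaviour, not the actual Spritz algorithm, and splitFixedWidth and maxLen are made-up names. It cuts purely by character count, so it needs no language data but will break words in unnatural places.

```javascript
// Hypothetical fallback: cut a long word into fixed-size pieces and put a
// dash on every piece except the last, one piece per frame.
function splitFixedWidth(word, maxLen) {
    if (maxLen < 2) {
        throw new RangeError('maxLen must leave room for a character and a dash');
    }
    if (word.length <= maxLen) {
        return [word];
    }
    var step = maxLen - 1; // one character is reserved for the dash
    var frames = [];
    for (var i = 0; i < word.length; i += step) {
        var piece = word.slice(i, i + step);
        var isLast = i + step >= word.length;
        frames.push(isLast ? piece : piece + '-');
    }
    return frames;
}
```

With maxLen = 10, "forskningsartikkel" becomes "forskning-" followed by "sartikkel".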

TheSavior avatar Mar 06 '14 21:03 TheSavior

This is really annoying and currently makes OpenSpritz hardly usable for German texts. Hyphenation is the way to go in my opinion.

F30 avatar Mar 10 '14 00:03 F30

@F30 , how long are the (typical) German words please?

tomByrer avatar Mar 15 '14 15:03 tomByrer

I use the hyphenator that @smielke mentions -- it exposes a hyphenateWord method. I only hyphenate words that are too long (presently by character length, but I'll be upgrading to base this on rendered width in ens). See: https://github.com/kukulski/readifry/blob/master/main.js

kukulski avatar Mar 16 '14 05:03 kukulski

@tomByrer Hmm, hard to estimate. We do have words like „Ver­mö­gens­zu­ord­nungs­zu­stän­dig­keits­über­tra­gungs­ver­ord­nung“ [1], but such are of course rather the exception than the rule. As the graphic in [2] is 404'ing, I unfortunately couldn't find a source for the word length distribution, but the article states an average length of 10.6 characters.

I agree, however, that it would be sensible to add hyphenation for long words instead of just increasing the maximum to some arbitrary value. From my knowledge, there exist both algorithm- and word list-based approaches to hyphenation. The one @smielke mentioned seems to be algorithmic, which of course appears preferable for a JS solution. It also promises to support more languages than he mentioned [3].


[1] http://www.sprachlog.de/2013/06/05/das-neue-laengste-wort-des-deutschen/
[2] http://www.duden.de/sprachwissen/sprachratgeber/durchschnittliche-laenge-eines-deutschen-wortes
[3] http://code.google.com/p/hyphenator/wiki/en_AddNewLanguage

F30 avatar Mar 16 '14 13:03 F30

> average length of 10.6 characters

With a max length of 18 characters, hyphenating does sound best.

tomByrer avatar Mar 16 '14 14:03 tomByrer

Hyphenator is a good idea here. Is there a preferred method, or should we just rip out the readifry one?

Miserlou avatar Mar 16 '14 16:03 Miserlou

I broke out my hyphenation wrapper to make it easy to pick up: https://github.com/kukulski/readifry/blob/gh-pages/HyphenHelper.js

kukulski avatar Mar 16 '14 18:03 kukulski