
Import/loading takes a long time. How to speed up loading?

Open · Wikinaut opened this issue 2 years ago · 8 comments

I use pyphen for my Raspberry Pi Zero-powered Internet radio (https://github.com/Wikinaut/pinetradio).

Import of pyphen started.
pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

Loading always takes a very long time. Is there a way to decrease the loading time?

Wikinaut, Mar 21 '23 12:03

> I use pyphen for my Raspberry Pi Zero-powered Internet radio

Cool!

> pyphen imported, loading of de_DE took 43.77 seconds on Raspberry Pi Zero

That’s a lot. Even if the Raspberry Pi’s CPU is slow, it shouldn’t take so much time. Profiling on my computer doesn’t give interesting results; could you please provide profiling information from your Raspberry Pi? You can get profiling information by launching `python -c "import pyphen; pyphen.Pyphen(lang='de_DE')" -m cProfile -o /tmp/cprofile`, and you can send the /tmp/cprofile file here.

(I hope I’ll be able to read it even if it’s not the same platform; otherwise I’ll ask you to launch an additional command!)

liZe, Mar 22 '23 08:03

Done. The command did not work, but I put the commands into a file and ran that. Here is the full output:

(available until March 2024) https://dpaste.com/7RA2RN2ES.txt
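For readers reproducing this: in the one-line command above, everything after the `-c` string is passed to that code as `sys.argv`, so cProfile never runs. Putting the code into a file and running `python -m cProfile -o /tmp/cprofile script.py` works, and so does profiling from inside Python. A minimal sketch of the latter, using only the standard `cProfile` and `pstats` modules; the helper names `profile_to_file` and `print_top` are hypothetical and not part of pyphen:

```python
import cProfile
import pstats
import time
from typing import Any, Callable, Tuple


def profile_to_file(func: Callable[..., Any], outfile: str, *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func(*args, **kwargs) under cProfile and save the raw stats to outfile.

    Returns the function's result and the wall-clock seconds it took
    (profiling overhead included, so expect it to be slower than a plain run).
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    elapsed = time.perf_counter() - start
    # The saved file can be sent elsewhere and read later with pstats.
    profiler.dump_stats(outfile)
    return result, elapsed


def print_top(outfile: str, n: int = 15) -> None:
    """Print the n entries with the highest cumulative time from a saved profile."""
    pstats.Stats(outfile).sort_stats("cumulative").print_stats(n)
```

For this issue, the call would be something like `profile_to_file(lambda: __import__("pyphen").Pyphen(lang="de_DE"), "/tmp/cprofile")`, run in a fresh interpreter so that the import is profiled together with the dictionary parsing.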

Wikinaut, Mar 22 '23 09:03

Here is just one example of the usage (the purpose of hyphenation is to allow the maximum font size on the tiny display of https://github.com/Wikinaut/pinetradio). The first and the third hyphen came from the hyphenation. Currently, I use at most one automatic hyphenation per word. [screenshot]
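The "at most one hyphenation per word" rule can be sketched as follows. This assumes the line width is counted in characters (the real display is measured in pixels for a given font) and that the allowed break positions come from pyphen's `Pyphen.positions(word)`; the function name `split_once` is hypothetical, and pyphen's own `Pyphen.wrap(word, width)` offers a similar character-based split:

```python
from typing import Iterable, List


def split_once(word: str, positions: Iterable[int], max_chars: int, hyphen: str = "-") -> List[str]:
    """Break word into at most two lines so that the first fits in max_chars.

    positions are allowed break indexes (e.g. from Pyphen.positions(word)).
    The rightmost break that fits is chosen, keeping the first line as full as
    possible. If the word already fits, or no break fits, it is returned whole.
    The second line is not checked: only one break per word is allowed.
    """
    if len(word) <= max_chars:
        return [word]
    for pos in sorted(positions, reverse=True):
        # The first line holds word[:pos] plus the hyphen character(s).
        if 0 < pos < len(word) and pos + len(hyphen) <= max_chars:
            return [word[:pos] + hyphen, word[pos:]]
    return [word]
```

For example, with made-up break positions `[5, 8]`, `split_once("Internetradio", [5, 8], 10)` gives `["Internet-", "radio"]`; with real data the positions would come from `pyphen.Pyphen(lang="de_DE").positions(word)`.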

Wikinaut, Mar 22 '23 09:03

You may get slightly better results with version 0.14.0, as it may be a bit faster if your storage is slow (and it probably is). That could help with the 17 seconds spent mainly listing dictionaries, and with the 30 seconds spent in the `__init__` code when a dictionary is parsed.

Apart from this change, your time distribution is almost the same as mine. It may be possible to find optimizations, but nothing is obvious to me right now.
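To see which part (importing the module or parsing the dictionary) dominates on a given device without profiler overhead, the two phases can be timed separately. A minimal sketch using `importlib` and `time.perf_counter`; `time_phases` is a hypothetical helper, and the import time is only meaningful in a fresh interpreter, because an already-imported module is returned from `sys.modules` immediately:

```python
import importlib
import time
from typing import Any, Callable, Tuple


def time_phases(module_name: str, build: Callable[[Any], Any]) -> Tuple[Any, float, float]:
    """Import module_name, then call build(module); time both steps separately.

    Returns (built_object, import_seconds, build_seconds).
    """
    t0 = time.perf_counter()
    # Near zero if the module is already in sys.modules.
    module = importlib.import_module(module_name)
    t1 = time.perf_counter()
    obj = build(module)
    t2 = time.perf_counter()
    return obj, t1 - t0, t2 - t1
```

Applied to this issue, that would be `hyphenator, t_import, t_load = time_phases("pyphen", lambda m: m.Pyphen(lang="de_DE"))`.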

liZe, Mar 22 '23 13:03

0.14.0 is not much better:

Import of pyphen started.
pyphen imported, loading of de_DE took 39.62 seconds on Raspberry Pi Zero

new profile:

https://dpaste.com/CG5F52TKZ.txt

Wikinaut, Mar 22 '23 17:03

> 0.14.0 is not much better

A 10% improvement is good news and about what I was hoping for, but it’s not enough.
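The figure follows from the two reported timings (43.77 s before, 39.62 s with 0.14.0):

$$
\frac{43.77 - 39.62}{43.77} = \frac{4.15}{43.77} \approx 0.095
$$

That is a reduction of roughly 9.5%, which rounds to the 10% mentioned.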

> new profile:

It looks like we saved some time listing dictionaries; importing the module seems to be much faster.

For ~50s, there’s:

It should be possible to save some time, but there’s nothing obvious from what I see there :/.

liZe, Mar 23 '23 14:03

Perhaps offline preprocessing of the dictionary in use? Could this help?

Wikinaut, Mar 23 '23 15:03

> Perhaps offline preprocessing of the dictionary in use? Could this help?

I’ve tried loading a JSON file (generated from the dictionary) on my laptop, and it’s ~5 times faster than loading the Hunspell dictionary. With Pickle, it’s ~10 times faster. The benefits would probably be even larger on slower systems.

We could consider shipping these preprocessed dictionaries. Pickle and JSON are probably not the best formats (for different reasons); good ideas are welcome 😁.
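One way to try this kind of preprocessing locally is to parse the hyphenation dictionary once and cache the result next to it. The sketch below is not pyphen's code: it parses only plain Liang patterns (skipping the charset line, `%` comments, uppercase keyword lines such as `LEFTHYPHENMIN`, and non-standard patterns containing `/`), stores them in a hypothetical `{letters: digit values}` mapping, and reuses a pickle cache as long as the source file's modification time and size are unchanged. All function names are hypothetical:

```python
import os
import pickle
from typing import Dict, Tuple

# Hypothetical in-memory form: pattern letters -> digit value at each gap.
Patterns = Dict[str, Tuple[int, ...]]


def parse_pattern(pattern: str) -> Tuple[str, Tuple[int, ...]]:
    """Split a Liang pattern like 'a1b2c' into ('abc', (0, 1, 2, 0))."""
    letters = []
    values = [0]  # values[i] is the digit before letters[i]; the last is after the end
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), tuple(values)


def parse_hyph_dic(path: str) -> Patterns:
    """Parse the plain patterns of a hyph_*.dic file (first line is the charset)."""
    with open(path, "rb") as f:
        first, _, body = f.read().partition(b"\n")
    encoding = first.decode("ascii").strip() or "utf-8"
    patterns: Patterns = {}
    for line in body.decode(encoding).splitlines():
        line = line.strip()
        # Skip blanks, comments and non-standard patterns (not handled here).
        if not line or line.startswith("%") or "/" in line:
            continue
        # Skip keyword lines such as "LEFTHYPHENMIN 2".
        if line.split()[0].isupper():
            continue
        letters, values = parse_pattern(line)
        patterns[letters] = values
    return patterns


def load_patterns_cached(dic_path: str, cache_path: str) -> Patterns:
    """Return parsed patterns, reusing cache_path while dic_path is unchanged."""
    st = os.stat(dic_path)
    stamp = (st.st_mtime_ns, st.st_size)
    try:
        with open(cache_path, "rb") as f:
            cached_stamp, patterns = pickle.load(f)
        if cached_stamp == stamp:
            return patterns
    except (OSError, EOFError, pickle.UnpicklingError, TypeError, ValueError):
        pass  # missing or unreadable cache: fall through and rebuild it
    patterns = parse_hyph_dic(dic_path)
    with open(cache_path, "wb") as f:
        pickle.dump((stamp, patterns), f, protocol=pickle.HIGHEST_PROTOCOL)
    return patterns
```

Because the cache is keyed on the source file's stamp, editing the dictionary invalidates it automatically: the first run pays the parsing cost and later runs only unpickle. Pickle files should only be loaded if you created them yourself, because unpickling untrusted data can execute arbitrary code.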

liZe, Mar 25 '23 17:03