architecture icon indicating copy to clipboard operation
architecture copied to clipboard

Create separate ThreadPoolExecutor for core system usage

Open awarecan opened this issue 6 years ago • 6 comments

Problem

We use one ThreadPoolExecutor to arrange all sync version functions, such as Entity.update() and util.json.load_json(). The size of this pool is automatically determined.

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5 ref1

In some case, such as home-assistant/home-assistant#21925, the whole thread pool could be blocked by bad integration or upstream lib, then it will cause the whole system unusable.

Proposal

Therefore, I suggest we create a separate ThreadPoolExecutor, named core_executor, only execute the core functions, such as util.json.load_json() used in storage helper. We can have two options to do that

  1. add executor parameter in HomeAssistant.async_add_executor_job parameter

  2. add an annotation on the function we want to only be executed in core_executor

Alternative

Allow user change the max_works via core configuration or environment variable.

Rewrite async friendly util.json.load_json() and other core still-sync functions.

awarecan avatar Mar 11 '19 08:03 awarecan

So besides a bad integration taking the system down, how common is this? As far as I know, this is really the first occurrence of this problem (that we know). Maybe we should keep an eye on it before trying to solve it?

btw, can't read files inside asyncio, that's executor only. One thing that might help, and I'm considering, is doing all the registry and auth reads before setting up a single component.

balloob avatar Mar 12 '19 03:03 balloob

So bad integrations taking the system down definitely happen. I've managed to do this to my testing environment more than once while working on a component and accidentally doing I/O in a poor manner. Hopefully, they happen more in devs testing environments than production, but things can definitely slip through.

rohankapoorcom avatar Mar 12 '19 03:03 rohankapoorcom

We had a case with tellduslive where a lock in the library both ate threads and held up the event loop.

Downside of using a separate thread pool is that problems will be harder to notice. The user will notice quite fast if there's a serious problem with the thread pool in home assistant and report it. With a separate thread pool, problems will be harder to diagnose.

MartinHjelmare avatar Mar 12 '19 08:03 MartinHjelmare

Maybe have a core executor just during startup? There are very few writes after that.

balloob avatar Mar 12 '19 23:03 balloob

Reads are also done trough the SyncWorkers, and such a read is done when the Home Assistant front page is opened (for the translations).

michaelarnauts avatar Jan 15 '20 08:01 michaelarnauts

My system has gone from stable to almost unusable in the last few releases.

I strongly suspect it is related to the number of workers. A 433mhz button that fires an automation used to work within a couple of seconds. I can now sit there waiting for up to 30 seconds to a minute for it to work. Whilst in that state if I go to the front end it won't load either. I assume it is also waiting on the loop to be available?

I'm sure it will be a bad integration but there is no quick and easy way to find out which one.

So I think this is a good idea.

foxy82 avatar Jan 15 '20 08:01 foxy82

This architecture issue is old, stale, and possibly obsolete. Things changed a lot over the years. Additionally, we have been moving to discussions for these architectural discussions.

For that reason, I'm going to close this issue.

../Frenck

frenck avatar May 11 '23 13:05 frenck