Refresh aspired servables/versions following config update
Currently when the configured model list is updated via a call to handleReloadConfigRequest, the request thread blocks until any newly added models become available.
Their availability, however, depends on the filesystem polling thread rescanning the filesystem at its periodic interval, so there is an arbitrary delay before the requested changes actually take effect and the RPC returns.
This problem may not be very noticeable with the default polling interval of 1 second, but it seems undesirable for longer intervals. In particular, it makes API-based dynamic reconfiguration incompatible with the --file_system_poll_wait_seconds=0 setting: in that case all handleReloadConfigRequest calls time out and do not take effect.
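To make the behavior concrete, here is a minimal, thread-free sketch of the idea. It is not the actual TensorFlow Serving code: `ModelSource`, `UpdateConfig`, `PollOnce`, `IsAspired`, and the `refresh_on_update` flag are all hypothetical names invented for illustration. It only shows the difference between waiting for the next poll and refreshing right after a config update.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical model of a polling model source. The names are illustrative
// and are not TensorFlow Serving's actual classes or methods.
class ModelSource {
 public:
  // When refresh_on_update is true, a config change triggers an immediate
  // rescan instead of waiting for the next periodic poll.
  explicit ModelSource(bool refresh_on_update)
      : refresh_on_update_(refresh_on_update) {}

  // Replaces the configured model list (stand-in for a reload-config call).
  void UpdateConfig(std::set<std::string> models) {
    configured_ = std::move(models);
    if (refresh_on_update_) PollOnce();
  }

  // One polling pass. A periodic polling thread would call this on a timer,
  // or never if periodic polling is disabled.
  void PollOnce() { aspired_ = configured_; }

  // What the request thread would check before it can return.
  bool IsAspired(const std::string& model) const {
    return aspired_.count(model) > 0;
  }

 private:
  const bool refresh_on_update_;
  std::set<std::string> configured_;  // Models named in the latest config.
  std::set<std::string> aspired_;     // Models published by the last poll.
};
```

With `refresh_on_update` set to false, a newly configured model only becomes aspired after some later `PollOnce()` call, which never happens if periodic polling is disabled. With it set to true, the model is aspired as soon as `UpdateConfig` returns.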
Fixes #1519
I have opened this against 1.15 since that's the version we are using, but can rebase on a different branch if needed.
Also apologies in advance for the code, I am not very familiar with C++.
Thanks @christisg, I've pushed a commit to address your logging comment. I will aim to add unit test coverage when I get a chance... it will take me a bit longer due to unfamiliarity with C++ and the codebase/test framework.
@njhill thanks for reporting this bug. It's painful: it took me more than 3 hours to figure out the real behavior when setting --file_system_poll_wait_seconds=0, which I did to reduce the GCS bucket Class A/B operation requests caused by polling. Hope we can see your fixes soon. :)
@njhill do you want to wrap up this PR by adding a unit test as requested by the reviewer?
thanks!
@netfs apologies for letting this lag. I am not sure when I will realistically have a chance to do this. I'm especially busy right now and not very familiar with C++ or the testing setup, so it would take me a decent chunk of time.
Any help with that part would be appreciated!
no worries @njhill.
@astleychen do you want to help here by adding the tests?