[RFC] mp.input: avoiding race conditions arising from multiple input requests
This PR builds on the changes proposed in #17251, I am submitting this now for comments.
The mp.input utilities are extremely useful, however the library currently does not provide the tools to safely use the various methods without creating code that is vulnerable to race conditions. Most of these race conditions are relatively uncommon, requiring multiple input requests to be made in close succession to each other. However, as the library becomes more popular and developers start (ab)using it more, these conditions become more likely to occur. This PR is just to point out where these data races currently occur and to present and discuss some possible solutions. @guidocella keen to hear your thoughts, especially.
I have identified three parts of the code where race conditions can occur:
- When creating new input requests shortly after terminating an old one. This is caused by the use of a single event handler per script. I discussed this issue and presented a solution in #17251.
- When using
input.terminate(). - When sending log messages.
Edit: 4. When responding to autocomplete events.
input.terminate()
Problem
According to the discussions I've had and the examples I have seen, the primary suggested usecase for input.terminate() is to close the input request that a client is using once it no longer needs any further input. However, input.terminate() does not specify which input it closes, it just closes whatever input is open. This means that if a second client creates a new input request at approximately the same time that the first client calls terminate on their input, the terminate request from the first client may arrive after the second client has opened their input prompt, immediately terminating it. This can be rather easily tested with the following two scripts:
local input = require 'mp.input'
mp.add_key_binding('KP9', nil, function()
input.get({
prompt = 'prompt1',
keep_open = true,
submit = function()
mp.commandv('script-message', 'test-run')
local i = 0
while (i < 100000) do
i = i+1
end
input.terminate()
end,
closed = function() print('closed prompt1') end
})
end)
local input = require 'mp.input'
mp.register_script_message('test-run', function()
input.get({
prompt = 'prompt2',
edited = print,
closed = function() print('closed prompt2') end
})
end)
In the above examples, whether the second prompt ends up being displayed to the user depends on the length of the while loop, a race condition.
Obviously this is a very contrived example, but there are scenarios where this is viable. For example, the first script could easily be a user configured input.select menu of arbitrary commands (e.g., a more modern version of this), which a user has configured to trigger another input dialog.
Solution
An easy way to mitigate this issue is to take some queues from mp.command_native_async() and mp.abort_async_command() and use a unique id value for terminations. Take the below example:
local id, timer
id = input.get({
prompt = "Open File:\n> ",
history_path = "~~state/open-file-history",
opened = function()
timer = mp.add_timeout(20, function()
input.terminate(id)
end)
end,
edited = function()
timer:kill()
timer:resume()
end,
submit = function(path)
mp.commandv("loadfile", path)
end
})
We use an opaque id value returned by input.get or input.select and pass it into input.terminate(). The above example uses this to implement a timeout on the request. Normally, this would require significantly more guard rails to prevent input.terminate() from being called after this input has been closed, but with ids it's trivial.
I propose we add these return values for input.get and input.select and optionally allow them to be passed into input.terminate(), and update the documentation to propose the above approach as best practice. That way, no existing scripts break (a nil argument to input.terminate() still clears everything), but future scripts that follow best practice avoid causing race conditions. It's also fully backwards compatible; if a script using these ids were run on an older version of mpv, it would be akin to passing a nil value into input.terminate(), which means things will just work as they do now.
This PR already does these things.
Logs
Problem
The problem with the logs is basically the same as with terminate; the logs operate on whatever input happens to be open even if it's not what the client might expect. This is trivially easy to test:
mp.add_timeout(0, function()
while(true) do
print('hey!!!!!!')
end
end)
mp.add_key_binding('Ctrl+Home', nil, function()
input.get({
prompt = 'prompt1'
})
end)
Launch mpv, open commands.lua to see the spam in the console output, then use Ctrl+Home to replace commands.lua with our second prompt. You should see that several of the hey!!!!! spam makes it into the logs of the new request.
Solution
I imagine that the way to avoid this issue would be to explicitly specify the log id of logs rather than rely on the correct logger being active in console.lua. I haven't made any changes to this currently, as the log format seems a little more strictly defined and I wasn't sure about the best way to add the id information (Hence the RFC).
Download the artifacts for this pull request:
Windows
- mpv-v0.41.0-dev-g04f080764-pr17256-21206610614-aarch64-pc-windows-msvc
- mpv-v0.41.0-dev-g04f080764-pr17256-21206610614-aarch64-pc-windows-msvc-pdb
- mpv-v0.41.0-dev-g04f080764-pr17256-21206610614-i686-w64-mingw32
- mpv-v0.41.0-dev-g04f080764-pr17256-21206610614-x86_64-pc-windows-msvc
- mpv-v0.41.0-dev-g04f080764-pr17256-21206610614-x86_64-pc-windows-msvc-pdb
- mpv-v0.41.0-dev-g04f080764-pr17256-21206610614-x86_64-w64-mingw32
Obviously this is a very contrived example, but there are scenarios where this is viable. For example, the first script could easily be a user configured
input.selectmenu of arbitrary commands (e.g., a more modern version of this), which a user has configured to trigger another input dialog.
This is exactly what select/menu does. It calls input.terminate() only when the chosen command does not open a new menu.
If you just don't set keep_open you don't need to call input.terminate and get the flicker on menu change. But with these changes wouldn't keep_open + unconditionally calling input.terminate(id) on submit cause the same flicker? Is there an advantage over just not setting keep_open?
The proposed auto-closing after 20 seconds example is also trivially fixed by killing the timer in closed. So I'm still hesitant to add complexity to all present and future mp.input implementations if we don't find practical advantages.
I also think log_error was a mistake by the way. It's too situational.
Also if we do want this why not just save the last id within input.lua and have input.terminate pass it automatically? You can only have one input session active at a time anyway. It is not like mp.command_native_async() where multiple commands can run at the same time.
If you just don't set keep_open you don't need to call input.terminate and get the flicker on menu change. But with these changes wouldn't keep_open + unconditionally calling input.terminate(id) on submit cause the same flicker? Is there an advantage over just not setting keep_open?
Scripts written before keep_open still exist and are in use, and some scripts may want to call terminate anyway so that they still work as expected on older versions of mpv. However, this discussion is missing the point; what my examples are trying to demonstrate is twofold:
- That
input.terminate()is inherently vulnerable to race conditions. There is no guarantee that the input won't be replaced before the terminate request is received by console.lua, which means that any timeinput.terminate()is used there is always the risk of closing the wrong input. - That there exist permitted ways of calling the
mp.inputAPI that will almost guarantee thatinput.terminate()closes the wrong input. A silent error that provides no feedback to developers about what went wrong.
The fact that the current solution involves removing the input.terminate() call is, I think, quite telling. The current version of input.terminate() is just inherently incapable of being used completely safely, there is always the risk of closing an unrelated input prompt by mistake every time you use it, and developers need to understand all these points to avoid consistently causing this to happen when creating any script that involves nested input requests.
Now we could just document this, something like:
input.terminate()Closes the currently open input request, regardless of which script created it. As this method is executed asynchronously, it is possible to accidentally close the wrong input, so care should be taken when calling this method shortly after creating a new input request or triggering another script to do so.
However, I think we can do better than this and provide an API that allows using input.terminate() safely.
The proposed auto-closing after 20 seconds example is also trivially fixed by killing the timer in closed. So I'm still hesitant to add complexity to all present and future mp.input implementations if we don't find practical advantages.
You're right, I overstated the guardrails required for the timer, not sure what I was thinking there. However, this suggestion is still vulnerable to race conditions; the timeout can go off after a new request has replaced the current one, but before the closed event has been received, hence terminating the new input instead of the old one.
Also if we do want this why not just save the last id within input.lua and have input.terminate pass it automatically? You can only have one input session active at a time anyway. It is not like mp.command_native_async() where multiple commands can run at the same time.
This would indeed fix race conditions between scripts, but it does not fix them within the same script. There is a period of time, after a new input.get call has been made but before console.lua has received the new get-input request, where events are still being sent from console.lua to the old request handler. One of these events can trigger a callback that calls input.terminate(), and thus closes the new request instead of the old. This could be addressed by automatically closing the request handler of any existing request from the client side when a new one is made. However, to maintain the API contracts we would also need to call closed with the latest contents and cursor position of the input, which would probably require every request to listen to edited events internally (plus a new event for cursor position changes), and for a cache to be kept on the local side. This cache would also be ever so slightly behind the actual state of the input, so if a user were to type a new character at the exact same time that this termination happens, then the closed event sent to the input may be missing it. All of this is possible to do, but it's adding much more complexity than the explicit id passing.
I think there's a decent middle-ground here. We could keep the explicit ids and allow them to be passed to input.terminate(), but when calling it without an argument we pass the script name alone to console.lua. That way, console.lua will always only close an input that belongs to the script in question, removing the ability for scripts to accidentally close each others input (effectively making most new and old scripts safe for free). We then document that the ids can be used to avoid race conditions within the same script, for those writing scripts complex enough to need it. Something like this:
input.terminate([t])Closes the currently active input request, if any. This will not close requests made by other scripts.If the return value of an
input.getorinput.selectrequest is passed in as the first argument then only that specific request will be closed, preventing race conditions between multiple requests made within the same script.
Personally, I think this definition would give me much more confidence as a script author than the other definition I wrote above.
(Technically the above proposal might break any scripts that were using input.terminate() to deliberately close inputs created by other scripts, but I don't know of any doing that, I don't know a valid reason to do that, and it wasn't explicitly documented, so it's probably fine to change.)
I have added two new commits to plug a couple of other race conditions.
- Sending the script name and handler id along with
completemessages so that the wrong completion options cannot be shown for the wrong input request. This seems pretty unlikely to happen, but it can be avoided with no user-facing changes, so it seems like a no-brainer. - Sending the log buffer id along with log messages. We can automatically send the id of the last made
input.get()request, which automatically avoids data races between different scripts. However, I have also added an extraidargument to the three log methods so that data races can be avoided within the same script if multiple log buffers are in use. As a side effect, this allows log entries to be sent to specific log buffers at any time, not just when an input request for that particular log buffer is active.
Scripts written before
keep_openstill exist and are in use, and some scripts may want to call terminate anyway so that they still work as expected on older versions of mpv.
Well I was aking about the advantage over leaving keep_open at the default of false. Then you don't need to call input.terminate at all in that example. Also calling input.terminate from input.select was never needed. input.get and input.select used to have opposite defaults.
You're right, I overstated the guardrails required for the timer, not sure what I was thinking there. However, this suggestion is still vulnerable to race conditions; the timeout can go off after a new request has replaced the current one, but before the
closedevent has been received, hence terminating the new input instead of the old one.
Yeah sure though that's extremely unlikely. By the way, this is what stats.lua already does.
This would indeed fix race conditions between scripts, but it does not fix them within the same script. There is a period of time, after a new
input.getcall has been made but beforeconsole.luahas received the newget-inputrequest, where events are still being sent fromconsole.luato the old request handler. One of these events can trigger a callback that callsinput.terminate(), and thus closes the new request instead of the old. This could be addressed by automatically closing the request handler of any existing request from the client side when a new one is made. However, to maintain the API contracts we would also need to callclosedwith the latest contents and cursor position of the input, which would probably require every request to listen toeditedevents internally (plus a new event for cursor position changes), and for a cache to be kept on the local side. This cache would also be ever so slightly behind the actual state of the input, so if a user were to type a new character at the exact same time that this termination happens, then theclosedevent sent to the input may be missing it. All of this is possible to do, but it's adding much more complexity than the explicit id passing.
I still think this should be solveable without making developers pass around ids since input.lua has all the needed information. Something like
mp.register_script_message(handler_id, function (type, args) {
local latest_id_backup = latest_id
latest_id = handler_id
t[type](args)
latest_id = latest_id_backup
EDIT: also avoid restoring the id if it was changed within the callback
I still think this should be solveable without making developers pass around ids since input.lua has all the needed information. Something like
mp.register_script_message(handler_id, function (type, args) { local latest_id_backup = latest_id latest_id = handler_id t[type](args) latest_id = latest_id_backup
Hmm, I see what you mean, we could do something like this:
local function register_event_handler(t)
local handler_id = "input-event/"..handle_counter
handle_counter = handle_counter + 1
latest_handler_id = handler_id
mp.register_script_message(handler_id, function (type, args)
-- do not process events (other than closed) for an input that has been overwritten
if latest_handler_id ~= handler_id and type ~= 'closed' then
return
end
if t[type] then
local completions, completion_pos, completion_append =
t[type](unpack(utils.parse_json(args or "") or {}))
if type == "complete" and completions then
mp.commandv("script-message-to", "console", "complete",
utils.format_json(completions), completion_pos,
completion_append or "")
end
end
if type == "closed" then
mp.unregister_script_message(handler_id)
end
end)
return handler_id
end
This would avoid data races from occurring as a result of edited and submit events being received after creating a new input, though it would not guard against input.terminate() being called from something else like mp.add_timeout(), though that at least is something that developers have control over. So you'd do something like this:
global_timer = mp.add_timeout(20, input.terminate, true)
...
global_timer:kill()
input.get({
prompt = "Open File:\n> ",
history_path = "~~state/open-file-history",
opened = function()
global_timer:resume()
end,
edited = function()
global_timer:kill()
global_timer:resume()
end,
submit = function(path)
mp.commandv("loadfile", path)
end,
closed = function()
global_timer:kill()
end
})
And of course there may be a couple of events that end up getting dropped like a last millisecond submit request, but the timing would be so close that that shouldn't be an issue.
If you think this is the better option, then I can change the PR. This should also address the main cause of in-service log race conditions, meaning that the mp.input log methods wouldn't need the extra id parameter I proposed in the commit. Do you think I should remove those as well, or is the extra functionality of being able to send logs to inactive buffers worth the addition?
Well I personally think it's nicer to not add id arguments to every function. Scripts can always buffer log messages on their own and send them on opened which is what commands.lua already does through silent:terminal-default. And I don't think anybody else uses input.get this way like a REPL other than commands.lua and my lua-repl.lua script anyway.
Okay, I have force pushed a new set of commits that automatically pass console.lua log and handler ids. I believe this means that all the inherent race conditions in mp.input() are addressed with no changes to the interface (haven't done testing yet).
I believe I have addressed all of the review comments 👍.
All review comments addressed.
@guidocella, regarding a comment about processing all incoming events and passing the handler_id along with terminate. We cannot do that naively as if an input event is received after a new request is made, then there is a chance the callback for that request will trigger input.terminate() (a very decent chance if the script was written before keep_open was created), which will act on the wrong request. This also occurs for log messages; I imagine that scripts that are not commands.lua would be most likely to add something to the log when an input event is received.
I believe that we could still do this, however, by overriding the ids when processing an input event. Something like this:
local handler_id_override
local log_id_override
local function register_event_handler(t)
local handler_id = "input-event/"..handle_counter
handle_counter = handle_counter + 1
latest_handler_id = handler_id
mp.register_script_message(handler_id, function (type, args)
if type ~= "closed" then
handler_id_override = handler_id
log_id_override = not t.items and t.id or nil
end
if t[type] then
local completions, completion_pos, completion_append =
t[type](unpack(utils.parse_json(args or "") or {}))
if type == "complete" and completions then
mp.commandv("script-message-to", "console", "complete", utils.format_json({
client_name = mp.get_script_name(),
handler_id = handler_id,
list = completions,
start_pos = completion_pos,
append = completion_append or "",
}))
end
end
handler_id_override = nil
log_id_override = nil
if type == "closed" then
mp.unregister_script_message(handler_id)
end
end)
return handler_id
end
...
function input.terminate()
mp.commandv("script-message-to", "console", "disable", utils.format_json({
client_name = mp.get_script_name(),
handler_id = handler_id_override or latest_handler_id,
}))
end
So the question is if we want to marginally increase the complexity of the code in order to guarantee that no events are ever dropped.
Well yeah I thought it was strange that input.terminate works differently from other functions, but then deleted the post because I realized it would still be racy. It is safe for complete to pass the handler_id because it just gets it from the callback, but the current commits do assume that log functions won't be called within closed though that is reasonable.
Overriding the id with the callback one is almost what I proposed in https://github.com/mpv-player/mpv/pull/17256#issuecomment-3748933971, but your variation breaks this case
input.get{
submit = function () {
input.get()
input.set_log{...}
}
}
I'm fine with either the current commits or my proposal above, whichever you prefer. The chances or receiving submit or edited for a previous session are astronomically low anyway, I don't see how it would happen in practice unless I'm missing something.
On a side note, I realized script-message is a misnomer since it can send messages to non-script clients.
I think that the chance of it happening is so low that we're safe to not do so. A dropped submit or edited event also does not cause any race conditions, and I don't think it actually violates any explicit guarantees that we are making. It is also very easy to add this behaviour in the future if these assumptions are incorrect. And in the meantime the code is kept a little more intuitive, without any temporarily overridden globals.
Addressed all review comments.
Not directly related but it's probably worth implementing a libmpv equivalent of input.lua instead of documenting the API, so we can make more changes like this.
@guidocella could you resolve remaining review comments?