laf icon indicating copy to clipboard operation
laf copied to clipboard

New base::task features

Open martincapello opened this issue 10 months ago • 13 comments

This PR introduces a couple of features to base::task:

  • Possibility to set a new "done" callback that is called when the task completed its execution (we can change the name to completed instead of done if it makes more sense)
  • New "enqueued" status, which is true when the task has been started but it is waiting in the thread pool's queue until the actual time to execute the function comes.
  • New "try_skip" method, which enables the caller to remove tasks from the queue that wasn't started yet.

This was introduced to be able to do some things in https://github.com/aseprite/aseprite/pull/5037. Like being able to redraw the "Running Scripts" window as needed (when a task is started/done) and to avoid showing the "stop" button for enqueued tasks. Because we can't dequeue tasks (this should be something to add to base::task and base::thread_pool as well I think)

Something to think about: should we use an atomic enum for the base::task state instead of 3 different atomic bools? I believe that a task should be only in one of the following states at any given time:

  • Ready (task created but not started yet)
  • Enqueued (task enqueued in thread pool's queue but not selected for execution yet)
  • Running (task selected for execution and it is being executed)
  • Completed (task finished execution by either success, cancellation or error)

martincapello avatar Feb 28 '25 13:02 martincapello

Here some points:

  1. If the done callback throws an exception the whole program might crash
  2. Why not just doing the the "done" work from the "execute" callback?
  3. Probably we could create a list of callbacks? a kind of .then().then().then() etc. I'm not sure if it makes sense but might be useful to automatically check the "cancelled" status before continuing to the next then() callback (and might improve doing tasks in the pool, but I'm not sure about this one).
  4. The state could be changed to a atomic enum

dacap avatar Mar 03 '25 19:03 dacap

Here some points:

  1. If the done callback throws an exception the whole program might crash

You are rigth, try catch is needed.

  1. Why not just doing the the "done" work from the "execute" callback?

If a task is canceled before entering the work queue, the "done" work would never be called because the execute callback would not be executed. It might make more sense to rename it to "finished" or "completed" instead of "done".

  1. Probably we could create a list of callbacks? a kind of .then().then().then() etc. I'm not sure if it makes sense but might be useful to automatically check the "cancelled" status before continuing to the next then() callback (and might improve doing tasks in the pool, but I'm not sure about this one).

I'm not sure about this either. However I would like to add an on_exception callback, to call it when an exception happens in another callback.

  1. The state could be changed to a atomic enum

Great, will do it!

martincapello avatar Mar 04 '25 19:03 martincapello

I've added a new function "try_skip" in d32eb3f60bc2bb780f817798e0da2383d052613e, this function would be useful to remove work from the queue that wasn't started yet.

martincapello avatar Mar 06 '25 12:03 martincapello

To be sincere I'm not a big fan about making the base::task class more complex, I'd prefer to keep it as simple as possible (as it's the base of all tasks). It looks like now it requires some kind of documentation to know when each callback will be called (e.g. "finished" is called even when we cancel the task, or if an exception is called, etc.). The current implementation is straightforward: just one callback.

Probably we can fix some bugs to make the impl even more simple, e.g. about:

If a task is canceled before entering the work queue, the "done" work would never be called because the execute callback would not be executed.

We might change the current in_worker_thread to always call the execution:

void task::in_worker_thread()
{
  try {
    //if (!m_token.canceled()) // removing this line to always call the execution callback
    m_execute(m_token);
  }
  ...

About the try_skip it can be useful, we could call it try_pop() (like the event queue) or try_unschedule(). "Skip" sounds like it will be added/re-executed later or is unprioritized (the task is moved to the end of the queue), not sure if you agree with this.

The thread_pool::work is nice to make it easier to pop/cancel a enqueued task. If I remember correctly there is no need to "friend class" a subclass (so I guess the "friend class thread_pool" can be removed).

About all the other features/callbacks (finished, exception, done) I think are not necessary as they could be moved to another wrapper class/abstraction, e.g. the same RunScriptTask.

dacap avatar Mar 10 '25 12:03 dacap

Probably we can fix some bugs to make the impl even more simple, e.g. about:

If a task is canceled before entering the work queue, the "done" work would never be called because the execute callback would not be executed.

We might change the current in_worker_thread to always call the execution:

void task::in_worker_thread()
{
  try {
    //if (!m_token.canceled()) // removing this line to always call the execution callback
    m_execute(m_token);
  }
  ...

I'm fine with this, I was just trying to keep the current behavior as much as possible by adding optional new behavior.

About the try_skip it can be useful, we could call it try_pop() (like the event queue) or try_unschedule(). "Skip" sounds like it will be added/re-executed later or is unprioritized (the task is moved to the end of the queue), not sure if you agree with this.

I'm happy to change it to any other name, so try_pop() is fine by me.

The thread_pool::work is nice to make it easier to pop/cancel a enqueued task. If I remember correctly there is no need to "friend class" a subclass (so I guess the "friend class thread_pool" can be removed).

If I remove the friendship, then I got this error: [build] /Users/martin/igarastudio/git/aseprite-beta/laf/base/thread_pool.cpp:97:42: error: 'm_func' is a private member of 'base::thread_pool::work'. I think that the "work" class can access the private members of "thread_pool" (because of nesting), but not the other way around?

martincapello avatar Mar 10 '25 14:03 martincapello

Okay, trying the new changes now there is a caveat.

This is our new task::in_worker_thread:

void task::in_worker_thread()
{
  m_state = state::RUNNING;
  try {
    m_execute(m_token);    
  }
  catch (const std::exception& ex) {
    LOG(FATAL, "Exception running task: %s\n", ex.what());
  }

  m_state = state::FINISHED;
}

In my RunScriptTask class, I set the task::m_execute member to something like this:

[this, func](base::task_token& token) {
  try {
    func(m_L);
  }
  catch (const std::exception& ex) {
    // TODO: do something 
  }
  m_finished(token);
}

Now base::task::m_state is not FINISHED when my m_finished callback gets called. And I was actually removing the task when finished, but now it throws the ASSERT(m_state != state::RUNNING); in base::task::~task(). I've tried removing the assert and it works fine again...but we will agree that that is a pretty dirty trick. So, I don't see a way to have a finished callback with the task in FINISHED state...any suggestion?

martincapello avatar Mar 10 '25 18:03 martincapello

So, I don't see a way to have a finished callback with the task in FINISHED state...any suggestion?

Correct me if I'm wrong, but it looks like the finished state is a problem of your own task, but not a general state required by base::task. E.g. you might solve this with something like this (in your own execute lambda function):

[](base::task_token& token)
{
  try {
    ..."execute" code that checks the token and can be canceled...
  }
  catch (...) { ... }
  ..."finished" code that doesn't check the token and knows that task is finished...
}

I don't like to add more complexity to this base::task class when it's not required (e.g. a special state that is not used in any other task or in the thread pool/etc.), so I think the here we are adding something new for a specific task that can be solved more directly in your own task callback.

dacap avatar Mar 26 '25 14:03 dacap

Correct me if I'm wrong, but it looks like the finished state is a problem of your own task, but not a general state required by base::task.

Only in debug mode it is being a problem. Because there is an ASSERT in the base::task's destructor that checks the task's state.

E.g. you might solve this with something like this (in your own execute lambda function):

[](base::task_token& token)
{
  try {
    ..."execute" code that checks the token and can be canceled...
  }
  catch (...) { ... }
  ..."finished" code that doesn't check the token and knows that task is finished...
}

This is almost exactly what I've described in my last comment.

I don't like to add more complexity to this base::task class when it's not required (e.g. a special state that is not used in any other task or in the thread pool/etc.), so I think the here we are adding something new for a specific task that can be solved more directly in your own task callback.

That's fine. The only thing I'm trying to do is to avoid that ASSERT I'm mentioning. This is how the scenario looks:

  • RunScriptTask has a base::task member.
  • The Engine has a vector of unique pointers to RunScriptTask (named m_tasks).
  • RunScriptTask sets the base::task::m_execute callback of its base::task member to:
[this, func](base::task_token& token) {
  try {
    func(m_L);
  }
  catch (const std::exception& ex) {
    // TODO: do something 
  }
  m_finished(token);
}
  • Now, when the RunScriptTask::m_finished callback gets called, it erases this RunScriptTask from the Engine's m_tasks vector. And here is where the issue arises:
    • The RunScriptTask destructor is called, which in turn tries to destroy its base::task member. And then this makes the following assert fail:
task::~task()
{
  // The task must not be running when we are destroying it.
  ASSERT(m_state != state::RUNNING);
}

So, I'm wondering if we should remove that assert or do something else.

martincapello avatar Mar 26 '25 17:03 martincapello

This is my issue summarized using your example code:

[](base::task_token& token)
{
  try {
    ..."execute" code that checks the token and can be canceled...
  }
  catch (...) { ... }
  // Here the base::task doesn't know yet that we've finished, so if it is destroyed (as I'm doing), 
  // the assert in its destructor fails
  ..."finished" code that doesn't check the token and knows that task is finished... 
}

martincapello avatar Mar 26 '25 17:03 martincapello

I'd prefer to avoid deleting the task inside a callback function itself (the token/task will become an invalid pointer). Generally in other base::task usages we monitor the completed state of the task and do something (from the UI thread, e.g. app::TaskWidget), but the task is part of the widget. There is other case (ExportSpriteSheetWindow task) where we cancel the token and busy wait the task to end (ExportSpriteSheetWindow::waitGenTaskAndDelete()).

If you think that we should allow this, removing the ASSERT() is not enough, and we'll require the finished/completed callback. Because we cannot delete the task/token/this pointer from the m_execute callback (as we'll change its state to completed then). So the issue is not just the assert, but a way to destroy the task pointer from a callback itself (we'll need a good comment about this in in_worker_thread()).

Another thing: why try_skip() is required as we can cancel the task canceling the token with base::task_token::cancel().

dacap avatar Mar 26 '25 18:03 dacap

I'd prefer to avoid deleting the task inside a callback function itself (the token/task will become an invalid pointer). Generally in other base::task usages we monitor the completed state of the task and do something (from the UI thread, e.g. app::TaskWidget), but the task is part of the widget. There is other case (ExportSpriteSheetWindow task) where we cancel the token and busy wait the task to end (ExportSpriteSheetWindow::waitGenTaskAndDelete()).

Exactly this is what I wanted to avoid with the new finished callback I was proposing. Because monitoring if the task has completed is actually more complex than having a callback to just let the client/caller know that the task finished.

If you think that we should allow this, removing the ASSERT() is not enough, and we'll require the finished/completed callback. Because we cannot delete the task/token/this pointer from the m_execute callback (as we'll change its state to completed then). So the issue is not just the assert, but a way to destroy the task pointer from a callback itself (we'll need a good comment about this in in_worker_thread()).

Well yes, that is what I was trying to explain, I want to delete the task when finished, but since inside the execute callback it is not in FINISHED state yet, we need the callback. As I didn't want to do some monitoring stuff just for that, the introduction of the finished callback seemed to me a reasonable addition. If adding a good comment in in_worker_thread() will make the finished callback a reality, then I can give a try and try to put that comment as good as I can (and we can improve it if necessary).

Another thing: why try_skip() is required as we can cancel the task canceling the token with base::task_token::cancel().

If the task is queued and you cancel the token it will be canceled only when the task is taken by some thread of the pool, but there was no way to cancel tasks waiting on the queue (btw try_skip is try_pop now as you suggested, just didn't push the changes yet because I was waiting for a decision about the finished callback).

martincapello avatar Mar 27 '25 13:03 martincapello

Exactly this is what I wanted to avoid with the new finished callback

We'll be deleting the task inside a callback anyway.

Well yes, that is what I was trying to explain

"I want to delete the task inside the callback" would have saved a lot of time.

If adding a good comment in in_worker_thread()

Not only the comment, the finished callback cannot receive the token as it doesn't make sense to cancel the task there. And the impl must make clear that deleting the task in the finished callback is allowed (so the task and the token are not going to be valid anymore and the code must be prepared for that).

If the task is queued and you cancel the token it will be canceled only when the task is taken by some thread of the pool

I know, anyway the task will be almost immediately queued (the thread_pool is almost always free in our code). If you find it useful, keep the function.

dacap avatar Mar 27 '25 15:03 dacap

We'll be deleting the task inside a callback anyway.

Yeah, I fully agree. In a callback that is executed after logical finalization of the task.

Not only the comment, the finished callback cannot receive the token as it doesn't make sense to cancel the task there. And the impl must make clear that deleting the task in the finished callback is allowed (so the task and the token are not going to be valid anymore and the code must be prepared for that).

Certainly it is not useful for canceling the task there, nor modifying its data. But IMO the token is still useful in the finished callback because it holds information about how much progress was made by the task and how the task finished (by cancelation or success). What I do believe is that maybe we should pass a copy of the task_token to the finished callback to avoid any issue once the task is destroyed from within the callback.

I know, anyway the task will be almost immediately queued (the thread_pool is almost always free in our code). If you find it useful, keep the function.

In my use case I wanted to let the user stop scripts that are waiting in the thread pool's queue because the pool got exhausted by other scripts running infinite loops.

martincapello avatar Mar 27 '25 17:03 martincapello

Pushed a rebased version of this PR, and also introduced the try_skip() renamed as try_pop(). This version still have the finished callback, but with the following differences since the previous implementation:

  • Introduces a new finfunc_t type, which is basically the same as func_t but receives a const reference to a task_token instead of a non-const reference. This type is not absolutely necessary, but it is nice to restrict the task_token access in the finished callback to only the const methods.
  • Right before calling the finished callback, a copy of the task's task_token is passed to the callback.
  • I've added more comments to clarify the usage of the finished callback.

martincapello avatar Mar 31 '25 18:03 martincapello

I'll merge this as it is, I don't see the purpose of the copy of the token nor the finished callback, but I'll merge it anyway.

dacap avatar Apr 01 '25 11:04 dacap

I'll merge this as it is, I don't see the purpose of the copy of the token nor the finished callback, but I'll merge it anyway.

If you don't see the purpose I would prefer to remove this changes then. I won't be mad, I thought you saw that it was useful at the end. I do see an advantage on being able to destroy a task when finished without the need of polling, but that is just my point of view.

Also, I feel that the multiple scripts execution draft won't make it, because it needs an amount of changes that I don't think we are willing to make, in which case this changes won't be used as well. So, feel free to remove any change you don't think is useful.

martincapello avatar Apr 01 '25 13:04 martincapello