Migrate from textProcessing to taskProcessing

Open julien-nc opened this issue 1 year ago • 20 comments

There is a new API to run AI tasks. It is slightly different from the old one.

As Mail is using Summary, Topics and FreePrompt, it should be relatively straightforward to migrate to the taskProcessing API. More information here: https://github.com/nextcloud/assistant/issues/114

cc @hamza221 @st3iny

julien-nc avatar Aug 23 '24 10:08 julien-nc

is this a breaking change?

ChristophWurst avatar Aug 23 '24 10:08 ChristophWurst

The textProcessing API will stay for one or two more major NC versions. But the apps that implement providers have migrated to taskProcessing (or will do so soon), so you might run out of providers. I guess that can be considered a breaking change.

julien-nc avatar Aug 23 '24 10:08 julien-nc

I see. This is very unfortunate to announce after feature freeze and branch-off.

ChristophWurst avatar Aug 23 '24 10:08 ChristophWurst

@julien-nc what is the replacement for \OCP\TextProcessing\IManager::runTask?

ChristophWurst avatar Aug 23 '24 11:08 ChristophWurst

\OCP\TaskProcessing\IManager::scheduleTask

You can get the task ID right after having scheduled it:

$this->taskProcessingManager->scheduleTask($task);
$taskId = $task->getId();

Tasks can't run synchronously anymore because many providers may take too long and it's possible to reach the PHP process timeout. Tasks are processed in background jobs (which can be fast if occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob" is running).

The OCP\TaskProcessing\Events\TaskSuccessfulEvent and OCP\TaskProcessing\Events\TaskFailedEvent events are dispatched after the task has succeeded/failed. They contain the task.

If you still want something similar to \OCP\TextProcessing\IManager::runTask, you can poll in the backend right after scheduling the task:

$task = $this->taskProcessingManager->getTask($taskId);
if ($task->getStatus() === Task::STATUS_SUCCESSFUL) {
    // do something with the result
}
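
The check above only looks at the task once. A minimal sketch of a bounded poll loop around it could look like the following. The helper name `waitForFinalStatus`, its parameters and the default timeout are hypothetical and not part of the Nextcloud API. The status is fetched through a callback so that the helper stays independent of the manager. The commented usage assumes the `Task::STATUS_FAILED` and `Task::STATUS_CANCELLED` constants in addition to the `Task::STATUS_SUCCESSFUL` one shown above.

```php
<?php
declare(strict_types=1);

/**
 * Poll a status callback until it reports a final state or the deadline passes.
 * Hypothetical helper for illustration, not part of OCP.
 *
 * @param callable():int $fetchStatus   returns the current task status
 * @param int[]          $finalStatuses statuses that end the wait
 * @param float          $timeoutSec    give up after this many seconds
 * @param int            $intervalMs    pause between two checks, in milliseconds
 * @return int|null the final status, or null if the deadline was reached first
 */
function waitForFinalStatus(
    callable $fetchStatus,
    array $finalStatuses,
    float $timeoutSec = 30.0,
    int $intervalMs = 500
): ?int {
    $deadline = microtime(true) + $timeoutSec;
    do {
        $status = $fetchStatus();
        if (in_array($status, $finalStatuses, true)) {
            return $status;
        }
        // Sleep between checks so the loop does not hammer the database.
        usleep($intervalMs * 1000);
    } while (microtime(true) < $deadline);

    return null;
}

// Possible usage inside a Nextcloud service (assumes the constants named above):
//
// $status = waitForFinalStatus(
//     fn () => $this->taskProcessingManager->getTask($taskId)->getStatus(),
//     [Task::STATUS_SUCCESSFUL, Task::STATUS_FAILED, Task::STATUS_CANCELLED],
// );
// if ($status === Task::STATUS_SUCCESSFUL) {
//     // do something with the result
// }
```

The timeout matters here: the request still blocks a PHP process while it waits, so the loop should give up well before the process timeout is reached.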

julien-nc avatar Aug 23 '24 12:08 julien-nc

> (which can be fast if occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob" is running).

Is there a solution without this? Unfortunately I can't assume that every Nextcloud installation has this process running.

ChristophWurst avatar Aug 23 '24 12:08 ChristophWurst

If this is not running, the taskProcessing jobs run when cron.php is executed.

julien-nc avatar Aug 23 '24 12:08 julien-nc

To elaborate on why Mail uses the synchronous mode fully intentionally: we want to process emails as late as possible, when the user opens them, but then show the results right away. Background processing is extremely expensive because we would have to process all emails of an IMAP account. Dispatching an async task only when the user opens a message breaks the UX because it would take a while for the results to be ready.

Hope this makes sense.

ChristophWurst avatar Aug 23 '24 12:08 ChristophWurst

> If this is not running, the taskProcessing jobs run when cron.php is executed.

A well-configured system has cron set up for a 5m interval. Some older systems still use 15m, and tiny setups use irregular AJAX cron. I'd say even 5m is not an acceptable reaction time for a thread summary in Mail.

ChristophWurst avatar Aug 23 '24 12:08 ChristophWurst

If the Mail frontend sends a synchronous request to the server which blocks until the task has finished (with textProcessing or with taskProcessing), it blocks a PHP runner, so it can have an impact on general server performance. Also, this PHP process might get killed every time if the task takes too long, in which case no result can ever be produced. We can't guarantee synchronous tasks will succeed as we can't predict how much time the providers will take to process them.

That's why tasks are now always processed in bg jobs.

Running the occ bg job worker is strongly recommended to be able to run AI tasks with no delay. We had to deal with a trade-off between convenience for the developers, failure potential and constraints on the admins. If Nextcloud were a persistent process which could run threads, it would be possible to have synchronous processes. I hope this makes sense as well.

julien-nc avatar Aug 23 '24 12:08 julien-nc

We target only openai for our integration (the rest is too slow), so the process is mostly IO bound when it waits for the API response. The blocked request is OK for us.

I get the general push towards async processing for tasks of unknown complexity, though.

ChristophWurst avatar Aug 23 '24 13:08 ChristophWurst

The local LLM2 is now equally fast and could potentially be used as well (although we did not run tests on large texts).

DaphneMuller avatar Aug 23 '24 13:08 DaphneMuller

One more detail: the occ bg job worker only runs tasks for which the responsible provider is implemented in a PHP app. Providers that are implemented in an external application consume tasks as soon as they are ready: they make requests to Nextcloud to get tasks that they can process. There is no delay there, even without the worker.

julien-nc avatar Aug 23 '24 13:08 julien-nc

@DaphneMuller nice! I'll still have to wait 5-15 minutes for the result when the special worker process is not running, right?

ChristophWurst avatar Aug 23 '24 13:08 ChristophWurst

> I'll still have to wait 5-15 minutes for the result when the special worker process is not running, right?

As mentioned before, not if the provider is LLM2 (which is an external application).

julien-nc avatar Aug 23 '24 13:08 julien-nc

Then I misread.

@julien-nc do you have some example code for getting synchronous-ish results from LLM2 without the use of occ background-job:worker "OC\TaskProcessing\SynchronousBackgroundJob"?

In my understanding the LLM processing would not happen until the next cron execution.

ChristophWurst avatar Aug 23 '24 13:08 ChristophWurst

With or without the worker, you can schedule a task and immediately start checking whether it's finished (in the frontend or in the backend, as you wish). In the backend, this can be done as described before, with getTask and $task->getStatus(). In the frontend, request ocs/v2.php/taskprocessing/task/TASK_ID to get the task.

julien-nc avatar Aug 23 '24 14:08 julien-nc

We can also keep the providers for the old APIs in integration_openai so that the features in Mail are not broken.

julien-nc avatar Aug 23 '24 14:08 julien-nc

That's the best solution right now because it means we can branch off Mail for the upcoming release.

ChristophWurst avatar Aug 26 '24 11:08 ChristophWurst

This is done and will be included in the next integration_openai release. https://github.com/nextcloud/integration_openai/pull/120

julien-nc avatar Aug 26 '24 11:08 julien-nc

Two things should make it more convenient:

  • The TextProcessing and SpeechToText APIs are now forward compatible with providers. New TaskProcessing providers can be used by the TextProcessing API (for FreePromptTaskType, HeadlineTaskType, SummaryTaskType and TopicsTaskType because they have exact matches in the new API) and the SpeechToText API. This means you will benefit from new providers while using the old APIs.
  • The TaskProcessing manager now has a runTask method to run a task synchronously. This should make the migration easier.

All this is in stable30 already.

julien-nc avatar Aug 30 '24 13:08 julien-nc