Tasks stuck in `processing` state, lots of tasks

Open 0xMostafa opened this issue 1 year ago • 1 comments

Describe the bug

I ran an indexing job of ~115k docs on Meilisearch. It now has ~76k docs successfully indexed and ~35k tasks of type `documentAdditionOrUpdate` in the `processing` state.

It has been stuck for a day.

The indexing job itself took ~3 hours to complete.

To Reproduce

Not an error, tasks are just stuck in processing state

Expected behavior

35k tasks should be processed swiftly like the previous 76k tasks

Screenshots

Index in indexing state for 2 days

Screenshot from 2024-06-27 13-55-26

The task queue has tasks from yesterday that are still shown as processing but have finished

Screenshot from 2024-06-27 13-55-54

task details

{
  "uid": 34856,
  "indexUid": "Vector_index",
  "status": "processing",
  "type": "documentAdditionOrUpdate",
  "canceledBy": null,
  "details": {
    "receivedDocuments": 4,
    "indexedDocuments": null
  },
  "error": null,
  "duration": null,
  "enqueuedAt": "2024-06-26T06:09:45.356074878Z",
  "startedAt": "2024-06-27T11:00:45.271060018Z",
  "finishedAt": null
}

Note: the `startedAt` timestamp keeps being updated to the current time; I'm not sure why.

Meilisearch version: v1.8 with vectorStore enabled

0xMostafa avatar Jun 27 '24 11:06 0xMostafa

Hey @0xMostafa, is it possible that meilisearch is crashing and restarting automatically? That would make it try to process the same batch repeatedly; this can typically happen when you don't have enough RAM to run the indexing process you're trying to do. You can reduce the number of processed tasks in a single batch by using this experimental parameter.
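One rough way to check for such a restart loop is to fetch the same task twice, a few minutes apart, and compare the two snapshots: if the task is still `processing` but its `startedAt` has moved forward, the batch was most likely started again. The sketch below only compares two task objects that have already been fetched; the function names and the earlier snapshot are hypothetical and not part of Meilisearch.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a Meilisearch timestamp, trimming nanoseconds to microseconds."""
    if value is None:
        return None
    head, _, frac = value.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]  # datetime only supports microsecond precision
    return datetime.strptime(f"{head}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")


def looks_like_restart_loop(earlier, later):
    """True if the same task is still processing but its startedAt moved forward."""
    if earlier["uid"] != later["uid"]:
        raise ValueError("snapshots must describe the same task")
    if not (earlier["status"] == later["status"] == "processing"):
        return False
    first, second = parse_ts(earlier["startedAt"]), parse_ts(later["startedAt"])
    if first is None or second is None:
        return False
    return second > first


# Hypothetical earlier snapshot of task 34856 (this timestamp is made up).
earlier = {"uid": 34856, "status": "processing",
           "startedAt": "2024-06-27T10:55:12.000000000Z"}
# Snapshot as reported in the issue.
later = {"uid": 34856, "status": "processing",
         "startedAt": "2024-06-27T11:00:45.271060018Z"}

print(looks_like_restart_loop(earlier, later))
```

A positive result is only a hint; the server logs or the container restart count are the more direct evidence of a crash.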

irevoire avatar Jun 27 '24 13:06 irevoire

I have also encountered the same problem. After Meilisearch exits abnormally, asynchronous tasks still show as processing and never complete. Shouldn't tasks be processed in batches so that the number of concurrent executions is controlled?

mj520 avatar Aug 12 '24 07:08 mj520

Hello here

As @irevoire asked, do you see Meilisearch crashing and restarting?

Do you have the same issue with v1.9.0? Also, do you have the same issue without vectorStore if you tried?

Do you have a reproducible example, or access to a machine you could share (in private) to help us debug this?

curquiza avatar Aug 12 '24 08:08 curquiza

When generating documents in large quantities, Meilisearch exits and restarts. VectorStore is enabled by default in v1.9.0. It runs in Docker. I can provide machines or files.

mj520 avatar Aug 13 '24 02:08 mj520

When generating documents in large quantities, Meilisearch exits and restarts

What are the resources of your machine (RAM, CPU)? It's probably a lack of RAM. You can also try this option: --max-indexing-memory 0

Let me know if it helps

curquiza avatar Aug 13 '24 09:08 curquiza

4 cores, 8 GB. If Meilisearch exits abnormally while a task is executing, will the task not resume after a restart? Will tasks that were already queued be marked as in progress after the restart and still not execute?

mj520 avatar Aug 14 '24 02:08 mj520

If Meilisearch exits abnormally while a task is executing, will the task not resume after a restart? Will tasks that were already queued be marked as in progress after the restart and still not execute?

I'm not sure I get your question. But when restarting, Meilisearch will try to process the last processing task again and again, so it can get stuck if there is a lack of resources.

Did you try to increase the RAM? Did you try with --max-indexing-memory 0?

curquiza avatar Aug 14 '24 11:08 curquiza

Many unexecuted tasks have piled up, probably due to insufficient memory or some other reason. The program exited, but after restarting it should continue with the unexecuted tasks, right? As you said, the last task was executed, but the nearly a thousand tasks piled up earlier did not move and remained stuck. Could the number of tasks executed at once be limited to control memory and concurrency, and could failed or abnormal tasks be resumed?

mj520 avatar Aug 15 '24 02:08 mj520

but the nearly a thousand tasks piled up earlier did not move and remained stuck

What is stuck? What is their status? Use this route: https://www.meilisearch.com/docs/reference/api/tasks#get-tasks

If Meilisearch crashes and restarts, it will try to re-process the current task. But if the task is too big for the machine's resources, it will crash in a loop, again and again.
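As a sketch of how that route can be queried, the snippet below builds a `GET /tasks` URL filtered by status and counts the returned tasks per status. The `statuses`, `indexUids` and `limit` query parameters come from the tasks API; the base URL `http://localhost:7700`, the index name and the API key placeholder are assumptions to adapt, and `fetch_tasks` is defined but not called here because it needs a running instance.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def tasks_url(base_url, statuses, index_uid=None, limit=20):
    """Build a GET /tasks URL filtered by status and, optionally, by index."""
    params = {"statuses": ",".join(statuses), "limit": limit}
    if index_uid:
        params["indexUids"] = index_uid
    return f"{base_url.rstrip('/')}/tasks?{urlencode(params, safe=',')}"


def fetch_tasks(url, api_key):
    """Call the route with a bearer token and return the decoded JSON body."""
    request = Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urlopen(request) as response:
        return json.load(response)


def count_by_status(response):
    """Count the tasks on one page of results per status."""
    return Counter(task["status"] for task in response["results"])


# Assumed local instance and index name; replace with your own values.
url = tasks_url("http://localhost:7700", ["enqueued", "processing"], "addresses")
print(url)
# tasks = fetch_tasks(url, "YOUR_API_KEY")  # requires a reachable server
# print(count_by_status(tasks))
```

The response is paginated, so the per-status counts cover one page only; the `total` field gives the number of tasks matching the filter.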

Did you try to increase the RAM? Did you try with --max-indexing-memory 0?

How many documents do you have? How many are you trying to index in the stuck task?

curquiza avatar Aug 19 '24 14:08 curquiza

I have added the --max-indexing-memory 0 parameter. The data is approximately 100000 * 1300 tasks; 300+ have finished and the others are processing. The task queue should not execute everything together; the quantity should be controlled, otherwise it will never finish and will keep crashing.

mj520 avatar Aug 20 '24 10:08 mj520

Can you try to reduce the number of tasks processed at the same time: https://github.com/orgs/meilisearch/discussions/713

with --experimental-max-number-of-batched-tasks 1 -> to process tasks one by one
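For a scripted launch, the two options mentioned in this thread can be assembled into a command line as sketched below. Only the flag names come from the thread; the helper name, the `meilisearch` binary path and the defaults are assumptions, and the sketch only prints the command rather than starting the server.

```python
import shlex


def meilisearch_command(binary="meilisearch", max_batched_tasks=1,
                        max_indexing_memory=None):
    """Assemble launch arguments that limit how many tasks go into one batch."""
    command = [binary,
               "--experimental-max-number-of-batched-tasks", str(max_batched_tasks)]
    if max_indexing_memory is not None:
        # Passed through unchanged, e.g. the value 0 suggested earlier in the thread.
        command += ["--max-indexing-memory", str(max_indexing_memory)]
    return command


command = meilisearch_command(max_batched_tasks=1)
print(shlex.join(command))
# To actually start the server: subprocess.run(command, check=True)
```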

curquiza avatar Aug 20 '24 12:08 curquiza

Thank you. I'm looking forward to the official version of the parameter being added to https://www.meilisearch.com/docs/learn/self_hosted/configure_meilisearch_at_launch. It is a very useful tool compared to controlling memory with --max-indexing-memory.

mj520 avatar Aug 21 '24 02:08 mj520

Yes, sorry, I should have suggested it first! I did not realize the batch queue was too big!

I opened an issue here: https://github.com/meilisearch/documentation/issues/2958

curquiza avatar Aug 21 '24 09:08 curquiza

I'm closing this issue since it seems to be fixed! Let me know if it's not.

curquiza avatar Aug 21 '24 09:08 curquiza

I'm still facing this issue on a self-hosted instance

{
  "results": [
    {
      "uid": 124011,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 9998,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:30.774624591Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124010,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:30.600285418Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124009,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:30.435788941Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124008,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:30.28044182Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124007,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:30.119271745Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124006,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:29.924908454Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124005,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:29.748606079Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124004,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:29.580923598Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124003,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:29.42238129Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    },
    {
      "uid": 124002,
      "batchUid": 9,
      "indexUid": "addresses",
      "status": "processing",
      "type": "documentAdditionOrUpdate",
      "canceledBy": null,
      "details": {
        "receivedDocuments": 10000,
        "indexedDocuments": null
      },
      "error": null,
      "duration": null,
      "enqueuedAt": "2025-06-13T22:56:29.249212379Z",
      "startedAt": "2025-06-17T11:06:04.153665591Z",
      "finishedAt": null
    }
  ],
  "total": 118497,
  "limit": 10,
  "from": 124011,
  "next": 124001
}
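All ten tasks on this page share `batchUid` 9 and the same `startedAt`, which suggests they are being processed together as one batch. A small sketch like the following can total how many documents such a batch has to index at once; the helper name is hypothetical and the task list is reduced to the fields it uses.

```python
from collections import defaultdict


def summarize_batches(tasks):
    """Group tasks by batchUid and total the documents each batch must index."""
    summary = defaultdict(lambda: {"tasks": 0, "documents": 0, "started_at": set()})
    for task in tasks:
        entry = summary[task.get("batchUid")]
        entry["tasks"] += 1
        entry["documents"] += task["details"].get("receivedDocuments") or 0
        entry["started_at"].add(task.get("startedAt"))
    return dict(summary)


# The ten tasks from the response above: uid 124011 received 9998 documents,
# uids 124010 down to 124002 received 10000 each.
received = [9998] + [10000] * 9
page = [
    {"uid": 124011 - offset, "batchUid": 9,
     "startedAt": "2025-06-17T11:06:04.153665591Z",
     "details": {"receivedDocuments": count}}
    for offset, count in enumerate(received)
]

batch = summarize_batches(page)[9]
print(batch["tasks"], batch["documents"], len(batch["started_at"]))
# prints: 10 99998 1
```

This covers only the first page of results; with `total` at 118497, the batch may contain more tasks than the ten shown.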

SwapnilSoni1999 avatar Jun 17 '25 11:06 SwapnilSoni1999