meilisearch-python

NDJSON/CSV methods to add and update documents

Open · curquiza opened this issue 4 years ago · 5 comments

⚠️ This issue is generated, which means the naming might differ in this package (e.g. add_documents_json instead of addDocumentsJson). Keep the existing naming conventions of this package to stay idiomatic with the language and this repository.

📣 We strongly recommend splitting the work into multiple PRs to solve all the points of this issue.

MeiliSearch v0.23.0 introduces two changes:

  • new valid formats for pushing data files, in addition to JSON: CSV and NDJSON.
  • it enforces the Content-Type header for every route requiring a payload (POST and PUT routes).
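The header rule can be sketched as a small helper. This is a minimal illustration only: build_headers and PAYLOAD_METHODS are hypothetical names (not part of meilisearch-python), and it assumes POST and PUT are the only payload-carrying methods, as stated above.

```python
from typing import Dict, Optional

# Assumption from the issue: only POST and PUT routes carry a payload.
PAYLOAD_METHODS = {"POST", "PUT"}


def build_headers(method: str, content_type: Optional[str] = "application/json") -> Dict[str, str]:
    """Hypothetical helper: attach Content-Type only to POST and PUT requests."""
    headers: Dict[str, str] = {}
    if method.upper() in PAYLOAD_METHODS and content_type is not None:
        headers["Content-Type"] = content_type
    return headers


print(build_headers("GET"))              # {}
print(build_headers("POST"))             # {'Content-Type': 'application/json'}
print(build_headers("PUT", "text/csv"))  # {'Content-Type': 'text/csv'}
```

Keeping the default at application/json means JSON routes need no change, while the CSV and NDJSON methods only override the content type.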

Here are the expected changes to completely close the issue:

  • [x] Currently, the SDKs always send Content-Type: application/json with every request. Only POST and PUT requests should send Content-Type: application/json; DELETE and GET requests should not.

  • [ ] Add the following methods and 🔥 the associated tests 🔥 to ADD the documents. Depending on the format type (CSV or NDJSON), the SDK should send Content-Type: application/x-ndjson or Content-Type: text/csv.

    • [x] addDocumentsJson(string docs, string primaryKey)
    • [x] addDocumentsCsv(string docs, string primaryKey)
    • [ ] addDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)
    • [x] addDocumentsNdjson(string docs, string primaryKey)
    • [ ] addDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)
  • [ ] Add the following methods and 🔥 the associated tests 🔥 to UPDATE the documents. Depending on the format type (CSV or NDJSON), the SDK should send Content-Type: application/x-ndjson or Content-Type: text/csv.

    • [ ] updateDocumentsJson(string docs, string primaryKey)
    • [ ] updateDocumentsCsv(string docs, string primaryKey)
    • [ ] updateDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)
    • [ ] updateDocumentsNdjson(string docs, string primaryKey)
    • [ ] updateDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)
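The add/update × format matrix above boils down to choosing an HTTP method and a Content-Type. The sketch below builds (but never sends) such a request with the standard library. It assumes Meilisearch's documents route, where POST /indexes/{uid}/documents adds or replaces documents, PUT adds or updates them, and primaryKey is passed as a query parameter. build_documents_request is a hypothetical name, not the SDK's API.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Content-Type values listed in the issue for each payload format.
CONTENT_TYPES = {
    "json": "application/json",
    "csv": "text/csv",
    "ndjson": "application/x-ndjson",
}

# Assumption: adding documents uses POST, updating uses PUT.
METHODS = {"add": "POST", "update": "PUT"}


def build_documents_request(
    base_url: str,
    index_uid: str,
    action: str,
    fmt: str,
    docs: str,
    primary_key: Optional[str] = None,
) -> Request:
    """Hypothetical helper: build a request that adds or updates raw documents."""
    url = f"{base_url.rstrip('/')}/indexes/{quote(index_uid, safe='')}/documents"
    if primary_key is not None:
        url += "?" + urlencode({"primaryKey": primary_key})
    return Request(
        url,
        data=docs.encode("utf-8"),  # the raw string is sent as-is
        headers={"Content-Type": CONTENT_TYPES[fmt]},
        method=METHODS[action],
    )


req = build_documents_request(
    "http://localhost:7700", "movies", "update", "csv", "id,title\n1,Carol\n", primary_key="id"
)
print(req.get_method())                # PUT
print(req.get_header("Content-type"))  # text/csv
print(req.full_url)                    # http://localhost:7700/indexes/movies/documents?primaryKey=id
```

Because each of the ten methods differs only in these two lookups, an implementation can share one internal helper rather than duplicating request code per format.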

docs is the documents sent as a raw string, primaryKey is the primary key of the index, and batchSize is the size of each batch. Example: if you send 2000 documents as a raw string in docs and set batchSize to 1000, your documents will be sent to MeiliSearch in two batches.
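One way to produce the batches for the *InBatches methods is to split the raw string before sending each piece. This is a sketch under assumptions: csv_batches and ndjson_batches are hypothetical helpers, each CSV batch is assumed to need its own header row, and NDJSON is split one document per non-empty line. CSV is parsed with the csv module rather than split on newlines, so quoted fields containing line breaks stay intact.

```python
import csv
import io
from typing import Iterator, List


def _chunks(items: List, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def csv_batches(docs: str, batch_size: int) -> Iterator[str]:
    """Split a CSV string into CSV strings that each repeat the header row."""
    rows = [row for row in csv.reader(io.StringIO(docs)) if row]  # skip blank lines
    if not rows:
        return
    header, records = rows[0], rows[1:]
    for chunk in _chunks(records, batch_size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        yield out.getvalue()


def ndjson_batches(docs: str, batch_size: int) -> Iterator[str]:
    """Split an NDJSON string into NDJSON strings of at most batch_size lines."""
    lines = [line for line in docs.splitlines() if line.strip()]
    for chunk in _chunks(lines, batch_size):
        yield "\n".join(chunk) + "\n"


# The example from the issue: 2000 documents with a batch size of 1000.
csv_docs = "id,title\n" + "".join(f"{i},Movie {i}\n" for i in range(2000))
print(len(list(csv_batches(csv_docs, 1000))))  # 2
```

Each yielded string could then be passed to the corresponding single-batch method (for example addDocumentsCsv) with the same primaryKey.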

Example PRs:

  • in PHP SDK: https://github.com/meilisearch/meilisearch-php/pull/235
  • in Python SDK: https://github.com/meilisearch/meilisearch-python/pull/329

Related to: https://github.com/meilisearch/integration-guides/issues/146

If this issue is partially/completely implemented, feel free to let us know.

curquiza · Oct 19 '21 13:10

Closed by #329

brunoocasali · Apr 26 '22 17:04

Sorry, my bad!

brunoocasali · Apr 27 '22 12:04

I'd like to implement some of the remaining points.

Azanul · Oct 04 '22 11:10

Hi @Azanul, thank you for your interest! You can implement whichever points you want 😊

alallema · Oct 05 '22 08:10

I'd be interested in implementing updateDocumentsJson(string docs, string primaryKey).

Ambareen09 · Oct 14 '22 13:10

@alallema Please reopen this issue, as not all methods have been implemented yet.

Azanul · Oct 17 '22 16:10