
[SPIKE] feat: refresh record `status` column using context functions

Open jfcalvo opened this issue 8 months ago • 1 comments

Description

This PR includes changes to explore the possibility of refreshing the new record `status` column from within our Python context functions.

The following changes have been included in this PR:

  • A new distribution context has been created.
  • A new function refresh_record_status has been added to the distribution context, to be called whenever the status column of a specific record needs to be refreshed.
  • The function refresh_record_status is called when a response is created/updated/upserted/deleted.
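
A minimal sketch of what such a refresh function could look like, using the standard-library `sqlite3` module rather than the project's actual ORM. The table layout, the `min_submitted` parameter, the `pending`/`completed` values, and the rule used to compute the status are assumptions made for illustration; this PR description does not spell them out.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.execute(
        "CREATE TABLE responses ("
        "id INTEGER PRIMARY KEY, record_id INTEGER NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def refresh_record_status(
    conn: sqlite3.Connection, record_id: int, min_submitted: int = 1
) -> str:
    """Recompute and store the status of one record.

    Assumed rule (not taken from the PR): a record is 'completed' once it has
    at least `min_submitted` submitted responses, otherwise it is 'pending'.
    """
    submitted = conn.execute(
        "SELECT COUNT(*) FROM responses WHERE record_id = ? AND status = 'submitted'",
        (record_id,),
    ).fetchone()[0]
    status = "completed" if submitted >= min_submitted else "pending"
    conn.execute("UPDATE records SET status = ? WHERE id = ?", (status, record_id))
    return status
```

In this sketch the caller is expected to invoke `refresh_record_status` in the same transaction that creates, updates, upserts, or deletes the response, so that the response change and the status change are committed together.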

Changes that are not present in this PR

  • [x] Refresh the status column when records, including their responses, are created/upserted in bulk.
  • [x] Refresh modified records into the search engine.
    • [ ] We need to check whether the current way of indexing is redundant (for example, whether indexing both records and responses duplicates work).
  • [ ] Avoid possible concurrency problems.
  • [x] Include status into the search engine mapping.
  • [ ] Modify the search endpoint to support filtering records with status value.
  • [ ] Modify current user metrics.
  • [ ] Modify dataset progress.
  • [ ] Refresh status column when a user is deleted (only in the case we modify the user deletion to also delete their responses in the near future).

Details that still need to be explored

The isolation level of SQLite transactions is SERIALIZABLE, which makes concurrency problems almost impossible with this database.

On the other hand, the default isolation level in PostgreSQL is READ COMMITTED, a weaker one, so concurrency problems are possible there. This still needs to be explored, for example by manually trying to find a scenario where such a concurrency problem occurs and checking whether changing our transactions to a higher isolation level avoids it.

Possible solutions to this:

  • Change the transaction isolation level to SERIALIZABLE.
  • Use a lock mechanism for the row or the row's column (this needs some research).
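
One consequence of the first option is worth sketching: under SERIALIZABLE, PostgreSQL can abort a transaction with a serialization failure (SQLSTATE 40001), and the application is expected to retry it. The following is a rough sketch of such a retry loop, not code from this PR. `SerializationFailure` is a hypothetical stand-in for whatever exception the database driver raises, and `run_with_retry`, `max_attempts`, and `base_delay` are illustrative names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error raised on SQLSTATE 40001."""


def run_with_retry(
    transaction: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.0
) -> T:
    """Run `transaction`, retrying it when it fails with a serialization error.

    `transaction` must open, execute, and commit (or roll back) a whole
    database transaction each time it is called, so a retry starts from scratch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # Give up and surface the error to the caller.
            time.sleep(base_delay * attempt)  # Simple linear backoff.
    raise AssertionError("unreachable")
```

For the second option, PostgreSQL row-level locks such as `SELECT ... FOR UPDATE` apply to whole rows rather than to individual columns, which may narrow down the research mentioned above.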

Refs #5069

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (change restructuring the codebase without changing functionality)
  • [ ] Improvement (change adding some improvement to an existing functionality)
  • [ ] Documentation update

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

  • [ ] Test A
  • [ ] Test B

Checklist

  • [ ] I added relevant documentation
  • [ ] follows the style guidelines of this project
  • [ ] I did a self-review of my code
  • [ ] I made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I filled out the contributor form (see text above)
  • [ ] I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

jfcalvo, Jun 20 '24 08:06