[FEATURE] Use user ids instead of username on search engine indexes

Open jfcalvo opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

Right now we are using the user's username inside search engine indexes. This makes it difficult to allow username changes, reduces flexibility, and can cause bugs (e.g. annotations recovered from an old deleted user when a new user is created with the same username).

Describe the solution you'd like

We should instead use user ids and avoid using usernames at all in our search engine indexes.

This change raises some questions that we need to answer:

  • How are we gonna migrate the current indexes using username to start using ids?
  • What are the endpoints and search queries affected by this change?
  • Do we need to modify the UI and SDK to allow using usernames for certain functions and later translate those to queries using user ids instead?

jfcalvo avatar Feb 13 '24 11:02 jfcalvo

🤖

Hey @jfcalvo, great to see you back with another insightful suggestion! 🚀

Based on the context provided, it seems feasible to implement the feature you're proposing. Here's how we can address your questions:

  1. Migrating current indexes that use usernames to start using IDs: As per the current Argilla codebase, the search engine indexes do not use usernames. They use properties like "id", "inserted_at", "updated_at", "responses", "metadata", and "suggestions". Therefore, there's no need for migration as such. However, if there are other parts of the codebase where usernames are used in the indexes, you would need to reindex those documents with the user IDs instead of usernames. This can be done using the reindex API provided by Elasticsearch. Here's an example request body for POST _reindex:
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  },
  "script": {
    "source": "ctx._source.user = ctx._source.remove('username')"
  }
}

This script will remove the username field from the source document and add a user field with the same value. You can modify this script to use the user ID instead of the username.
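As a rough sketch of that modification, the reindex body could carry a username-to-ID lookup as script parameters. This is only an illustration under assumptions: the field names `username` and `user_id`, the helper `build_reindex_body`, and the example IDs are hypothetical and may not match Argilla's actual index mapping. The function only builds the request body, so it runs without a cluster.

```python
import json
from typing import Dict


def build_reindex_body(
    source_index: str, dest_index: str, username_to_id: Dict[str, str]
) -> dict:
    """Build a body for POST _reindex that swaps a username field for a user ID.

    Assumption: documents hold a top-level `username` field, and the new
    index should hold `user_id` instead. Both names are hypothetical.
    """
    # Painless script: drop `username`, then look the value up in the
    # mapping passed as params. Documents with unknown usernames are
    # copied without a `user_id` field.
    script = (
        "def name = ctx._source.remove('username'); "
        "if (name != null && params.ids.containsKey(name)) { "
        "ctx._source.user_id = params.ids[name]; }"
    )
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            "source": script,
            "params": {"ids": dict(username_to_id)},
        },
    }


if __name__ == "__main__":
    # Made-up usernames and IDs, for illustration only.
    body = build_reindex_body(
        "old_index", "new_index", {"alice": "id-1", "bob": "id-2"}
    )
    print(json.dumps(body, indent=2))
```

Passing the lookup through `params` keeps the script text constant across runs. For a large number of users the mapping may become too big to send this way, in which case the migration would probably need to be done in batches from application code instead.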

  2. Endpoint and search queries affected by this change: The endpoints and search queries that would be affected by this change are those that currently use the username to identify the user. For example, in the test_me, test_me_as_owner, test_me_as_admin, and test_me_as_annotator functions, the user ID is used to verify the identity of the user making the request. If these functions were to use the username instead, they would need to be modified to use the user ID. Similarly, in the test_delete_user function, the user ID is used to specify which user to delete. If this function were to use the username instead, it would need to be modified to use the user ID.

  3. Modifying the UI and SDK to allow the use of usernames for certain functions and later translate those to queries using user IDs instead: This would depend on how the UI and SDK are currently implemented. If they are currently using usernames to identify users, they would need to be modified to use user IDs instead. This could involve changes to the UI to display user IDs instead of usernames, and changes to the SDK to accept user IDs as input instead of usernames. However, it's important to note that user IDs are typically not as user-friendly as usernames, so this could impact the user experience.
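One way to keep usernames in the UI and SDK while the indexes store only IDs is a small translation step before the query is built. The sketch below is a hypothetical illustration, not Argilla's SDK: `UnknownUserError`, `resolve_user_id`, `build_user_filter`, and the field name `responses.user_id` are all assumed names, and the user lookup is a plain dictionary standing in for whatever the server exposes.

```python
from typing import Mapping


class UnknownUserError(KeyError):
    """Raised when a username has no matching user ID (hypothetical helper)."""


def resolve_user_id(username: str, users: Mapping[str, str]) -> str:
    """Translate a user-facing username into the ID stored in the index."""
    try:
        return users[username]
    except KeyError:
        raise UnknownUserError(username) from None


def build_user_filter(username: str, users: Mapping[str, str]) -> dict:
    """Build a term filter on the user ID for a query given by username.

    Assumption: the index stores the responding user's ID under
    `responses.user_id`. The real field name may differ.
    """
    user_id = resolve_user_id(username, users)
    return {"term": {"responses.user_id": user_id}}


if __name__ == "__main__":
    # Made-up username and ID, for illustration only.
    users = {"alice": "id-1"}
    print(build_user_filter("alice", users))
```

With this shape, callers keep typing usernames, and a renamed user only changes the lookup table, not the indexed documents.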

I hope this helps! If you have any further questions, please don't hesitate to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] avatar Feb 13 '24 11:02 dosubot[bot]