argilla
feat: python-rq integration using datasets reindex as proof of concept
Description
This PR includes changes as a proof of concept for integrating the `rq`
background processor with Argilla.
The changes also include two new endpoints:
- `PUT /api/v1/datasets/:dataset_id/reindex`
  - This endpoint will return an HTTP `202 (Accepted)` status.
  - A background job will be enqueued to reindex the dataset.
  - The response body will include the `id` of the job and its `status` (`queued` in this case if everything was fine).
  - Users can use the `id` of the job to get information about the status of the job.
- `GET /api/v1/jobs/:job_id`
  - This endpoint is used to obtain information about one specific job (returning the `id` and `status`).
  - Jobs are currently not stored in the database, and I'm using the `rq` API to get information about their status.
  - `rq` saves job information for `500` seconds in Redis, so after a job has finished or failed the user has 500 seconds to get information about it.
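
The enqueue-then-poll flow described above could be consumed by a client roughly as follows. This is a minimal sketch, not code from this PR: `wait_for_job` and its `fetch_job` parameter are hypothetical names, with `fetch_job` standing in for whatever HTTP call the client makes to `GET /api/v1/jobs/:job_id`. It assumes the `status` values follow `rq`'s naming, where `finished` and `failed` are terminal.

```python
import time
from typing import Callable, Dict

# Terminal states, assuming the API exposes rq's status names unchanged.
TERMINAL_STATUSES = {"finished", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, str]],
    job_id: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll a job until it reaches a terminal status or the timeout expires.

    `fetch_job` is a hypothetical callable standing in for
    GET /api/v1/jobs/:job_id; it must return a dict with "id" and "status".
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job['status']!r} after {timeout}s"
            )
        sleep(poll_interval)
```

Because job information only stays in Redis for a limited time after completion, a client like this should keep its polling interval well below that window.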
Possible improvements:
- ~~Define a proper Redis connection using a pool of connections and getting settings from environment variables.~~ Redis uses a pool of connections by default, and I have added a new environment variable to set the connection (`ARGILLA_REDIS_URL`).
- ~~Define a better way to store our jobs, maybe using a new `jobs` table in the Argilla database and allowing the results of the jobs to be saved there.~~ We will start with this approach of using `rq` results stored in Redis; in the future, for more complex flows, we will consider adding some data to our database if necessary.
- ~~Define an `rq` queue only for search engine purposes.~~ We will use the `default` queue for now.
- ~~Once the `Reindexer` class code is merged from the PR adding the reindex CLI task, we can remove it from the code in this PR.~~ We already merged the PR adding the reindex CLI task, and the jobs now import and use it.
- Add a `result` field to the `Job` schema so we can include the result of the job in it. (Useful to know whether there are errors or additional information about the process.)
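
The proposed `result` field could look roughly like this. It is a sketch under assumptions: the actual `Job` schema in this PR is not shown here, so a plain dataclass is used as a stand-in, and `job_to_response` is a hypothetical helper name. The shape of `result` (for example, a dict carrying an error message) is also an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Job:
    """Stand-in for the Job schema returned by GET /api/v1/jobs/:job_id."""

    id: str
    status: str
    # Proposed addition: stays None until the job has produced a result
    # (or error details, if the job failed).
    result: Optional[Any] = None


def job_to_response(job: Job) -> Dict[str, Any]:
    """Serialize a Job into the response body (hypothetical helper)."""
    return asdict(job)
```

Keeping `result` optional means existing clients that only read `id` and `status` would not need to change.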
Things to investigate/discuss:
- How to use Redis in our Docker images, especially with the QuickStart images on HF.
- Alternatives to running Redis, such as using the `fakeredis` Python library instead.
Type of change
(Please delete options that are not relevant. Remember to title the PR according to the type of change)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing functionality)
- [ ] Improvement (change adding some improvement to an existing functionality)
- [ ] Documentation update
How Has This Been Tested
(Please describe the tests that you ran to verify your changes. And ideally, reference tests)
- [ ] Test A
- [ ] Test B
Checklist
- [ ] I added relevant documentation
- [ ] follows the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out the contributor form (see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4427-ki24f765kq-no.a.run.app
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (`6630d7b`) 90.13% compared to head (`de3721e`) 91.21%. Report is 578 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #4427 +/- ##
===========================================
+ Coverage 90.13% 91.21% +1.07%
===========================================
Files 233 351 +118
Lines 12493 19912 +7419
===========================================
+ Hits 11261 18163 +6902
- Misses 1232 1749 +517
Flag | Coverage Δ | |
---|---|---|
pytest | ? | |
Flags with carried forward coverage won't be shown.