flux-core icon indicating copy to clipboard operation
flux-core copied to clipboard

WIP: job-list: store inactive job data in job database

Open chu11 opened this issue 2 years ago • 13 comments

Per discussion in #4273, some notes on what this does:

  • add an sqlite database that stores all inactive data to it.

  • db table is:

const char *sql_create_table = "CREATE TABLE if not exists jobs("
                               "  id CHAR(16) PRIMARY KEY,"
                               "  t_inactive REAL,"
                               "  jobdata JSON,"
                               "  eventlog TEXT,"
                               "  jobspec JSON,"
                               "  R JSON"

Where jobdata is a json blob containing all job data (the all job-list attribute from #4324, but added a states_mask which is used internally)

The main reason we have a t_inactive column is b/c this is backwards compatible to sqlite versions that don't have json support.

  • database archiving is only enabled if a dbpath is specified in a job-list toml config or if statedir is configured. So in other words, if no db is configured, it behaves just like before.

Ideas:

  • should the jobdata have a version embedded? if things like job states bitmask or things like that change?

chu11 avatar May 19 '22 21:05 chu11

I'm sorry if this isn't related to this PR but I figured I would double check. Is the plan to also adjust the job-archive DB to match the schema you have proposed in the PR description? I only ask because I currently use this query to fetch data from the job-archive DB:

SELECT userid,id,t_submit,t_run,t_inactive,ranks,R,jobspec FROM jobs

If so, that's totally fine - I will just need to tweak the query slightly to get the data I need.

cmoussa1 avatar May 19 '22 21:05 cmoussa1

I'm sorry if this isn't related to this PR but I figured I would double check. Is the plan to also adjust the job-archive DB to match the schema you have proposed in the PR description?

The plan is that we would leave the current job-archive the way it is until flux-accounting can adjust to the new database, everything gets deployed, etc. then we can eventually retire job-archive.

chu11 avatar May 19 '22 21:05 chu11

Sounds good! Thanks for clarifying.

cmoussa1 avatar May 19 '22 21:05 cmoussa1

oh crap, i accidentally did git push origin :issue4273_job_db, thus deleting remotely deleting the branch :P (i guess I missed that middle click for the left hand side). re-opening

chu11 avatar May 19 '22 23:05 chu11

Codecov Report

Merging #4336 (ef200e1) into master (8134e0c) will increase coverage by 0.15%. The diff coverage is 70.53%.

:exclamation: Current head ef200e1 differs from pull request most recent head b46a315. Consider uploading reports for the commit b46a315 to get more accurate results

@@            Coverage Diff             @@
##           master    #4336      +/-   ##
==========================================
+ Coverage   83.37%   83.53%   +0.15%     
==========================================
  Files         401      392       -9     
  Lines       67517    65936    -1581     
==========================================
- Hits        56291    55077    -1214     
+ Misses      11226    10859     -367     
Impacted Files Coverage Δ
src/modules/job-list/stats.c 82.14% <ø> (+7.14%) :arrow_up:
src/modules/job-list/util.c 0.00% <0.00%> (ø)
src/modules/job-list/job_util.c 87.07% <54.54%> (-5.79%) :arrow_down:
src/modules/job-list/job_db.c 65.16% <65.16%> (ø)
src/modules/job-list/list.c 71.55% <73.14%> (-0.08%) :arrow_down:
src/modules/job-list/job_state.c 72.70% <82.92%> (-0.83%) :arrow_down:
src/modules/job-list/job_data.c 92.10% <92.10%> (ø)
src/modules/job-list/job-list.c 80.72% <100.00%> (+0.97%) :arrow_up:
src/common/libsubprocess/local.c 78.72% <0.00%> (-5.67%) :arrow_down:
src/modules/job-info/allow.c 71.42% <0.00%> (-5.24%) :arrow_down:
... and 117 more

codecov[bot] avatar Aug 10 '22 18:08 codecov[bot]

just re-based and and re-pushed fixing up merge conflicts due to recent changes

  • adjust for changes for job-list.list-inactive removal from #4513
  • adjust tests for flux job list needing to be on tty from #4499
  • added some additional tests for flux job list --since

chu11 avatar Sep 01 '22 19:09 chu11

rebased - fixing conflicts from #4542

chu11 avatar Sep 03 '22 15:09 chu11

repushed, rebased on PR #4575

chu11 avatar Sep 16 '22 16:09 chu11

rebased & re-pushed given #4575 going in

chu11 avatar Sep 20 '22 20:09 chu11

rebased / refactored / cleanup based on recent cleanup PRs in job-list (#4639, #4644)

chu11 avatar Oct 07 '22 18:10 chu11

Recently the topic of storing inactive and possibly purged jobs in a database and then being able to retrieve them later on via flux-jobs was brought up. Some of that work was done in this PR. This PR was initially submitted nearly a year ago, so I think it was just forgotten. So hopefully this message restarts conversation on it :-)

Admittedly needs to be re-looked at given possible new needs / requirements / changes in flux-core / etc.

See additional conversation given #4914

chu11 avatar Apr 14 '23 20:04 chu11

making this WIP, just wanted to point out a few points mostly from #4914

  • an updated version of this PR this should go in after the constraint support and some additional features are added from #4914, would hate to have a sqlite database table that we need to re-do

  • there is concern of creating a database table that is ultimately one we don't like and have to re-do several times. I do see two longer out solutions:

    • use a nosql database, but one that supports sql queries than can be converted from jsonlogic
    • use sqlite with json support, but dumping all data into one json blob in a field. By doing this we sort of get a "sql table doesn't matter" effect (i.e. if there is only 1 column, we don't care about making a bad table), although the devil is in the details until prototyping is done
      • bad part, json support in sqlite isn't supported in most distro's versions of sqlite right now
  • several of the database points I mention above are why I decided adding "constraints" support to job-list would be a better "medium term" solution than going with a everything in job-list in a database approach. right choice? we will see.

  • Sooo I'm going to keep this PR open for the time being, make it WIP, could be something we still want to merge medium term if we just deem the pros outweigh the cons.

chu11 avatar Apr 28 '23 17:04 chu11