add db view unpacking

Open jstucke opened this issue 11 months ago • 5 comments

jstucke avatar Dec 16 '24 15:12 jstucke

Codecov Report

Attention: Patch coverage is 84.84848% with 5 lines in your changes missing coverage. Please review.

Project coverage is 91.91%. Comparing base (058e49f) to head (0b7cb80). Report is 10 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/storage/graphql/hasura/init_hasura.py | 72.72% | 3 Missing :warning: |
| src/init_postgres.py | 0.00% | 2 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1321      +/-   ##
==========================================
- Coverage   92.42%   91.91%   -0.52%     
==========================================
  Files         379      378       -1     
  Lines       23661    21126    -2535     
==========================================
- Hits        21869    19417    -2452     
+ Misses       1792     1709      -83     

codecov-commenter avatar Dec 16 '24 15:12 codecov-commenter

What is this used for?

maringuu avatar Jan 16 '25 09:01 maringuu

> What is this used for?

I should probably have included a description :sweat_smile: I only talked with @dorpvom about this. The problem this PR addresses is the following: the metadata from unpacking (entropy, extracted file count, and other data that lands in fo.processed_analysis["unpacking"]) cannot be used with numerical comparison operators (like "greater than" or "less than or equal to"). The reason is simple: GraphQL does not support these operators on JSON data.

The simplest workaround for using this data without creating more tables etc. is a "view": a PostgreSQL construct that lets queries (including GraphQL queries) treat data stored in a JSONB column as if it were stored as a number in a regular table.

Note: in PostgreSQL we can absolutely formulate such a query; only GraphQL (or, more specifically, the default schema created by Hasura) is the problem.

What can we do with this? We can use it to run queries like "find all files with an entropy greater than or equal to 0.8 from which no files were unpacked".
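To illustrate the idea, here is a minimal sketch and not the code from this PR. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table name `analysis`, the view name `unpacking_view`, the plugin name `unpacker`, and the JSON key `number_of_unpacked_files` are assumptions made for the example; only the `entropy` field and the example query come from the description above.

```python
import json
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for this sketch).
con = sqlite3.connect(":memory:")

# Hypothetical table: one row per analysis result, with the result stored as JSON text.
con.execute("CREATE TABLE analysis (uid TEXT, plugin TEXT, result TEXT)")

rows = [
    ("uid_a", "unpacker", {"entropy": 0.95, "number_of_unpacked_files": 0}),
    ("uid_b", "unpacker", {"entropy": 0.91, "number_of_unpacked_files": 12}),
    ("uid_c", "unpacker", {"entropy": 0.42, "number_of_unpacked_files": 0}),
]
con.executemany(
    "INSERT INTO analysis VALUES (?, ?, ?)",
    [(uid, plugin, json.dumps(result)) for uid, plugin, result in rows],
)

# The view exposes selected JSON fields as typed columns, so that
# numerical comparison operators can be applied to them.
con.execute(
    """
    CREATE VIEW unpacking_view AS
    SELECT uid,
           CAST(json_extract(result, '$.entropy') AS REAL) AS entropy,
           CAST(json_extract(result, '$.number_of_unpacked_files') AS INTEGER)
               AS number_of_unpacked_files
    FROM analysis
    WHERE plugin = 'unpacker'
    """
)

# "Find all files with an entropy >= 0.8 from which no files were unpacked."
matches = [
    uid
    for (uid,) in con.execute(
        "SELECT uid FROM unpacking_view "
        "WHERE entropy >= 0.8 AND number_of_unpacked_files = 0 "
        "ORDER BY uid"
    )
]
print(matches)
```

In PostgreSQL, the equivalent column expression for a JSONB column would look roughly like `(result->>'entropy')::float`. Once a view like this is exposed through Hasura, its typed columns should be usable with numerical comparison operators in GraphQL.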

jstucke avatar Jan 16 '25 09:01 jstucke

Shouldn't there be an alembic migration for this?

Aside from that, I am happy with this if you put your description in the commit message.

maringuu avatar Feb 13 '25 08:02 maringuu

> Shouldn't there be an alembic migration for this?

That's a good question. Does this need a migration if it is not part of the SQLAlchemy schema? Making it part of the schema is not really necessary, since the query limitations of GraphQL on JSON data do not apply to SQLAlchemy. AFAIK, making it part of the schema should be possible, though.

jstucke avatar Feb 13 '25 08:02 jstucke