multiplier icon indicating copy to clipboard operation
multiplier copied to clipboard

Create a database diagnostic tool

Open pgoodman opened this issue 2 years ago • 2 comments

The tool should look for hints of various possible issues for further investigation. For example:

  • [ ] Fragments sharing the same file_token_id.
  • [ ] Fragments sharing the same hash.
  • [ ] Repeats of the same symbol name in the mangled_name, where the data is first split by spaces, and the first element is taken (symbol name). Then looking that they are all in a reference chain.

pgoodman avatar Jan 08 '23 02:01 pgoodman

Get the base name of a mangled name:

CREATE VIEW IF NOT EXISTS base_mangled_name AS
         SELECT m.entity_id AS entity_id,
                substr(m.data||' ', 0, instr(m.data||' ',' ')) AS data
         FROM mangled_name AS m

Get all entity IDs sharing the same base name of a mangled name:

SELECT data, GROUP_CONCAT(entity_id,',') AS entity_ids FROM base_mangled_name GROUP BY data;

pgoodman avatar Jan 08 '23 05:01 pgoodman

Find fragments sharing the same file_token_id, and also having the same length of serialized data. If two fragments are at the same location and have identical (compressed) serialized sizes then this is a strong indicator of redundancy.

SELECT file_token_id,
       GROUP_CONCAT(fragment_id, ',') AS fragment_ids
       GROUP_CONCAT(data_length, ',') AS data_lengths
FROM (SELECT f.fragment_id,
             length(f.data) AS data_length,
             h.file_token_id FROM fragment AS f
      JOIN fragment_hash_0 AS h
      ON (f.fragment_id & 16777215) = h.fragment_index)
GROUP BY file_token_id;

pgoodman avatar Jan 08 '23 05:01 pgoodman