Ability to round-trip binary data

Open simonw opened this issue 4 years ago • 6 comments

e.g. for the binary numbits column in the .coverage SQLite database generated by coveragepy.

Those currently end up represented like this:

[4, 1, "b'\\xfe\\xff\\xfd{\\xe0\\x02\\x10\\x00W}o\\xdb{\\xef}o\\xef\\xbd\\xf7\\x92\\xe8\\x00\\x00\\xca\\t\\xe0\\xfb\\xdf\\x07y\\xdb\\xbe\\xf3\\x97s\\xd7\\xd8\\xeb\\x06\\xd9Y\\x16A\\x17\\xe6\\x02\\x02 @\\x08\\x10\\x00\\xbcH\\xc1$@\\xf7}?\\x01\\x04 \\x00\\x00\\x00\\x00\\x04%\\x00\\x04\\x00\\x00\\x00\\x00\\x00<\\x17H\\x00\\x00\\x12 \\xe9\\xc8\\x08\\x00\\x00\\x00\\x00\\x00\\x00@\\x00\\x00\\x00\\xd4M\\xb5\\x18\\x00w\\xd7\\xdd\\xdd\\xb6m\\xba\\xa9\\xe0\\xa7\\xf3Z\\x82\\xfbN\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00$`\\x00\\x04'"]

Once I implement the load command (#3) these will be a problem, because they won't round-trip correctly.

I need some kind of special-case syntax for storing binary values such that they can be round-tripped properly.
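As a rough illustration of what such a special-case syntax could look like, here is a minimal sketch that wraps bytes in a tagged JSON object. The `$base64` key and the `encode_value` / `decode_value` helpers are hypothetical names chosen for this example, not something sqlite-diffable defines.

```python
import base64
import json


def encode_value(value):
    """Wrap bytes in a tagged object; pass every other value through unchanged."""
    if isinstance(value, bytes):
        # "$base64" is a hypothetical marker key used only in this sketch.
        return {"$base64": base64.b64encode(value).decode("ascii")}
    return value


def decode_value(value):
    """Reverse encode_value: turn a tagged object back into bytes."""
    if isinstance(value, dict) and set(value) == {"$base64"}:
        return base64.b64decode(value["$base64"])
    return value


row = [4, 1, b"\xfe\xff\xfd{\xe0\x02\x10\x00W}"]
line = json.dumps([encode_value(v) for v in row])
restored = [decode_value(v) for v in json.loads(line)]
```

Because JSON cannot store a SQLite value as an object, the tagged form cannot collide with ordinary integers, floats, strings, or nulls.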

simonw avatar Jun 14 '20 00:06 simonw

The .metadata.json file may be the place to do this. Right now the accompanying line_bits.metadata.json for the above table looks like this:

{
    "name": "line_bits",
    "columns": [
        "file_id",
        "context_id",
        "numbits"
    ],
    "schema": "CREATE TABLE line_bits (\n    -- If recording lines, a row per context per file executed.\n    -- All of the line numbers for that file/context are in one numbits.\n    file_id integer,            -- foreign key to `file`.\n    context_id integer,         -- foreign key to `context`.\n    numbits blob,               -- see the numbits functions in coverage.numbits\n    foreign key (file_id) references file (id),\n    foreign key (context_id) references context (id),\n    unique (file_id, context_id)\n)"
}

I could use this to say "the third column is binary, so treat it as such" somehow.

simonw avatar Jun 14 '20 00:06 simonw

Maybe columns could store type information:

    "columns": [
        ["file_id", "integer"],
        ["context_id", "integer"],
        ["numbits", "blob"]
    ]
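
One way such typed column pairs could be produced, sketched with Python's standard `sqlite3` module and SQLite's `PRAGMA table_info`. The `typed_columns` helper is a hypothetical name, and lowercasing the declared type is an assumption made here for consistency.

```python
import sqlite3


def typed_columns(conn, table):
    """Return [[name, declared_type], ...] for a table, in column order."""
    # Quote the table name as an identifier, doubling any embedded quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute("PRAGMA table_info({})".format(quoted)).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [[row[1], row[2].lower()] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table line_bits (file_id integer, context_id integer, numbits blob)"
)
columns = typed_columns(conn, "line_bits")
```

One caveat: on ordinary (non-STRICT) tables SQLite does not enforce declared types, so a column declared `blob` can still hold text in some rows. A loader that trusts only the declared type could mishandle those values.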

simonw avatar Jun 14 '20 00:06 simonw

Here's how sqlite3 .coverage .dump outputs this data:

INSERT INTO line_bits VALUES(1,1,X'0e');
INSERT INTO line_bits VALUES(2,1,X'5a');
INSERT INTO line_bits VALUES(3,1,X'36218410420821841042');
INSERT INTO line_bits VALUES(4,1,X'fefffd7be0021000577d6fdb7bef7d6fefbdf792e80000ca09e0fbdf0779dbbef39773d7d8eb06d959164117e602022040081000bc48c12440f77d3f010420000000000425000400000000003c174800001220e9c80800000000000040000000d44db5180077d7ddddb66dbaa9e0a7f35a82fb4e0000000000000002000000000024600004');
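
The `X'...'` form is SQLite's hexadecimal blob literal, so the same representation is easy to produce and parse from Python. A small sketch follows; the helper names are hypothetical.

```python
import sqlite3


def to_sql_blob_literal(data: bytes) -> str:
    """Format bytes as a SQLite hex blob literal such as X'0e'."""
    return "X'" + data.hex() + "'"


def from_sql_blob_literal(literal: str) -> bytes:
    """Parse a SQLite hex blob literal back into bytes."""
    if literal[:2] not in ("X'", "x'") or not literal.endswith("'"):
        raise ValueError("not a blob literal: {!r}".format(literal))
    return bytes.fromhex(literal[2:-1])


# Check the literal against SQLite itself: evaluating it yields the same bytes.
conn = sqlite3.connect(":memory:")
literal = to_sql_blob_literal(b"\x0e")
from_sqlite = conn.execute("select " + literal).fetchone()[0]
```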

simonw avatar Jun 14 '20 00:06 simonw

I can accompany this with a parametrized test that covers all of the other SQLite types as well.
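
A sketch of the cases such a test might enumerate, written as a plain list so it runs without a test framework (it could equally feed `pytest.mark.parametrize`). It asks SQLite for the storage class of each Python value via `typeof()` and checks which values survive plain JSON unchanged. The `CASES` list and helper names are assumptions for illustration.

```python
import json
import sqlite3

# One representative Python value per SQLite storage class.
CASES = [
    (None, "null"),
    (1, "integer"),
    (1.5, "real"),
    ("text", "text"),
    (b"\x0e", "blob"),
]


def storage_class(conn, value):
    """Ask SQLite which storage class it assigns to a bound Python value."""
    return conn.execute("select typeof(?)", [value]).fetchone()[0]


def survives_plain_json(value):
    """True if the value comes back unchanged from json.dumps / json.loads."""
    try:
        return json.loads(json.dumps(value)) == value
    except TypeError:
        # json.dumps raises TypeError for bytes.
        return False


conn = sqlite3.connect(":memory:")
results = [
    (expected, storage_class(conn, value), survives_plain_json(value))
    for value, expected in CASES
]
```

In this sketch only the blob case fails the plain JSON check, which is why it is the one type that needs special handling.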

simonw avatar Jun 14 '20 18:06 simonw

Hello @simonw -- I love this project, thanks for making it happen. Is this issue essentially tracking the work to build a proper load command, specifically one that handles dump-and-load of binary data stored in SQLite?

My main usage goal: enable more efficient git (a) storage and (b) diff-ability of SQLite databases. (Yes, git-based.)

Additional questions:

  1. Are there alternatives to sqlite-diffable other than https://stackoverflow.com/a/21789167/605356 ?
  2. Is there anything I can do to help implement a load command?

Additional reference (for my sake): https://news.ycombinator.com/item?id=25004913

FYI, the following is my environment's data after installing sqlite-diffable today:

$ sqlite-diffable --version
sqlite-diffable, version 0.2.1
$
$ sqlite-diffable --help
Usage: sqlite-diffable [OPTIONS] COMMAND [ARGS]...

  Tools for dumping/loading a SQLite database to diffable directory structure

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  dump
$
$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.7
BuildVersion:	19H1713
$
$ date
Wed Feb 23 21:41:32 CST 2022
$
$ sqlite-diffable load
Usage: sqlite-diffable [OPTIONS] COMMAND [ARGS]...
Try 'sqlite-diffable --help' for help.

Error: No such command 'load'.
$

johnnyutahh avatar Feb 24 '22 04:02 johnnyutahh

Checking in - any update(s) on this topic/issue/discussion? ( @simonw )

johnnyutahh avatar Apr 27 '23 21:04 johnnyutahh