fq icon indicating copy to clipboard operation
fq copied to clipboard

Database files: sqlite, dbf etc?

Open jaakla opened this issue 4 years ago • 5 comments

out of scope?

jaakla avatar Dec 24 '21 10:12 jaakla

Hi, it depends a bit what the usage is. The main usage of fq it to provide detailed access to a binary format so formats that have "user facing" (rows and columns etc) format that is different from how it's actually encoded (indexes and various compact value encodings etc) might decode to something a user would not expect. Have a look at #25 where i describe issue and possible solutions to it.

BTW have you looked at https://github.com/neilotoole/sq? is that more close to want your looking for?

wader avatar Dec 24 '21 14:12 wader

I mean access of raw structures. Yep it is more close to real data for simpler/older databases like dbf (my common problem: what is charset?) and can be more tricky for eg sqlite or leveldb. For advanced bigdata ones like heavily compressed ORC etc it probably really does not make much sense.

jaakla avatar Dec 26 '21 08:12 jaakla

Ok! then a fq decoder could be a good fit i think. Is it something you would be interested to work on?

I wonder how hard it would be to have some per format plumbing in jq that traverses the raw structure and output rows etc. Actually jq:s standard library has some SQL-like support already :)

wader avatar Dec 26 '21 09:12 wader

Hey, got a bit curious how btree traversing in jq could work so had to just try. Very much work in progress and does not support page overflow yet so probably only works with tables with small rows. Feel free to work on it if you want or if you have any data modeling and naming suggestions.

With this you can do this:

$ cat format/sqlite3/testdata/test.sql
CREATE TABLE aaa (
    cint int primary key,
    cvarchar varchar(30),
    ctext text,
    creal real,
    cblob blob
);
INSERT INTO "aaa" VALUES(0, 'var1', 'text1', 0, "blob1");
INSERT INTO "aaa" VALUES(1, 'var2', 'test2', 1, "blob2");
INSERT INTO "aaa" VALUES(128, 'var3', 'test3', 128, "blob3");
INSERT INTO "aaa" VALUES(-128, 'var3', 'test3', -128, "blob3");
INSERT INTO "aaa" VALUES(9223372036854775807, 'var4', 'test4', 9223372036854775807, "blob4");
INSERT INTO "aaa" VALUES(-9223372036854775808, 'var5', 'test5', -9223372036854775808, "blob5");

$ go run fq.go -d sqlite3 '.header, .pages[0] | d' format/sqlite3/testdata/test.db
    │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.header{}:
0x00│53 51 4c 69 74 65 20 66 6f 72 6d 61│SQLite forma│  magic: "SQLite format 3\x00" (valid)
0x0c│74 20 33 00                        │t 3.        │
0x0c│            10 00                  │    ..      │  page_size: 4096
0x0c│                  01               │      .     │  write_version: "legacy" (1)
0x0c│                     01            │       .    │  read_version: "legacy" (1)
0x0c│                        00         │        .   │  unused_space: 0
0x0c│                           40      │         @  │  maximum_embedded_payload_fraction: 64
0x0c│                              20   │            │  minimum_embedded_payload_fraction: 32
0x0c│                                 20│            │  leaf_payload_fraction: 32
0x18│00 00 00 07                        │....        │  file_change_counter: 7
0x18│            00 00 00 03            │    ....    │  database_size_pages: 3
0x18│                        00 00 00 00│        ....│  page_number_freelist: 0
0x24│00 00 00 00                        │....        │  total_number_freelist: 0
0x24│            00 00 00 01            │    ....    │  schema_cookie: 1
0x24│                        00 00 00 04│        ....│  schema_format_number: 4
0x30│00 00 00 00                        │....        │  default_page_cache_size: 0
0x30│            00 00 00 00            │    ....    │  page_number_largest_root_btree: 0
0x30│                        00 00 00 01│        ....│  text_encoding: "utf8" (1)
0x3c│00 00 00 00                        │....        │  user_version: 0
0x3c│            00 00 00 00            │    ....    │  incremental_vacuum_mode: 0
0x3c│                        00 00 00 00│        ....│  application_id: 0
0x48│00 00 00 00 00 00 00 00 00 00 00 00│............│  reserved: raw bits (all zero)
0x54│00 00 00 00 00 00 00 00            │........    │
0x54│                        00 00 00 07│        ....│  version_valid_for: 7
0x60│00 2e 53 60                        │..S`        │  sqlite_version_number: 3036000
     │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[0]{}:
0x060│            0d                     │    .       │  type: "table_leaf" (13) (Table leaf b-tree page)
0x060│               0f f8               │     ..     │  start_freeblocks: 4088
0x060│                     00 02         │       ..   │  page_cells: 2
0x060│                           0f 44   │         .D │  cell_start: 3908
0x060│                                 00│           .│  cell_fragments: 0
     │                                   │            │  cells_pointers[0:2]:
0x06c│0f 44                              │.D          │    [0]: 3908
0x06c│      0f d1                        │  ..        │    [1]: 4049
     │                                   │            │  cells[0:2]:
     │                                   │            │    [0]{}:
0xf3c│                        81 0a      │        ..  │      payload_len: 138
0xf3c│                              01   │          . │      rowid: 1
     │                                   │            │      payload{}:
0xf3c│                                 07│           .│        length: 7
     │                                   │            │        serials[0:5]:
0xf48│17                                 │.           │          [0]: 23
0xf48│   13                              │ .          │          [1]: 19
0xf48│      13                           │  .         │          [2]: 19
0xf48│         01                        │   .        │          [3]: 1
0xf48│            81 7b                  │    .{      │          [4]: 251
     │                                   │            │        contents[0:5]:
0xf48│                  74 61 62 6c 65   │      table │          [0]: "table" (text)
0xf48│                                 61│           a│          [1]: "aaa" (text)
0xf54│61 61                              │aa          │
0xf54│      61 61 61                     │  aaa       │          [2]: "aaa" (text)
0xf54│               02                  │     .      │          [3]: 2 (8-bit integer)
0xf54│                  43 52 45 41 54 45│      CREATE│          [4]: "CREATE TABLE aaa (\n    cint int primary key,\n    c"... (text)
0xf60│20 54 41 42 4c 45 20 61 61 61 20 28│ TABLE aaa (│
*    │until 0xfd0.7 (119)                │            │
     │                                   │            │    [1]{}:
0xfcc│               25                  │     %      │      payload_len: 37
0xfcc│                  02               │      .     │      rowid: 2
     │                                   │            │      payload{}:
0xfcc│                     06            │       .    │        length: 6
     │                                   │            │        serials[0:5]:
0xfcc│                        17         │        .   │          [0]: 23
0xfcc│                           39      │         9  │          [1]: 57
0xfcc│                              13   │          . │          [2]: 19
0xfcc│                                 01│           .│          [3]: 1
0xfd8│00                                 │.           │          [4]: 0
     │                                   │            │        contents[0:5]:
0xfd8│   69 6e 64 65 78                  │ index      │          [0]: "index" (text)
0xfd8│                  73 71 6c 69 74 65│      sqlite│          [1]: "sqlite_autoindex_aaa_1" (text)
0xfe4│5f 61 75 74 6f 69 6e 64 65 78 5f 61│_autoindex_a│
0xff0│61 61 5f 31                        │aa_1        │
0xff0│            61 61 61               │    aaa     │          [2]: "aaa" (text)
0xff0│                     03            │       .    │          [3]: 3 (8-bit integer)
     │                                   │            │          [4]: "NULL" (null)
     │                                   │            │  freeblocks[0:1]:
     │                                   │            │    [0]{}:
0xff0│                        00 00      │        ..  │      next_offset: 0

$ go run fq.go -d sqlite3 'sqlite3_table("aaa") | ., tovalue' format/sqlite3/testdata/test.db
      │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[1].cells[0].payload.contents[0:5]:
0x1f5c│      80 00 00 00 00 00 00 00      │  ........  │  [0]: -9223372036854775808 (64-bit integer)
0x1f5c│                              76 61│          va│  [1]: "var5" (text)
0x1f68│72 35                              │r5          │
0x1f68│      74 65 73 74 35               │  test5     │  [2]: "test5" (text)
0x1f68│                     c3 e0 00 00 00│       .....│  [3]: -9.223372036854776e+18 (64-bit float)
0x1f74│00 00 00                           │...         │
0x1f74│         62 6c 6f 62 35            │   blob5    │  [4]: "blob5" (text)
[
  -9223372036854775808,
  "var5",
  "test5",
  -9223372036854776000,
  "blob5"
]
      │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[1].cells[1].payload.contents[0:5]:
0x1f80│            7f ff ff ff ff ff ff ff│    ........│  [0]: 9223372036854775807 (64-bit integer)
0x1f8c│76 61 72 34                        │var4        │  [1]: "var4" (text)
0x1f8c│            74 65 73 74 34         │    test4   │  [2]: "test4" (text)
0x1f8c│                           43 e0 00│         C..│  [3]: 9.223372036854776e+18 (64-bit float)
0x1f98│00 00 00 00 00                     │.....       │
0x1f98│               62 6c 6f 62 34      │     blob4  │  [4]: "blob4" (text)
[
  9223372036854775807,
  "var4",
  "test4",
  9223372036854776000,
  "blob4"
]
      │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[1].cells[2].payload.contents[0:5]:
0x1fa4│                  80               │      .     │  [0]: -128 (8-bit integer)
0x1fa4│                     76 61 72 33   │       var3 │  [1]: "var3" (text)
0x1fa4│                                 74│           t│  [2]: "test3" (text)
0x1fb0│65 73 74 33                        │est3        │
0x1fb0│            80                     │    .       │  [3]: -128 (8-bit integer)
0x1fb0│               62 6c 6f 62 33      │     blob3  │  [4]: "blob3" (text)
[
  -128,
  "var3",
  "test3",
  -128,
  "blob3"
]
      │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[1].cells[3].payload.contents[0:5]:
0x1fbc│                  00 80            │      ..    │  [0]: 128 (16-bit integer)
0x1fbc│                        76 61 72 33│        var3│  [1]: "var3" (text)
0x1fc8│74 65 73 74 33                     │test3       │  [2]: "test3" (text)
0x1fc8│               00 80               │     ..     │  [3]: 128 (16-bit integer)
0x1fc8│                     62 6c 6f 62 33│       blob3│  [4]: "blob3" (text)
[
  128,
  "var3",
  "test3",
  128,
  "blob3"
]
      │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[1].cells[4].payload.contents[0:5]:
      │                                   │            │  [0]: 1 (constant 1)
0x1fd4│                        76 61 72 32│        var2│  [1]: "var2" (text)
0x1fe0│74 65 73 74 32                     │test2       │  [2]: "test2" (text)
      │                                   │            │  [3]: 1 (constant 1)
0x1fe0│               62 6c 6f 62 32      │     blob2  │  [4]: "blob2" (text)
[
  1,
  "var2",
  "test2",
  1,
  "blob2"
]
      │00 01 02 03 04 05 06 07 08 09 0a 0b│0123456789ab│.pages[1].cells[5].payload.contents[0:5]:
      │                                   │            │  [0]: 0 (constant 0)
0x1fec│                  76 61 72 31      │      var1  │  [1]: "var1" (text)
0x1fec│                              74 65│          te│  [2]: "text1" (text)
0x1ff8│78 74 31                           │xt1         │
      │                                   │            │  [3]: 0 (constant 0)
0x1ff8│         62 6c 6f 62 31            │   blob1    │  [4]: "blob1" (text)
[
  0,
  "var1",
  "text1",
  0,
  "blob1"
]

wader avatar Dec 28 '21 12:12 wader

Made some progress but still needs work. This works now, i call it "fqlite":

$ go run fq.go --arg name "R.E.M." 'torepr as $db | first($db.artists[] | select(.[1]==$name)) as $artist | $db.albums[] | select(.[2] == $artist[0]) | .[1] | tovalue' format/sqlite3/testdata/chinook.db
"Green"
"New Adventures In Hi-Fi"
"The Best Of R.E.M.: The IRS Years"

wader avatar Jan 22 '22 10:01 wader