Postgres
PostgreSQL files decoder. Implemented formats: pg_control, pg_heap.
Hey, thanks! I've started reviewing a bit so that we can get going. I don't have any prior experience with the internals of postgres, so bear with me while i get my head around this, and correct me if i misunderstand things.
Hey, hope all is good. I noticed that you did not show up as a contributor after #411. It could be because the email git uses is not known by github?
And let me know if you're stuck on something
Thank you. All is fine.
I am now working on the WAL file in postgres. I have some issues with the format. Once i have something working i will go back to refactoring and fixing the code mentioned in the comments.
Great to hear 👍
Are some of these formats "sub"-formats of other formats? ex: is pg_wal_page used inside a wal file? Then maybe it makes sense to make pg_wal have it as a dependency and use it. I've noticed that it can sometimes make the code easier to follow, as the sub-format will be decoded in separation and from bit 0. See for example how the mp3 decoder uses mp3_frame, or macho_fat and macho. But it might depend a bit; if there is lots of complicated sharing needed between the formats it might not be worth it.
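Not fq's actual API, just a self-contained toy sketch of the separation idea: the container decoder slices out each frame, and the frame decoder always starts from offset 0 of its own buffer, so it can be read (and reused) in isolation. The length-prefixed frame layout here is made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeFrame decodes one length-prefixed "frame" starting at offset 0
// of its own slice, the way a sub-format decoder sees a clean
// sub-buffer handed to it by the container format.
func decodeFrame(b []byte) (consumed int, payload []byte) {
	n := int(binary.LittleEndian.Uint16(b)) // 2-byte little-endian length prefix
	return 2 + n, b[2 : 2+n]
}

func main() {
	// A toy container holding two length-prefixed frames.
	container := []byte{3, 0, 'a', 'b', 'c', 2, 0, 'x', 'y'}
	for off := 0; off < len(container); {
		n, payload := decodeFrame(container[off:])
		fmt.Printf("frame at %d: %q\n", off, payload)
		off += n
	}
}
```

The container loop only knows where frames start and end; everything about a frame's internals lives in decodeFrame.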
Nice to see you active again btw! i will probably leave comments a bit at random when i have time to review
Hey, i made some changes to help and the help tests in https://github.com/wader/fq/pull/430. For you i think it's just a matter of running make doc again and possibly adding a help test, see some help_*.fqtest file
Thank you!
Nice to see some progress. Does it make sense to review more or should i wait a bit?
No wait, please do review.
I noticed that github seems to not recognize your email, maybe you want to configure git?
Would it be possible to include how the binary files in testdata were generated? that would make it easier to update and add things in the future. For the sqlite PR https://github.com/wader/fq/pull/36/files i include SQL-files and a Makefile to generate.
But i would guess it's a bit more complicated with postgres? possible to use docker etc and copy out data files?
Was fiddling around a bit with the infomask thing. Maybe it could be decoded something like this to make most of the bools have a bit range?
// decodeInfomask decodes the t_infomask bit fields so that each flag
// gets its own bit range, then derives the combined flags from them.
func decodeInfomask(heap *Heap, d *decode.D) {
	// single-bit flags, decoded in bit order
	d.FieldBool("heap_xmax_lock_only")
	heapXmaxExclLock := d.FieldBool("heap_xmax_excl_lock")
	d.FieldBool("heap_combocid")
	heapXmaxKeyshrLock := d.FieldBool("heap_xmax_keyshr_lock")
	d.FieldBool("heap_hasoid_old")
	d.FieldBool("heap_hasexternal")
	d.FieldBool("heap_hasvarwidth")
	d.FieldBool("heap_hasnull")
	heapMovedIn := d.FieldBool("heap_moved_in")
	heapMovedOff := d.FieldBool("heap_moved_off")
	d.FieldBool("heap_updated")
	heap.Tuple.IsMulti = d.FieldBool("heap_xmax_is_multi")
	d.FieldBool("heap_xmax_invalid")
	d.FieldBool("heap_xmax_committed")
	heapXminInvalid := d.FieldBool("heap_xmin_invalid")
	heapXminCommitted := d.FieldBool("heap_xmin_committed")
	// derived flags: no bit range of their own, computed from the bits above
	d.FieldValueBool("heap_xmin_frozen", heapXminInvalid || heapXminCommitted)
	d.FieldValueBool("heap_xmax_shr_lock", heapXmaxExclLock || heapXmaxKeyshrLock)
	// TODO: same as above?
	d.FieldValueBool("heap_lock_mask", heapXmaxExclLock || heapXmaxKeyshrLock)
	d.FieldValueBool("heap_moved", heapMovedOff || heapMovedIn)
}
then it will look like this:
│00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d│0123456789abcd│.[0].tuples[0].header.infomask_bits{}:
0x1fe2│ 01 │ . │ heap_xmax_lock_only: false
0x1fe2│ 01 │ . │ heap_xmax_excl_lock: false
0x1fe2│ 01 │ . │ heap_combocid: false
0x1fe2│ 01 │ . │ heap_xmax_keyshr_lock: false
0x1fe2│ 01 │ . │ heap_hasoid_old: false
0x1fe2│ 01 │ . │ heap_hasexternal: false
0x1fe2│ 01 │ . │ heap_hasvarwidth: false
0x1fe2│ 01 │ . │ heap_hasnull: true
0x1fe2│ 09 │ . │ heap_moved_in: false
0x1fe2│ 09 │ . │ heap_moved_off: false
0x1fe2│ 09 │ . │ heap_updated: false
0x1fe2│ 09 │ . │ heap_xmax_is_multi: false
0x1fe2│ 09 │ . │ heap_xmax_invalid: true
0x1fe2│ 09 │ . │ heap_xmax_committed: false
0x1fe2│ 09 │ . │ heap_xmin_invalid: false
0x1fe2│ 09 │ . │ heap_xmin_committed: true
│ │ │ heap_xmin_frozen: true
│ │ │ heap_xmax_shr_lock: false
│ │ │ heap_lock_mask: false
│ │ │ heap_moved: false
I noticed that all flavours use postgres14/common14 at the moment, should it be moved to just common? or is there some reason it should be in 14?
There is no reason to use PostgreSQL 14 as the main version. I have done some refactoring and separated the binary-compatible versions.
Also I added checksum calculation for pages.
I think it's possible to use docker. But I used other tools: packer, terraform, ansible, custom utils. All database files were generated by pgbench.
Okok, if it's not too much trouble it would be nice to have some kind of documentation on how they were produced or how to reproduce them. I know for some other formats it has been very valuable to someone else or your future self :)
I added how_to.md with instructions on how to generate test files for PostgreSQL.
Hey, sorry i haven't reviewed much. Is there something you're waiting for me to help with? The decodeInfomask thing: it would be nice if it was possible to decode so that the bit fields end up with bit ranges, would something similar to the example i posted work? With that it would be possible in the future to maybe modify specific bits etc
Hi @wader .
I'm going to test and review this patch in details and come back. I'm in touch with @pnsafonov
@mizhka Great, review help is very welcome 👍 and I really appreciate all the work @pnsafonov has put into this. It's quite a big decoder and the format itself has some interesting decoding modelling challenges. I'm not a postgres internals expert by any means (just a user) so can mostly help out with experience from other decoders and how to make the decoder fit well into fq.
Just ping me if there are any questions and i will probably show up with comments also
I removed WAL file parsing. I failed to implement it.
Ok, what part made it hard? i know for sqlite i got stuck at having to traverse a btree in the decode to know which pages are used, it's possible to do but quite complicated.
Postgres WAL is divided into 8kb pages. Each page contains a header, followed by records. A record can be divided across multiple pages. Postgres reconstructs each record in a separate buffer when reading the WAL. A WAL record is not contiguous. It can split a uint32: 3 bytes on the current page, 1 byte on the next page (with the page header between the 3 bytes and the 1 byte).
It is complicated to read such a split uint32 (uint64, string, etc).
Aha i see, if i understand correctly one has to reassemble WAL records from pages? The fq decode API does support doing things like this. You can collect bytes into a bytes buffer and then decode it as a "sub" buffer. For example see the ogg decoder, which reads pages that contain segments of packets; the segments are demuxed into continuous packet buffers and then decoded. Other examples are the zip decoder that decodes uncompressed data, or the pcap decoder that reassembles tcp streams and decodes them.
Would something like that work?
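To make the buffer idea concrete, here is a self-contained toy sketch (not fq's API, and with made-up toy sizes instead of the real 8kb pages and WAL page headers): the record bytes are collected across the page boundary, skipping the continuation page's header, into one continuous buffer that can then be decoded from offset 0, including the 3+1 byte split uint32 case described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const pageHeaderLen = 2 // toy header size; real WAL page headers are larger

// reassemble concatenates record bytes that spill over page boundaries,
// skipping the header of each continuation page, and returns one
// continuous buffer that can be decoded from offset 0.
func reassemble(firstTail []byte, contPages ...[]byte) []byte {
	var rec bytes.Buffer
	rec.Write(firstTail)
	for _, p := range contPages {
		rec.Write(p[pageHeaderLen:])
	}
	return rec.Bytes()
}

func main() {
	want := uint32(0xdeadbeef)
	var raw [4]byte
	binary.LittleEndian.PutUint32(raw[:], want)

	// 3 bytes at the end of the first page, then a 2-byte page header,
	// then the final byte at the start of the next page.
	firstTail := raw[0:3]
	nextPage := []byte{0xAA, 0xBB, raw[3]}

	got := binary.LittleEndian.Uint32(reassemble(firstTail, nextPage))
	fmt.Printf("reassembled: %#x\n", got) // prints 0xdeadbeef
}
```

Once the record lives in its own buffer, the split-field problem disappears: the decoder reads a plain uint32.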
Is anyone still working on/using this? It would be great to get it merged in some form. Maybe we can try to split it into one PR with the basic things that work and one with the things that are not so clear yet?
I am working on PostgreSQL decoder. I will try to make pull request ready for merge.
@pnsafonov That would be great 👍
BTW have you been able to use this decoder in practice? like debugging corruption or for education etc?
Yes, I have some experience analyzing PostgreSQL file corruption with fq. In addition, it's good for visualizing data.
@pnsafonov Ok! nice to hear