Adds support for Apple Binary Plist, version 00
This adds support for decoding Apple Binary Plists. The only well documented version is 00, and is therefore the only one supported here. I have tested this on both large and small binary plists, including ones with nested dictionaries.
Some test file would be nice. It should be enough to put a bplist in testdata/test.bplist and a testdata/test.fqtest with $ fq dv test.bplist, run the tests with WRITE_ACTUAL, and inspect whether the output looks sane.
A torepr function could look something like this for the current tree structure:
def _bplist_torepr:
  def _f:
    ( .object
    | if .type == "singleton" then .value
      elif .type == "int" then .value
      elif .type == "real" then .value
      elif .type == "date" then .value
      elif .type == "data" then .value.data
      elif .type == "ascii_string" then .value.value
      elif .type == "unicode_string" then .value.value
      elif .type == "uid" then .value
      elif .type == "array" then
        ( .elements
        | map(_f)
        )
      elif .type == "set" then
        ( .elements
        | map(_f)
        )
      elif .type == "dict" then
        ( .dictionary.entries
        | map({key: (.key | _f), value: (.value | _f)})
        | from_entries
        )
      else error("unknown type: \(.type)")
      end
    );
  ( .objects
  | _f
  );
With this diff:
--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -1,6 +1,7 @@
 package bplist
 
 import (
+	"embed"
 	"math"
 	"time"
 
@@ -10,6 +11,9 @@ import (
 	"github.com/wader/fq/pkg/scalar"
 )
 
+//go:embed bplist.jq
+var bplistFS embed.FS
+
 func init() {
 	interp.RegisterFormat(decode.Format{
 		Name: format.BPLIST,
@@ -17,7 +21,9 @@ func init() {
 		Description: "Apple Binary Property List",
 		Groups: []string{format.PROBE},
 		DecodeFn: bplistDecode,
+		Functions: []string{"torepr"},
 	})
+	interp.RegisterFS(bplistFS)
 }
 
 const (
You can do:
➜ fq git:(bplist) ✗ go run . torepr /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/c55ebecca7037d50a22fe39315a802984688c69f.asset/AssetData/parser/config.plist
{
  "CFBundleDevelopmentRegion": "en",
  "CFBundleExecutable": "$(EXECUTABLE_NAME)",
  "CFBundleIdentifier": "com.apple.NLP",
  "CFBundleInfoDictionaryVersion": "6.0",
  "CFBundleName": "$(PRODUCT_NAME)",
  "CFBundlePackageType": "BNDL",
  "CFBundleShortVersionString": "1.0",
  "CFBundleSignature": "????",
  "CFBundleVersion": "1",
  "CanonicalRegions": {
    "de": {
...
And you can do things like $ fq -r 'torepr.CanonicalRegions | toyaml' file.bplist etc.
Is there a way to have the torepr function display the scalar mapped value instead of the value itself? Would like to have dates represented as a timestamp instead of a floating point value, since it's not obvious how this float has to be converted.
The -h formats test seems to fail. It lives in ./interp; it should probably be moved to format, hmm.
Maybe add a torepr test?
Will have a last look when I'm at a computer, but it looks very good now.
torepr will use the scalar's default "value", which is the symbolic value if set, otherwise the actual value. But in your timestamp case a scalar mapper can also set a string description; that string is used when showing the hexdump tree thingy and can also be accessed with todescription (there is also toactual, tosym, and tovalue).
Yeah, that is a bit unfortunate with the weird timestamp epoch. Could it make sense to make the sym value a float unix timestamp, or is that even more confusing? In other decoders I've kept it as numbers as it seemed nicer for queries (doing comparisons etc). Maybe it would make sense to add more time functions? jq has strptime but I haven't used it much myself.
There is some half-finished work in fq to make tovalue have an option to prefer the actual value; maybe something like that could be done to prefer the description? Any ideas?
Another alternative is to add an option to the bplist decoder for how timestamps should be handled? Maybe -o timestamp=unix, -o timestamp=iso8601, -o timestamp=cocoa etc?
I have a feeling it will probably also fail on some help output tests that are in formats/all.
Maybe all help output tests should be moved into the individual formats? I will probably do that later on. It should probably be done so that a format does not affect tests outside its own testdata directory.
An interesting future feature would be to write a toplist in jq, there is toxml :)
Hey, I made some changes to help and help tests in https://github.com/wader/fq/pull/430. For you I think it's just to run make doc again and possibly add a help test, see some help_*.fqtest file.
Thanks! I've been super busy but hopefully I can put the finishing touches on this this weekend.
No worries, great, yes I think there are just a few small things left. But I will take an extra look to see if there is anything else, so that you have all comments and suggestions ready for the weekend.
Is FieldRawLen the correct way to decode a blob of binary data, such as in the Data type for binary plists? I'm having trouble extracting the data using toactual in the case that I might want to write it out into another file (although I may just be missing something). I get an error like:
can't convert actual value jq value &bitio.SectionReader{r:(*bitio.SectionReader)(0xc0009a53b0), bitBase:768, bitOff:768, bitLimit:128128}
Also, it seems weird that tovalue produces a base64 encoding, but it is truncated for longer values:
./fq 'torepr.SandboxProfileData | tovalue' /Users/davidmcdonald/Library/Containers/com.apple.LoginUserService/Container.plist
"<15920>AACgAJIAAAAFAAAALgASAV8DLAMWBJ8AnQCcAJ8AnwCfAJ8AngCfAJ4AnwCbAJ4AlACDAH4AfQCbAHwAfABhAF4AngBcAJsAmwCeAHwAUgBSAEwASQBSAFIAUgBSAJ8AUgBDAEAAnwCfAJ8AnwCeAJ8AnwCCAJ8AnwCfADsANQCeAJ8AnwCfAJ8AnwCfAJ8AnwCfADQAMAA0AC8ALAArAJ8AnwCfAJ8AnwCfAJ4AnwCeAJ8AnwCfAJ8AHwCfAJ8AnwAcAJ4AGwAbABoAFQCfAJ8AFACfAJ8AnwATABMAEwAOAA4AngCfAJ4AEwCfABMAngATABMAEwCeAAwAngALAA=="
In this case, only 0x100 of the 15920 bytes are encoded as base64, the rest is missing
Yes, raw is for raw bits (it also does not have to be a whole number of bytes). Short version: use tobits/tobytes (depending on which unit size you want for slicing) to get the raw bits, otherwise it will behave more like a "preview" string. See the longer version below.
So at the moment you can do something like this:
# will show hexdump if stdout is a tty (to be safe), can do ... | cat if you really want raw bytes in the tty
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist
# will write raw data as stdout
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist > data
Longer version:
The problem comes from how to represent binary data as jq values. I've tried a couple of different variants: introducing a new binary type, arrays of ints, and strings. They all have different drawbacks and issues.
- A new binary type feels natural to add, but it turns out the jq standard library code etc has assumptions about which types can exist. In fq you can still get the "real" type using _exttype: [0xff,0x90] | tobytes | _exttype is "binary". Also, it still has to be able to become a JSON compatible value at some point.
- An array of ints would still have to be special in some way to know how big each element is (bit/byte).
- Strings in jq are arrays of unicode codepoints, so binary data might be interpreted as multi-byte codepoints, and I think there were other confusing issues making them unsuitable for binary, see ex: "åäö" | .,tobytes | length.
Also there is the issue of what to do with formats that have raw fields that can be very large (ex: mp4 mdat): include everything or truncate somehow?
So the current compromise is that raw will be a base64 string (to be jq compatible) and will also be truncated by default. It is possible to change the tovalue truncate behaviour with -o bits_format=base64 (there is also md5).
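To make the string drawback and the truncated base64 compromise above a bit more concrete, here is a rough Go sketch. It is only an illustration, not how fq implements it: the helper names lengths and preview are made up, and the 256 byte limit is just taken from the 0x100 bytes observed in the output earlier in the thread.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

// lengths returns the number of unicode codepoints and the number of
// UTF-8 bytes in s. For non-ASCII text the two differ, which is one
// reason a string is an awkward container for arbitrary binary data.
func lengths(s string) (codepoints, byteCount int) {
	return utf8.RuneCountInString(s), len(s)
}

// preview is a hypothetical helper: it renders data as base64 and, when
// data is longer than limit bytes, encodes only the first limit bytes
// and prefixes the total length as "<n>".
func preview(data []byte, limit int) string {
	if len(data) <= limit {
		return base64.StdEncoding.EncodeToString(data)
	}
	return fmt.Sprintf("<%d>%s", len(data), base64.StdEncoding.EncodeToString(data[:limit]))
}

func main() {
	// "åäö" written with escapes so the source encoding does not matter.
	cp, n := lengths("\u00e5\u00e4\u00f6")
	fmt.Println(cp, n)

	// A 15920 byte blob is shown as "<15920>" plus base64 of 256 bytes.
	fmt.Println(preview(make([]byte, 15920), 256))
}
```

The point of the sketch is that the preview string is lossy by design, so anything that needs the real bytes has to go through tobits/tobytes instead of tovalue.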
Sorry for the long rant :) but it's very good someone else is messing around with this as i'm not that happy with the current design and i think it can be made better and less confusing, so feedback is very welcome.
BTW fq has support for "binary arrays" (similar to iolists in erlang) so you can slice and concatenate parts into a new binary. Maybe not a very good example:
# build a binary array with a byte (0), a binary slice and a string (will be utf8 bytes) and try to decode it as a bplist (force to skip magic check).
$ go run . -n '"hello" | tobits | [0, .[8:16], "a string"] | tobytes | bplist({force: true}) | d'
│00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f│0123456789abcdef│.{}: (bplist)
│ │ │ error: bplist: SeekAbs: failed at position 8 (read size 0 seek pos 0): invalid seek offset
│ │ │ header{}:
0x0│00 65 61 20 73 74 │.ea st │ magic: "\x00ea st" (invalid)
0x0│ 72 69 │ ri │ version: "ri" (invalid)
0x0│ 6e 67│ │ ng│ │ unknown0: raw bits
Handling timestamps is tricky. On the one hand, it's tempting to render it as the description by default, since that is the behavior of plutil and presents the value clearly to the user. On the other hand, jq does have functions for converting timestamps between formats from the raw value. I think the best solution for now is to be true to the original format and render the value as the decoded floating point value, and access the description using todescription as you suggested, that way we stay consistent with the jq way of doing things. I've added a note on this in the format documentation.
Singletons seem to produce unknown fields:
➜ fq git:(bplist) ✗ go run . -o line_bytes=10 'grep_by(.type=="singleton"), .unknown0, .unknown1 | dv' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist
│00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[0].value{}: 0x475b-0x475b.7 (1)
0x4754│ 09 │ . │ type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
│ │ │ value: true 0x475c-NA (0)
│00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[1].value{}: 0x475b-0x475b.7 (1)
0x4754│ 09 │ . │ type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
│ │ │ value: true 0x475c-NA (0)
│00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│ 09 │ . │.unknown0: raw bits 0x475b.4-0x475b.7 (0.4)
│00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│ 09 │ . │.unknown1: raw bits 0x475c-0x475c.7 (1)
Looks like the 12 bits after the type are seen as unknown (it should be one unknown field, but there seems to be a bug in the gap code, maybe because of synthetic fields; will have a look).
Also I was a bit confused at first until I realized that bplist can apparently use the same index multiple times, quite cool.
Range gap issue fixed in master https://github.com/wader/fq/pull/431, now the unknown field shows up as one 12 bit field.
But I got a bit unsure whether the gap we're seeing in the example above is correct or not. Do "normal" bplists have them? I guess based on how bplist uses offset tables it would be possible to have ranges that are unused/unknown, but does that happen in practice? Maybe because of alignment etc?
Generally I've tried to make decoders behave so that they end up with gaps only for things they don't know about or that should not be there, ex unknown trailing data.
With https://github.com/wader/fq/pull/432 toactual and tosym behave the same as tovalue. Also fixed the error. I think that makes sense?
The code that handles this is starting to get a bit out of hand and badly needs a rethink/refactor.
I'm thinking about releasing 0.0.10 soonish, and it would be nice to include this. If you're busy we can merge and I can try to fix the remaining things if you like?
Sorry, I've been drowning in work and thesis so I haven't had time to figure out those last unknown bytes. If you want to merge it for the next release, I'm happy to figure out the bugs when I get a chance, or you can if you have the time.
No need to be sorry, focus on thesis! what is it about?
Ok i'll let you know here if i figure something out
Hey, this should fix the unknown field for singletons:
diff --git a/format/bplist/bplist.go b/format/bplist/bplist.go
index 7c74848c..83480d0d 100644
--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -91,15 +91,11 @@ func decodeItem(d *decode.D, p *plist) {
 	m := d.FieldU4("type", elementTypeMap)
 	switch m {
 	case elementTypeNullOrBoolOrFill:
-		t := d.U4()
-		switch t {
-		case null:
-			d.FieldValueNil("value")
-		case boolTrue:
-			d.FieldValueBool("value", true)
-		case boolFalse:
-			d.FieldValueBool("value", false)
-		}
+		d.FieldU4("value", scalar.UToScalar{
+			null: scalar.S{Sym: nil},
+			boolTrue: scalar.S{Sym: true},
+			boolFalse: scalar.S{Sym: false},
+		})
 	case elementTypeInt:
 		n := d.FieldUFn("size", func(d *decode.D) uint64 {
 			return 1 << d.U4()
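For context on why the low 4 bits belong to the singleton itself and should be a real field: a small standalone Go sketch of the marker byte split, assuming the usual bplist00 encoding where the high nibble is the object type and, for type 0, the low nibble is 0x0 for null, 0x8 for false and 0x9 for true. decodeSingleton is a made-up name and does not use the fq decode API.

```go
package main

import "fmt"

// decodeSingleton splits a marker byte into its two nibbles and, if the
// high nibble is 0 (null/bool type), maps the low nibble to a value.
// ok is false for other object types and for unhandled low nibbles.
func decodeSingleton(marker byte) (value any, ok bool) {
	typ, low := marker>>4, marker&0x0f
	if typ != 0x0 {
		return nil, false
	}
	switch low {
	case 0x0:
		return nil, true
	case 0x8:
		return false, true
	case 0x9:
		return true, true
	}
	return nil, false
}

func main() {
	// 0x09 is the byte shown in the dump above: type 0, value true.
	fmt.Println(decodeSingleton(0x09))
}
```

Reading the low nibble as a named field, as the diff does, is what makes those 4 bits count as decoded instead of showing up as a gap.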
What may be left is to add some torepr test, rebase on master, regenerate the documentation and the test actual output.
After that I think we're done, or is there something more you want to do?
Let's merge and I'll fix the remaining things in master.
Thanks a lot for your contribution 🥳 Hope the decode API was ok to work with and that you might want to add more formats etc in the future!
Hmm noticed now that the bplist commits don't have your github email. Feel free to do some dummy PR if you want to show up as a contributor to the project.
@dgmcdona Hey, got your message but can't see it here? strange.
Anyways, no problem and totally understand! There was so little left and I wanted it to be part of 0.0.10 :) Yes, I'm looking forward to future contributions, and feel free to email me or open issues if you have ideas or want to discuss something, ex how fq could be used in forensics.
What kind of formats are common CF-formats? Filesystems etc? I haven't used fq for that, but it is designed to handle broken files and I try to divide formats into smaller "subformats" to make it possible to decode parts separately etc. For example, I've used fq quite a lot to search for patterns and then try to decode and filter something at each match. As jq is a generator based language it is quite ergonomic to do, ex try to decode each occurrence of 0xfff8 as a FLAC frame:
tobytes as $b | scan([0xff,0xf8]) | $b[.start:] | flac_frame
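Outside of jq, the same scan-then-slice idea looks roughly like this in Go. This is only a sketch of the pattern: scanAll is a hypothetical helper, and the actual decode attempt at each candidate is left out.

```go
package main

import (
	"bytes"
	"fmt"
)

// scanAll returns the start offset of every occurrence of pattern in
// data, including overlapping ones. An empty pattern yields no matches.
func scanAll(data, pattern []byte) []int {
	var starts []int
	if len(pattern) == 0 {
		return starts
	}
	for off := 0; ; {
		i := bytes.Index(data[off:], pattern)
		if i < 0 {
			return starts
		}
		starts = append(starts, off+i)
		off += i + 1 // continue just past the start of this match
	}
}

func main() {
	data := []byte{0x00, 0xff, 0xf8, 0x12, 0xff, 0xf8}
	for _, start := range scanAll(data, []byte{0xff, 0xf8}) {
		// Everything from the match to the end is a decode candidate,
		// like $b[.start:] in the jq expression.
		candidate := data[start:]
		fmt.Printf("candidate at offset %d, %d bytes\n", start, len(candidate))
	}
}
```

The jq version is nicer because each match is a generator output that can be piped straight into a decoder, with failed decodes simply filtered away.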
It would be really great if someone else experimented with things like that, as the functions and behaviors of the binary type in fq are a bit strange at times and I don't really know myself how I would like it to work :)