fq

Adds support for Apple Binary Plist, version 00

dgmcdona opened this issue 3 years ago • 8 comments

This adds support for decoding Apple Binary Plists. The only well-documented version is 00, so it is the only one supported here. I have tested this on both large and small binary plists, including ones with nested dictionaries.

dgmcdona • Sep 15 '22 03:09

A test file would be nice. It should be just a matter of putting a bplist in testdata/test.bplist and a testdata/test.fqtest with $ fq dv test.bplist, then running the tests with WRITE_ACTUAL and inspecting whether the output looks sane.
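If no suitable bplist is at hand for testdata/test.bplist, a tiny one can also be generated. Below is a minimal Go sketch, not part of fq, assuming a flat dict of ASCII keys to booleans is enough for a first test; buildBplist is a hypothetical helper. It follows the bplist00 layout as implemented in Apple's CFBinaryPList.c: the "bplist00" magic, the objects (0xDn dict, 0x5n ASCII string, 0x08/0x09 false/true), an offset table, and a 32-byte trailer.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// buildBplist (hypothetical helper) encodes a flat dict of ASCII keys to
// bools as a bplist00 file. To stay tiny it uses 1-byte offsets and object
// refs, and rejects inputs that would need more than that.
func buildBplist(m map[string]bool) ([]byte, error) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output for test fixtures
	n := len(keys)
	if n >= 15 {
		return nil, errors.New("too many keys: count must fit in the marker nibble")
	}

	out := []byte("bplist00")
	var offsets []int

	// Object 0 is the dict: marker 0xDn, then n key refs (objects 1..n)
	// and n value refs (objects n+1..2n).
	offsets = append(offsets, len(out))
	out = append(out, 0xd0|byte(n))
	for i := 0; i < n; i++ {
		out = append(out, byte(1+i))
	}
	for i := 0; i < n; i++ {
		out = append(out, byte(1+n+i))
	}
	// Keys as ASCII strings: marker 0x5n followed by n bytes.
	for _, k := range keys {
		if len(k) >= 15 {
			return nil, fmt.Errorf("key %q too long for this sketch", k)
		}
		for i := 0; i < len(k); i++ {
			if k[i] >= 0x80 {
				return nil, fmt.Errorf("key %q is not ASCII", k)
			}
		}
		offsets = append(offsets, len(out))
		out = append(out, 0x50|byte(len(k)))
		out = append(out, k...)
	}
	// Values as singletons: 0x09 is true, 0x08 is false.
	for _, k := range keys {
		offsets = append(offsets, len(out))
		if m[k] {
			out = append(out, 0x09)
		} else {
			out = append(out, 0x08)
		}
	}

	tableOffset := len(out)
	if tableOffset > 0xff {
		return nil, errors.New("file too large for 1-byte offsets")
	}
	for _, o := range offsets {
		out = append(out, byte(o))
	}

	// Trailer: 6 unused/sort-version bytes, offset int size, object ref size,
	// then object count, top object index and offset table position (big endian).
	var tr [32]byte
	tr[6] = 1
	tr[7] = 1
	binary.BigEndian.PutUint64(tr[8:16], uint64(len(offsets)))
	binary.BigEndian.PutUint64(tr[16:24], 0)
	binary.BigEndian.PutUint64(tr[24:32], uint64(tableOffset))
	return append(out, tr[:]...), nil
}

func main() {
	b, err := buildBplist(map[string]bool{"key": true})
	if err != nil {
		panic(err)
	}
	// Write b to testdata/test.bplist (e.g. with os.WriteFile) to use it.
	fmt.Printf("% x\n", b)
}
```

Keeping everything at 1-byte sizes makes the output easy to check by eye in fq dv, at the cost of only supporting very small plists.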

wader • Sep 15 '22 09:09

A torepr function could look something like this for the current tree structure:

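# Convert a decoded bplist tree into plain jq values (objects, arrays, scalars)
def _bplist_torepr:
  # _f converts one entry via its .object, recursing into arrays, sets and dicts
  def _f: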
    ( .object
    | if .type == "singleton" then .value
      elif .type == "int" then .value
      elif .type == "real" then .value
      elif .type == "date" then .value
      elif .type == "data" then .value.data
      elif .type == "ascii_string" then .value.value
      elif .type == "unicode_string" then .value.value
      elif .type == "uid" then .value
      elif .type == "array" then
        ( .elements
        | map(_f)
        )
      elif .type == "set" then
        ( .elements
        | map(_f)
        )
      elif .type == "dict" then
        ( .dictionary.entries
        | map({key: (.key | _f), value: (.value | _f)})
        | from_entries
        )
      else error("unknown type: \(.type)")
      end
    );
  ( .objects
  | _f
  );

With this diff:

--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -1,6 +1,7 @@
 package bplist

 import (
+       "embed"
        "math"
        "time"

@@ -10,6 +11,9 @@ import (
        "github.com/wader/fq/pkg/scalar"
 )

+//go:embed bplist.jq
+var bplistFS embed.FS
+
 func init() {
        interp.RegisterFormat(decode.Format{
                Name:        format.BPLIST,
@@ -17,7 +21,9 @@ func init() {
                Description: "Apple Binary Property List",
                Groups:      []string{format.PROBE},
                DecodeFn:    bplistDecode,
+               Functions:   []string{"torepr"},
        })
+       interp.RegisterFS(bplistFS)
 }

 const (

You can do:

➜  fq git:(bplist) ✗ go run . torepr /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/c55ebecca7037d50a22fe39315a802984688c69f.asset/AssetData/parser/config.plist
{
  "CFBundleDevelopmentRegion": "en",
  "CFBundleExecutable": "$(EXECUTABLE_NAME)",
  "CFBundleIdentifier": "com.apple.NLP",
  "CFBundleInfoDictionaryVersion": "6.0",
  "CFBundleName": "$(PRODUCT_NAME)",
  "CFBundlePackageType": "BNDL",
  "CFBundleShortVersionString": "1.0",
  "CFBundleSignature": "????",
  "CFBundleVersion": "1",
  "CanonicalRegions": {
    "de": {
...

And things like $ fq -r 'torepr.CanonicalRegions | toyaml' file.bplist, etc.
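For comparison, the recursion in the jq _bplist_torepr function can be sketched in Go. The node and entry types are hypothetical simplifications of the decoded tree (they are not fq's Go types), and data and string nodes carry their value directly instead of under .value.data / .value.value.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for one decoded bplist object;
// field names loosely follow the jq tree, but this is not fq's Go API.
type node struct {
	Type     string
	Value    any     // scalar payload (singletons, numbers, strings, data, uid)
	Elements []*node // children of array and set nodes
	Entries  []entry // key/value pairs of dict nodes
}

// entry is one dict key/value pair.
type entry struct{ Key, Value *node }

// toRepr recursively turns a node tree into plain Go values: dicts become
// map[string]any, arrays and sets become []any, and scalars pass through.
func toRepr(n *node) (any, error) {
	switch n.Type {
	case "singleton", "int", "real", "date", "data", "ascii_string", "unicode_string", "uid":
		return n.Value, nil
	case "array", "set":
		out := make([]any, 0, len(n.Elements))
		for _, e := range n.Elements {
			v, err := toRepr(e)
			if err != nil {
				return nil, err
			}
			out = append(out, v)
		}
		return out, nil
	case "dict":
		out := make(map[string]any, len(n.Entries))
		for _, e := range n.Entries {
			k, err := toRepr(e.Key)
			if err != nil {
				return nil, err
			}
			ks, ok := k.(string)
			if !ok {
				return nil, fmt.Errorf("dict key has type %T, want string", k)
			}
			v, err := toRepr(e.Value)
			if err != nil {
				return nil, err
			}
			out[ks] = v
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unknown type: %s", n.Type)
	}
}

func main() {
	tree := &node{Type: "dict", Entries: []entry{
		{Key: &node{Type: "ascii_string", Value: "CFBundleVersion"}, Value: &node{Type: "ascii_string", Value: "1"}},
		{Key: &node{Type: "ascii_string", Value: "flags"}, Value: &node{Type: "array", Elements: []*node{
			{Type: "singleton", Value: true},
			{Type: "int", Value: int64(3)},
		}}},
	}}
	v, err := toRepr(tree)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

This sketch rejects non-string dict keys with an error; that is a design choice for the sketch, not a statement about how the jq version behaves.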

wader • Sep 15 '22 13:09

Is there a way to have the torepr function display the scalar-mapped value instead of the value itself? I would like dates to be represented as timestamps instead of floating point values, since it's not obvious how the float has to be converted.

dgmcdona • Sep 17 '22 20:09

The -h formats test seems to fail; it is in ./interp and should probably be moved to format, hmm.

Maybe add a torepr test?

I will have a last look when I'm at a computer, but it looks very good now.

wader • Sep 18 '22 07:09

torepr uses the scalar's default "value", which is the symbolic value if set, otherwise the actual value. But in your timestamp case, a scalar mapper can also set a string description; that string is used when showing the hexdump tree thingy and can also be accessed with todescription (there are also toactual, tosym, and tovalue).

Yeah, the weird timestamp epoch is a bit unfortunate. Could it make sense to make the sym value a float Unix timestamp, or is that even more confusing? In other decoders I've kept it as numbers, as that seemed nicer for queries (doing comparisons etc.). Maybe it would make sense to add more time functions? jq has strptime, but I haven't used it much myself.

There is some half-finished work in fq to give tovalue an option to prefer the actual value; maybe something similar could be done to prefer the description? Any ideas?
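As a rough illustration of the precedence described above, here is a reduced Go model. The scalar type and the valuePrefer option are hypothetical and much smaller than fq's real scalar type; only the "symbolic value if set, otherwise actual value" rule is taken from the comment.

```go
package main

import "fmt"

// scalar is a hypothetical, reduced model of a decoded scalar with the three
// views discussed here; it is not fq's scalar type.
type scalar struct {
	Actual      any    // decoded value, e.g. a float64 Cocoa timestamp
	Sym         any    // symbolic value set by a mapper; nil means "not set"
	Description string // human-readable text shown in the dump; "" means none
}

// value follows the described default: sym if set, otherwise actual.
func (s scalar) value() any {
	if s.Sym != nil {
		return s.Sym
	}
	return s.Actual
}

// valuePrefer sketches a tovalue-style preference option. The option names
// "actual" and "description" are made up for this illustration; anything
// else falls back to the default rule.
func (s scalar) valuePrefer(prefer string) any {
	switch prefer {
	case "actual":
		return s.Actual
	case "description":
		if s.Description != "" {
			return s.Description
		}
	}
	return s.value()
}

func main() {
	date := scalar{Actual: 0.0, Description: "2001-01-01T00:00:00Z"}
	fmt.Println(date.value(), date.valuePrefer("description")) // 0 2001-01-01T00:00:00Z
}
```

One thing such a model makes visible: if nil means "no symbolic value", a mapper cannot use a nil sym to mean null, so a real implementation would need some other way to mark the sym as set.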

wader • Sep 18 '22 08:09

Another alternative is to add an option to the bplist decoder for how timestamps should be handled? Maybe -o timestamp=unix, -o timestamp=iso8601, -o timestamp=cocoa, etc.?
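The three proposed option values can be sketched as one Go function. formatDate is hypothetical, and the -o timestamp names are only the proposal from this comment, not an existing fq option. The one fixed fact it relies on is that plist dates count seconds from 2001-01-01T00:00:00Z, which is 978307200 seconds after the Unix epoch, so the "unix" mode is just an addition.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"time"
)

// cocoaEpochOffset is the number of seconds from the Unix epoch
// (1970-01-01T00:00:00Z) to the Core Foundation reference date
// (2001-01-01T00:00:00Z) that plist date values count from.
const cocoaEpochOffset = 978307200

// formatDate renders a bplist date (float64 seconds since 2001-01-01 UTC)
// according to a hypothetical timestamp option: "cocoa" keeps the raw
// number, "unix" shifts it to the Unix epoch, "iso8601" formats it as text.
func formatDate(secs float64, mode string) (string, error) {
	switch mode {
	case "cocoa":
		return strconv.FormatFloat(secs, 'f', -1, 64), nil
	case "unix":
		return strconv.FormatFloat(secs+cocoaEpochOffset, 'f', -1, 64), nil
	case "iso8601":
		// Split into whole and fractional seconds to keep sub-second precision.
		whole, frac := math.Modf(secs)
		t := time.Unix(int64(whole)+cocoaEpochOffset, int64(frac*1e9)).UTC()
		return t.Format(time.RFC3339Nano), nil
	default:
		return "", fmt.Errorf("unknown timestamp mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"cocoa", "unix", "iso8601"} {
		s, err := formatDate(0.5, mode)
		if err != nil {
			panic(err)
		}
		fmt.Println(mode, s)
	}
}
```

Keeping "cocoa" (or "unix") as a number-valued default would preserve the numeric form that is convenient for comparisons in queries, while "iso8601" trades that for readability.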

wader • Sep 18 '22 08:09

I have a feeling it will probably also fail on some help output tests that are in formats/all.

Maybe all help output tests should be moved into the individual formats? I will probably do that later on. It should probably be done so that a format does not affect tests outside its own testdata directory.

wader • Sep 18 '22 08:09

An interesting future feature would be to write a toplist in jq; there is already toxml :)

wader • Sep 18 '22 08:09

Hey, I made some changes to help and the help tests in https://github.com/wader/fq/pull/430. For you, I think it's just a matter of running make doc again and possibly adding a help test; see the help_*.fqtest files.

wader • Sep 22 '22 21:09

Thanks! I've been super busy but hopefully I can put the finishing touches on this this weekend.

dgmcdona • Sep 23 '22 01:09

No worries, great! Yes, I think there are just a few small things left. I will take an extra look for anything else so that you have all comments and suggestions ready for the weekend.

wader • Sep 23 '22 07:09

Is FieldRawLen the correct way to decode a blob of binary data, such as in the Data type for binary plists? I'm having trouble extracting the data using toactual in the case that I might want to write it out into another file (although I may just be missing something). I get an error like:

can't convert actual value jq value &bitio.SectionReader{r:(*bitio.SectionReader)(0xc0009a53b0), bitBase:768, bitOff:768, bitLimit:128128}

Also, it seems weird that tovalue produces a base64 encoding, but it is truncated for longer values:

./fq 'torepr.SandboxProfileData | tovalue' /Users/davidmcdonald/Library/Containers/com.apple.LoginUserService/Container.plist

"<15920>AACgAJIAAAAFAAAALgASAV8DLAMWBJ8AnQCcAJ8AnwCfAJ8AngCfAJ4AnwCbAJ4AlACDAH4AfQCbAHwAfABhAF4AngBcAJsAmwCeAHwAUgBSAEwASQBSAFIAUgBSAJ8AUgBDAEAAnwCfAJ8AnwCeAJ8AnwCCAJ8AnwCfADsANQCeAJ8AnwCfAJ8AnwCfAJ8AnwCfADQAMAA0AC8ALAArAJ8AnwCfAJ8AnwCfAJ4AnwCeAJ8AnwCfAJ8AHwCfAJ8AnwAcAJ4AGwAbABoAFQCfAJ8AFACfAJ8AnwATABMAEwAOAA4AngCfAJ4AEwCfABMAngATABMAEwCeAAwAngALAA=="

In this case, only 0x100 of the 15920 bytes are encoded as base64; the rest is missing.

dgmcdona • Sep 24 '22 03:09

Yes, raw is for raw bits (which also do not have to be whole bytes). The short version: use tobits/tobytes (depending on which unit size you want for slicing) to get the raw bits; otherwise it behaves more like a "preview" string. See the longer version below.

So at the moment you can do something like this:

# will show hexdump if stdout is a tty (to be safe), can do ... | cat if you really want raw bytes in the tty
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist

# will write raw data to stdout
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist > data

Longer version:

The problem is how to represent binary data as jq values. I've tried a couple of different variants: introducing a new binary type, arrays of ints, and strings. They all have different drawbacks and issues.

  • A new binary type feels natural to add, but it turns out the jq standard library code etc. makes assumptions about which types can exist. In fq you can still get the "real" type using _exttype: [0xff,0x90] | tobytes | _exttype is "binary". It also still has to be convertible to a JSON-compatible value at some point.
  • An array of ints would still have to be special in some way to know how big each element is (bit or byte).
  • Strings in jq are arrays of Unicode codepoints, so binary data might be interpreted as multi-byte codepoints, and I think there were other confusing issues that made them unsuitable for binary; see for example "åäö" | .,tobytes | length.

There is also the issue of what to do with formats that have raw fields that can be very large (ex: mp4 mdat): include everything, or truncate somehow?

So the current compromise is that raw will be a base64 string (to be jq compatible) and also be truncated by default. It is possible to change tovalue's truncation behaviour with -o bits_format=base64 (there is also md5).
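A rough Go sketch of that truncated preview shape: the total length in angle brackets followed by base64 of only the first bytes, as in the "<15920>AAC..." string quoted in the thread. previewBase64 is a hypothetical helper, not fq's implementation; the 0x100 limit just follows the observation that only 0x100 of 15920 bytes were encoded.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// previewBase64 (hypothetical helper) returns "<total length>" followed by
// base64 of at most limit bytes.
func previewBase64(b []byte, limit int) string {
	n := len(b)
	if n > limit {
		b = b[:limit]
	}
	return fmt.Sprintf("<%d>%s", n, base64.StdEncoding.EncodeToString(b))
}

// fullBase64 encodes everything, which is what a caller who wants to
// round-trip the data needs.
func fullBase64(b []byte) string {
	return base64.StdEncoding.EncodeToString(b)
}

func main() {
	data := make([]byte, 15920)
	// 256 bytes encode to 344 base64 characters, plus 7 for "<15920>".
	fmt.Println(len(previewBase64(data, 0x100))) // 351
	fmt.Println(len(fullBase64(data)))           // 21228
}
```

The trade-off from the comment shows up directly: the preview stays small even for something like an mp4 mdat, but only the full encoding can be decoded back into the original bytes.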

Sorry for the long rant :) but it's very good that someone else is messing around with this, as I'm not that happy with the current design and I think it can be made better and less confusing, so feedback is very welcome.

BTW, fq has support for "binary arrays" (similar to iolists in Erlang), so you can slice and concatenate parts into a new binary. Maybe not a very good example:

# build a binary array with a byte (0), a binary slice and a string (will be utf8 bytes) and try to decode it as a bplist (force to skip the magic check).
$ go run . -n '"hello" | tobits | [0, .[8:16], "a string"] | tobytes | bplist({force: true}) | d'
   │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f│0123456789abcdef│.{}: (bplist)
   │                                               │                │  error: bplist: SeekAbs: failed at position 8 (read size 0 seek pos 0): invalid seek offset
   │                                               │                │  header{}:
0x0│00 65 61 20 73 74                              │.ea st          │    magic: "\x00ea st" (invalid)
0x0│                  72 69                        │      ri        │    version: "ri" (invalid)
0x0│                        6e 67│                 │        ng│     │  unknown0: raw bits

wader • Sep 24 '22 09:09

Handling timestamps is tricky. On the one hand, it's tempting to render it as the description by default, since that is the behavior of plutil and presents the value clearly to the user. On the other hand, jq does have functions for converting timestamps between formats from the raw value. I think the best solution for now is to be true to the original format and render the value as the decoded floating point value, and access the description using todescription as you suggested, that way we stay consistent with the jq way of doing things. I've added a note on this in the format documentation.

dgmcdona • Sep 24 '22 17:09

Singletons seem to produce unknown fields:

➜  fq git:(bplist) ✗ go run . -o line_bytes=10 'grep_by(.type=="singleton"), .unknown0, .unknown1 | dv' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist
      │00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[0].value{}: 0x475b-0x475b.7 (1)
0x4754│                     09      │       .  │  type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
      │                             │          │  value: true 0x475c-NA (0)
      │00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[1].value{}: 0x475b-0x475b.7 (1)
0x4754│                     09      │       .  │  type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
      │                             │          │  value: true 0x475c-NA (0)
      │00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│                     09      │       .  │.unknown0: raw bits 0x475b.4-0x475b.7 (0.4)
      │00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│                        09   │        . │.unknown1: raw bits 0x475c-0x475c.7 (1)

It looks like the 12 bits after the type are seen as unknown (it should be one unknown field, but there seems to be a bug in the gap code, maybe because of synthetic fields; I will have a look).

I was also a bit confused at first, until I realized that bplist can apparently reference the same object index multiple times, quite cool.

wader • Sep 25 '22 10:09

The range gap issue is fixed in master (https://github.com/wader/fq/pull/431); the unknown field now shows up as one 12-bit field.

But I got a bit unsure whether the gap we're seeing in the example above is correct or not. Do "normal" bplists have them? I guess, based on how bplist uses offset tables, it would be possible to have ranges that are unused/unknown, but does that happen in practice? Maybe because of alignment etc.?

Generally I've tried to make decoders behave so that they end up with gaps only for things they don't know about or that should not be there, ex: unknown trailing data.

wader • Sep 25 '22 14:09

With https://github.com/wader/fq/pull/432, toactual and tosym behave the same as tovalue. I also fixed the error. I think that makes sense?

The code that handles this is starting to get a bit out of hand and badly needs a rethink/refactor.

wader • Sep 25 '22 15:09

I'm thinking about releasing 0.0.10 soonish; it would be nice to include this. If you're busy, we can merge and I can try to fix the remaining things if you like?

wader • Sep 28 '22 09:09

Sorry, I've been drowning in work and thesis so I haven't had time to figure out those last unknown bytes. If you want to merge it for the next release, I'm happy to figure out the bugs when I get a chance, or you can if you have the time.

dgmcdona • Sep 29 '22 21:09

No need to be sorry, focus on the thesis! What is it about?

OK, I'll let you know here if I figure something out.

wader • Sep 29 '22 21:09

Hey, this should fix the unknown field for singletons:

diff --git a/format/bplist/bplist.go b/format/bplist/bplist.go
index 7c74848c..83480d0d 100644
--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -91,15 +91,11 @@ func decodeItem(d *decode.D, p *plist) {
        m := d.FieldU4("type", elementTypeMap)
        switch m {
        case elementTypeNullOrBoolOrFill:
-               t := d.U4()
-               switch t {
-               case null:
-                       d.FieldValueNil("value")
-               case boolTrue:
-                       d.FieldValueBool("value", true)
-               case boolFalse:
-                       d.FieldValueBool("value", false)
-               }
+               d.FieldU4("value", scalar.UToScalar{
+                       null:      scalar.S{Sym: nil},
+                       boolTrue:  scalar.S{Sym: true},
+                       boolFalse: scalar.S{Sym: false},
+               })
        case elementTypeInt:
                n := d.FieldUFn("size", func(d *decode.D) uint64 {
                        return 1 << d.U4()

wader • Sep 30 '22 10:09

What's left is maybe to add some torepr tests, rebase on master, and regenerate the documentation and the actual test output.

After that I think we're done, or is there something more you want to do?

wader • Sep 30 '22 10:09

Let's merge, and I'll fix the remaining things in master.

wader • Oct 04 '22 12:10

Thanks a lot for your contribution 🥳 Hope the decode API was ok to work with and that you might want to add more formats etc in the future!

wader • Oct 04 '22 12:10

Hmm, I noticed now that the bplist commits don't have your GitHub email. Feel free to open some dummy PR if you want to show up as a contributor to the project.

wader • Oct 04 '22 12:10

@dgmcdona Hey, got your message but can't see it here? strange.

Anyways, no problem, and I totally understand! There was so little left and I wanted it to be part of 0.0.10 :) Yes, I'm looking forward to future contributions, and feel free to email me or open issues if you have ideas or want to discuss something, ex: how fq could be used in forensics.

What kinds of formats are common CF formats? Filesystems etc.? I haven't used fq for that, but it is designed to handle broken files, and I try to divide formats into smaller "subformats" to make it possible to decode parts separately etc. For example, I've used fq quite a lot to search for patterns and then try to decode and filter something at each match. As jq is a generator-based language, this is quite ergonomic to do, ex: try to decode each occurrence of 0xfff8 as a FLAC frame:

tobytes as $b | scan([0xff,0xf8]) | $b[.start:] | flac_frame

It would be really great if someone else experimented with things like that, as the functions and behaviors of the binary type in fq are a bit strange at times and I don't really know myself how I would like them to work :)

wader • Oct 05 '22 17:10