polypyus icon indicating copy to clipboard operation
polypyus copied to clipboard

Add binary, remove, and comment arguments to cli

Open mariusmue opened this issue 3 years ago • 8 comments

Hi again!

Besides obtaining function matches, I'd like to use Polypyus to catalogue binaries tied to a specific project. Hence, a couple of features which are outside the traditional use cases would be nice, and this PR includes cli-options to:

  • specify a binary in the database to perform actions on (without invoking the analysis)
  • add custom comments to a specific binary
  • remove a binary from the database

This could be extended to other binary-specific actions, I was thinking about adding custom-loaders for packed/non-flat binaries later on along the road. In any case, there are certain caveats/things which should be discussed before merging in this PR already:

  1. This PR assumes the name of the binaries are unique - is this a valid assumption to make?
  2. The addition of a comment field to the Binary Model seems to break compability to existing databases. Is this okay for you/should there be a version-bump to indicate missing backwards-compability?
  3. Removal via Binary.delete() does not seem to "clean" the sqlite database, the filesize of the project-db remains unchanged after deletion. Do you have any idea how this could be fixed?
  4. The current approach result into a convoluted analyze function, which carries out different tasks than analyzing. Do you prefer to have command subgroups, or is the --binary approach fine for you?

I'm happy to discuss those things and change the PR according to the outcome.

Cheers, Marius

mariusmue avatar Mar 08 '21 23:03 mariusmue

I added an additional example of binary functionality: export of csv for that specific binary (regardless whether it's a history file or a target binary).

Having used polypyus from CLI a bit more, I would indeed suggest to split into a binary and a analyze sub-command group. When doing analysis, I think it should not be necessary to specify annotations for the history again, instead a target and a set of histories should be enough?

In any case, if I get approval from your side, I can take a stub on a reworked CLI interface and push it here.

mariusmue avatar Mar 09 '21 14:03 mariusmue

I think the CLI could use a rework. I am not sure about the suggested changes right now but I like the idea of creating sub-commands. I need to think about this a bit more. Might have time this weekend to spend time on this.

One possibility would be to introduce commands like

  • pp get binaries (list of all binaries without detail)
  • pp describe binary [name] (details of binary with name)
  • pp get histories
  • pp describe history [name]
  • pp get annotations [history name] (list of all annotations)
  • pp add binary [path]
  • pp add annotation [binary name] [path]
  • pp add comment [binary name] [comment]
  • pp get comments [binary name]
  • pp rm comment [comment id]
  • pp rm binary [name]
  • pp rm annotation [name]

What do you think? What is the use case for you? What features are still missing? This is meant for brianstorming . . . binary names are not unique

FreebeJan avatar Mar 09 '21 21:03 FreebeJan

To the remarks:

  1. Binary names are not unique. the paths are . . . there is however an ID that could be used.
  2. It should be possible to create a new Comment model that references the Binary model (Foreignkey) without breaking compatibility. PonyORM does not have a buildin database migration support as far as I can remember.
  3. That would be a bug. I will try to reproduce it when I have time and create an Issue/fix it if it is obvious.
  4. See my previous comment

FreebeJan avatar Mar 09 '21 22:03 FreebeJan

Hi, thanks for the answer:

  1. personally, from a CLI point of view, I think having unique names per binary (and history) may be more convenient. That being said, I can also imagine just adding additional logic to the CLI in case Binary.get() returns more than one binary when fetched via name.

  2. I can look into it. Right now, this PR just adds an additional comment = Optional(str) to the Binary class. From your notes I see, that you think a Set() (similar to annotations and functions) may be better. in case it's directly embedded? In any case, wouldn't a foreign-key approach unnecessarily complicate the database-model and queries? I wonder how much effort it is to create a second script which either modifies existing project databases to just add this field, or exports/imports the Binary objects across versions of polypyus. Of course, if db-incompability is a no-go for you, I won't object further and create the new Model. :)

  3. I can create a smoketest for this bug and upload it here, if it helps you for debugging?

  4. I like the the differentiation in subcommand groups! I would probably differentiate get in list and info, according whether it should operate on a single object, or give summarized infos about multiple. Other than that, add binary should maybe have a --target flag? Lastly, a analyze command group taking multiple history-names as input and a target binary should do the actual analysis.

Regarding features for my usecase, the other things I would need are:

  • Export of CSV data from CLI, also for history-binaries (included in this PR)
  • ghidra importer/exporter (I can write those)
  • Some sort of blob loading for selected binaries before . My firmware images are loaded dynamically at different locations in memory, and the symbols I have are for after this loading process. I tried starting implementing this, but have no idea where I can hook best to add this funcitonality; pointers would be appreciated.
  • Support for MIPS. From what I gather, I would need to extend the FunctionMode enum to include MIPS64/32/16 and extend the annotation_parser.py. Is this correct, or is the information about the mode used somewhere in the analysis backend?
  • Fully optional: a optional date field for the binaries, to indicate when this binary was obtained/produced by the manufacturer. (This can be solved via comments, however.)

mariusmue avatar Mar 10 '21 08:03 mariusmue

I resolved potential merge conflicts; In any case, this PR by now is convuloted with multiple different features. do you want me to split this up in multiple PRs? Alternatively, I can also just go completely out of tree and may contribute back in a bulk later, whatever you prefer.

In any case, are there any answers to the question w.r.t. my usecase?

mariusmue avatar Mar 17 '21 12:03 mariusmue

I am working on a new command line interface. Will create a pull request soonish and set you as reviewer so you can check whether this solves at least point 4.

To the other points: 2. Adding an additional table would circumvent the database incompatibility as no existing tables would need to change. Querries would be unaffected besides the ones that need to display the comment. Also this table could be called binary_metadata and also hold the information for date that you would like. 3. A reproducible test for the bug would be very helpful 👍

Support for MIPS. From what I gather, I would need to extend the FunctionMode enum to include MIPS64/32/16 and extend the annotation_parser.py. Is this correct, or is the information about the mode used somewhere in the analysis backend?

There is some implicit assumption in the code about the density of (ARM) code words in a uniformly random stream of bytes and some conciderations as to the likelihood of marking some bytes as code that really are data but look like code. This is used to make some informed choices when creating fuzzy matchers of function groups, see models.py and tools.py . Truth be told, I think it makes sense for me to do a little refactor there and also to include a guide to define the right values / write another cost function to make it easier to add another instruction set

FreebeJan avatar Mar 18 '21 11:03 FreebeJan

Hi, here is a list of open tasks, which were mentioned in this pull-request.

  • [ ] comment field for binaries
  • [ ] MIPS support

CLI command line:

  • [ ] pp get binaries (list of all binaries without detail)
  • [ ] pp describe binary [name] (details of binary with name)
  • [ ] pp get histories
  • [ ] pp describe history [name]
  • [ ] pp get annotations [history name] (list of all annotations)
  • [ ] pp add binary [path]
  • [ ] pp add annotation [binary name] [path]
  • [ ] pp add comment [binary name] [comment]
  • [ ] pp get comments [binary name]
  • [ ] pp rm comment [comment id]
  • [ ] pp rm binary [name]
  • [ ] pp rm annotation [name]
  • [ ] pp export [binary name] [path]

Cheers, Florian

FlorianKosterhon avatar Jun 01 '21 13:06 FlorianKosterhon

Thanks for summarizing/listing all the points which came up in the discussion!

What is the course of action here? Am I expected to implement all of those? There is still the changes requested mark (although I did follow the according change).

mariusmue avatar Jun 01 '21 14:06 mariusmue