formulog icon indicating copy to clipboard operation
formulog copied to clipboard

How should I represent user-defined types that are SMT bit-vectors

Open bubaflub opened this issue 6 months ago • 5 comments

I'd like to create user-defined types to represent IP addresses. Specifically, a bv[32] to represent an IPv4 address and a bv[128] to represent an IPv6 address. Since i32 is interchangeable with bv[32] I'm currently just doing:

type ipv4address = i32

And then I can parse an IPv4 address from a string to a 32-bit integer:

fun parseIPv4Address(address: string) : ipv4address option = ...

rel ipv4addresses(string, ipv4address option)
ipv4addresses("11.2.3.4", parseIPv4Address("11.2.3.4")).

And when I run with --dump-all I do see this relation:

ipv4addresses("11.2.3.4", some(184681220))

Where 184681220 is the decimal representation of that IPv4 address.

However, this won't work for IPv6 addresses since there is no corresponding native 128-bit type. Additionally, I'd like to extend this to represent CIDR ranges / subnetworks. For example, the CIDR 192.168.0.0/16 represents the range of IP addresses from 192.168.0.0 to 192.168.255.255. I'd like to represent this as an SMT bit vector where the first 16-bits are fixed (has the values from the first two octets, 192 and 168) and the last 16-bits are free. This representation makes it very easy to check if a (concrete) IP address is in the range of a CIDR. And I'll be using IP addresses and CIDRs to build out models of firewalls and routing tables.

So my questions are:

  1. How can I define a user-type that is just an arbitrary-length SMT bit-vector?
  2. Once that's done, I'll need functions that parse IP address (or CIDR) strings to their binary representations. I'll also need functions that can check if a particular IP address is within the CIDR range. Can these just functions written in ML+SMT? Or do I need something else to manipulate these bit-vectors?

bubaflub avatar Jan 03 '24 18:01 bubaflub

You can use the bv constructor---symbolic evaluation has an example.

But per your second question, you'll be limited in how you work with such a bitvector: it's in SMT, so it's possibly symbolic. If you made IP addresses bitvectors, you'd have to work with them in SMT. There's an example of this later in the same file.

mgree avatar Jan 04 '24 19:01 mgree

Great, thanks for links to the examples. I'll give it a shot and report back.

Could clarify more about:

you'll be limited in how you work with such a bitvector

What are the limitations when using a symbolic value? Am I prevented from using (possibly) symbolic values in datalog relations and queries?

Taking a step back, what I'd like to ultimately accomplish is to represent a large number of IP addresses and CIDRs in my program and query if an IP address is contained by a CIDR. I'd like to combine those building blocks to model a network firewall where allowing or denying a network connection depends on the source and destination IP CIDRs and a list of allow/deny rules matching CIDRs. Then a (very basic) model of network reachability is finding a path from A to B through any number of firewalls that would accept or drop the connection.

I was thinking that representing IP addresses as symbolic bitvectors would be a natural way to represent this, but I suppose I could also treat these IP addresses as (binary) numbers and a CIDR as a range between two concrete values. That's not exactly correct but works for my purposes.

bubaflub avatar Jan 10 '24 22:01 bubaflub

Right: you can't look at the "value" of a bv[k] smt without first getting a model---and then it's just the value in that model. You're free to use such a value to build up larger symbolic values, but they X smt has no X value until you have a model.

I would be tempted to go with a more explicit representation, i.e., an IP address is four concrete ints, and a CIDR is a datatype that for each class, holds the information you need (e.g., CIDR_ClassA just holds a single int, representing just the 8 bits of that class A network; CIDR_ClassB holds 2 ints, and so on). You could define matching explicitly by doing appropriate masking in a functional program. If you then wanted to reason more symbolically about these, you could also define CIDR matching over SMT terms.

If that doesn't make sense, I'd be happy to meet---probably a lot clearer that way. 😁

mgree avatar Jan 17 '24 15:01 mgree

@mgree Just to clarify: do you mean i32 (Formulog's type for a concrete, 32-bit machine integer) instead of int (Formulog's type for an SMT math integer)?

aaronbembenek avatar Jan 24 '24 03:01 aaronbembenek

Whoops, yes, I do, thank you!

mgree avatar Feb 01 '24 16:02 mgree