dino icon indicating copy to clipboard operation
dino copied to clipboard

Syntax for Statically-checked Units

Open sunjay opened this issue 6 years ago • 3 comments

Building on the work from lion, it would be great if this language had a statically-checked units system. The lion language was always just interpreted, so it will be interesting to see what we can do with a compiled language.

Unit Representation, Desugaring, & Operators

We can define a UnitValue struct that is generic over both the value type and the unit. All types containing units are desugared to UnitValue (e.g. real 'km desugars to UnitValue<'real, 'km>).

// This is dino code, not Rust code

/// All units implement this marker trait
///
/// The trait is sealed (using an internal-only Sealed trait)
/// to prevent outside implementations
trait Unit: private::Sealed {}

/// Represents a value with a statically-checked unit
struct UnitValue<T, U: Unit> {
    // private fields
    value: T,
    unit: U, // All units are zero-sized () values
}

A UnitValue can be created from a "unitless" value.

/// All unitless values can be turned into a value with a unit at no cost
impl<T, U: Unit> From<T> for UnitValue<T, U> {
    fn from(value: T) -> Self {
        Self {
            value,
            unit: /*...compiler magic: unit literal for U...*/,
        }
    }
}

You can get the raw value from a unitless ('_) value using the From trait. Any other value that has a unit can be retrieved using the value() method.

impl<T, U> UnitValue<T, U> {
    /// Explicitly get the value (without the protection of the unit)
    /// out of the united quantity
    fn value(self) -> T {
        self.value
    }
}

/// A unitless value can be converted directly into the value type
impl<T> From<UnitValue<T, '_>> for T {
    fn from(x: UnitValue<T, '_>) -> T {
        x.value()
    }
}

We can dynamically-generate impls for the From trait that convert a value with a unit into a value with a different unit.

// Dynamically-generated impl, created as-needed by the compiler
impl<T1, T2> From<UnitValue<T1, 'm>> for UnitValue<T2, 'km> {
    fn from(other: UnitValue<T1, 'm>) -> Self
        where T1: Div<T1, Output=T2> + From<real>
    {
        Self {
            value: other.value / T1::from(1000.0),
            unit: 'km, // Every unit has a constructor (similar to the () type)
        }
    }
}

Operators can be implemented like any other numeric type (over compatible units).

// impl of Add that results in the LHS unit being carried forward
impl<V1, U1, V2, U2, O> Add<UnitValue<V2, U2>> for UnitValue<V1, U1>
          // Allow the addition to result in any type
    where V1: Add<V2, Output=O>,
          // Check that the RHS unit can be converted into the LHS unit
          // This constraint signals the compiler to generate an impl
          // if needed
          UnitValue<V2, U1>: From<UnitValue<V2, U2>> {

    type Output = UnitValue<O, U1>;

    fn add(self, other: UnitValue<V2, U2>) -> Self::Output {
        let other = other.into();
        Self {
            value: self.value + other.value,
            unit: self.unit,
        }
    }
}

// similar implementation of the Sub trait

Both addition and subtraction require the LHS and RHS to be the same unit.

The Mul and Div traits are a little more complicated because they can result in compound units. Take the following functions as an example:

// 'km and 'h have no defined conversion between them, so
// we keep both units and preserve that they have been divided
fn foo1(x: real 'km, y: real 'h) -> real 'km/'h {
    x / y
}

// 'km and 'm have a conversion defined between them. Dividing
// them results in a unitless (denoted '_) value.
fn foo2(x: real 'km, y: real 'm) -> real '_ {
    x / y
}

// A unitless ('_) value can be converted into any ratio of two
// units that both have a conversion defined between them.
fn foo3(x: real 'km, y: real 'm) -> real 'km/'m {
    x / y
}

// 'km and 'm^2 do not have a conversion defined between them
// because they are not the same dimension of unit. That means
// that we can only divide the values and units, not do any
// conversion.
fn foo4(x: real 'km, y: real 'm^2) -> real 'km / 'm^2 {
    x / y
}

// 'm^3 and 'L are not the same dimension (based on exponent),
// but there is still a conversion defined between them
// (1 'm^3 == 1000 L). That allows us to divide them and get
// a simple (non-compound) unit.
fn foo5(x: real 'm^3, y: real 'L) -> real 'm^3 {
    x / y
}

// 'm and 'm^2 have no defined conversion between them.
// Dividing the units does not directly get you to 'km/'m^2.
// Thus, we need to convert either 'm or 'm^2 and find a path
// to the returned unit. In this case, one solution is
// to convert 'm to 'km and then produce a result.
fn foo6(x: real 'm, y: real 'm^2) -> real 'km / 'm^2 {
    x / y
}

// 'm and 'm^2 have no defined conversion between them.
// Dividing the units can give you 1/m or m^-1 which can
// then be converted into 'km^-1.
fn foo7(x: real 'm, y: real 'm^2) -> real 'km^-1 {
    x / y
}

// and so on...

One option to implement all of these cases (and the many cases that were missed) is to dynamically generate impls of Div and Mul for compatible units as we need them. This might get messy fast, so maybe there is something more sophisticated we can do?

Observation: Multiplying and dividing unit-ed quantities almost feels like defining a very specialized dependent type system. Maybe we can borrow lessons from dependent types and use them here? (Perhaps reading The Little Typer will help with this.)

Declaring Units & Conversions

See: https://github.com/sunjay/lion#declaration-files

sunjay avatar Oct 20 '19 22:10 sunjay

Backtracking Search for Unit Conversion

This is probably a very slow but correct algorithm. This might be a good place to start implementing, and also a place to start looking for faster ways.

Problem

  • Suppose you are given a start compound unit with atomic units a, b, c, etc. and exponents e1, e2, e3, etc.
  • Suppose you are given a target compound unit with atomic units a, b, c, etc. and exponents t1, t2, t3, etc.
  • If we allow the exponents to be zero, we can represent any unit conversion problem over all of the defined units just by using the exponents ei and to for i in [0, inf)
  • In Lion, zero-exponent units are implicit in that they're left out of the representation of CompoundUnit (this is to save on how much memory the representation uses)

Question: can we find a conversion factor that takes a value of the start unit and turns it into a value of the target unit?

We are given a table of unit conversions which map which values in a given unit are equivalent to which values in another given unit.

Algorithm: Backtracking Search

  1. If the start unit and the target unit are equal, return 1.0 (since the units are equivalent)
  2. If we've reached the recursion limit, return Error
  3. Go through the unit conversions for a, b, c, etc. until you find one that hasn't been used so far (let x be the unit we found an unused conversion for)
  4. Multiply the current unit by that unit such that x is cancelled out (note that this may require raising the conversation to some fractional power such that its exponent matches the exponent of x)
  5. Increase the recursion count and go back to step 1 with the unit from the previous step as the start unit
  6. If the previous step returns an error, continue to the next conversion for x (or for a, b, c, etc. if x is exhausted) and keep searching for a solution

Limitations:

  • to avoid cycles, each conversion is only used once. That means that there are possibly some very complex conversions that we miss.
  • This algorithm does not allow for conversions to be defined between a unit and itself (that's fine)
  • This algorithm does not allow for conversions between a unit and a different power of that same unit (that's fine)
  • This algorithm does not allow for the use of conversions defined directly across different compound units (future work?)

Final Thoughts

In general, I think it's fine to disallow conversions that break math. E.g. defining a conversion between m^3 and m wouldn't work.

This algorithm has the added benefit of giving us a way to check the defined conversions: before inserting a conversion into the graph/table, we can first find the conversion factor between the two units in the conversion. Then we can check if that conversion factor is the same as the one the user wanted us to define. If not, that conversion is unsound and should be rejected. If no conversion factor is found, we can safely insert the new factor with no issues.

We may be able to speed this algorithm up by using a BFS instead of a DFS. That way we are guaranteed to get to the shortest path and not spend too much time traversing places that won't lead anywhere. This is better given that the graph is quite big.

sunjay avatar Oct 27 '19 14:10 sunjay

Cases that make this a complex problem:

  • 1 J == 1 N m
  • 1 N = kg m s^-2
  • Etc

sunjay avatar Oct 30 '19 01:10 sunjay

Sub problem:

  • Suppose you have a unit a^e1 b^e2 c^e3 ...
  • Find all combinations of one of more parts of this unit that exist in the unit conversion table
    • E.g. the partition might look like: (a^e1 b^e2) c^e3, or it might be: (a^e1 c^e3) b^e2

Once we find the partitions in each unit, we then have to explore the problem space of how those groups match up with the groups in the target unit a^t1 b^t2 c^t3 ...

Here, "match" means that a conversion exists between them.

Example: we may find that one of the partitions of the target unit is: a^t1 (b^t2 c^t3). That is, both a^t1 and b^t2 c^t3 exist as entries in the conversion table. Once we have this partition, we may find that there is a match with the (a^e1 c^e3) b^e2 partition of the original unit. A match means that there exists a bijective mapping between the sub units in each partition. It could be that this means that a^e1 c^e3 converts to a^t1 and b^e2 converts to b^t2 c^t3. Or it could be the opposite pairings.

The combinatorial explosion in this algorithm is immense, but I think there will end up being a lot of really clear ways to optimize it.

sunjay avatar Oct 30 '19 01:10 sunjay