rust exercise util: generated tests should unpack input when possible

First special rule: structured canonical data needs special handling. The input map should not be represented as a hashmap; instead, it should be unpacked into the input tuple sent to process_X_case.

That is, using abbreviate as an example, the generated code should not be:

process_abbreviate_case(
    {
        let mut hm = ::std::collections::HashMap::new();
        hm.insert("phrase", "GNU Image Manipulation Program");
        hm
    },
    "GIMP",
);

but instead:

process_abbreviate_case(
    ("GNU Image Manipulation Program",),
    "GIMP",
);

Second special rule: when the input tuple contains exactly one item, it doesn't need to be a tuple; it can be a bare item. This means that the generated code for abbreviate would actually be:

process_abbreviate_case(
    "GNU Image Manipulation Program",
    "GIMP",
);
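
As a rough illustration of the two rules together, here is a minimal sketch. The helper render_input is hypothetical and not part of the exercise util; it assumes each value of the input map has already been rendered as a Rust literal, and it assumes an empty input map should become the unit value, which the rules above do not specify.

```rust
/// Renders the input argument of a generated `process_X_case` call.
/// `fields` holds the already-rendered Rust literal for each value of the
/// canonical "input" map, in order. Hypothetical helper, for illustration only.
fn render_input(fields: &[String]) -> String {
    match fields {
        // Assumption: an empty input map is emitted as the unit value.
        [] => "()".to_string(),
        // Second rule: exactly one item is emitted bare, not as a 1-tuple.
        [only] => only.clone(),
        // First rule: several items are unpacked into a tuple, not a hashmap.
        many => format!("({})", many.join(", ")),
    }
}

fn main() {
    let single = vec!["\"GNU Image Manipulation Program\"".to_string()];
    println!("{}", render_input(&single)); // "GNU Image Manipulation Program"

    let pair = vec!["\"abc\"".to_string(), "3".to_string()];
    println!("{}", render_input(&pair)); // ("abc", 3)
}
```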

Oct 20 '18 12:10 coriolinus

Currently working on this and there are some questions:

If the reason for this issue is to provide more readable input parameters, shouldn't expected values also adhere to the above rules? And if so, then maybe we could stop using hashmaps at all, instead unpacking every instance of a JSON object.
If the input map contains another map, should it also be recursively unpacked into a tuple?

Nov 13 '18 09:11 ZapAnton

shouldn't expected values also adhere to the above rules?

I have no objection to that.

maybe we could stop using hashmaps at all

We can't go that far, unfortunately. The problem is that the semantics of input objects in the canonical data file are not perfectly consistent: sometimes they're struct-like (e.g. rest-api), but sometimes they're actually map-like (e.g. expected in alphametics).

It's fine to use tuples for structs; tuples pretty much are just anonymous structs anyway. They fail badly when the input or expected field actually has map semantics, though.

In the original problem generator, I just kind of punted on the problem and used hashes all the time, because you need non-trivial heuristics to figure out whether struct semantics or map semantics are expected. Still, if you want to take a shot at implementing this properly, here are the heuristics I'd recommend:

for any given "input" or "expected" field, you have to look at all cases which test the same property
if the field ever varies in cardinality (e.g. one instance has 2 k-v pairs, and one has 3), then it probably has map semantics
if the field's value types ever differ from each other (e.g. affine-cipher input has "phrase" as a string and "key" as a struct), then it probably has struct semantics

Once you've figured out what the semantics probably are, you can just encode maplike fields as hashmaps and structlike fields as tuples.
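
The heuristics above could be sketched roughly as follows. All names here (Kind, Semantics, Field, field, guess_semantics) are hypothetical, and two choices are assumptions the heuristics leave open: the cardinality check is applied first, and a field that triggers neither check is reported as Unknown so the caller can fall back to a hashmap.

```rust
use std::collections::BTreeMap;

/// Coarse JSON type of a value; only as much detail as the heuristics need.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind { Null, Bool, Number, String, Array, Object }

#[derive(Debug, PartialEq, Eq)]
enum Semantics { MapLike, StructLike, Unknown }

/// One occurrence of an "input" or "expected" object: key -> kind of its value.
type Field = BTreeMap<String, Kind>;

/// Small convenience constructor for the examples below.
fn field(pairs: &[(&str, Kind)]) -> Field {
    pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

/// `occurrences` must contain the field from every case testing one property.
fn guess_semantics(occurrences: &[Field]) -> Semantics {
    // Cardinality varies between cases: probably map semantics.
    if occurrences.windows(2).any(|w| w[0].len() != w[1].len()) {
        return Semantics::MapLike;
    }
    // Value types differ within an occurrence: probably struct semantics.
    let mixed = occurrences.iter().any(|f| {
        let mut kinds = f.values();
        match kinds.next() {
            Some(first) => kinds.any(|k| k != first),
            None => false,
        }
    });
    if mixed {
        return Semantics::StructLike;
    }
    // Assumption: neither heuristic fired, so leave the decision to the caller.
    Semantics::Unknown
}

fn main() {
    // Shaped like the affine-cipher input: a string plus a nested object.
    let encode = vec![field(&[("phrase", Kind::String), ("key", Kind::Object)])];
    println!("{:?}", guess_semantics(&encode));

    // Illustrative map-like data: the number of keys varies between cases.
    let varying = vec![
        field(&[("I", Kind::Number), ("B", Kind::Number)]),
        field(&[("A", Kind::Number), ("B", Kind::Number), ("C", Kind::Number)]),
    ];
    println!("{:?}", guess_semantics(&varying));
}
```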

However, if you want to really go above and beyond, you could automatically generate appropriate structs in lib.rs. The naming convention would be {property}Input or {property}Output, so again using affine-cipher as an example, you might end up with definitions like this:

pub struct EncodeInput {
    pub phrase: String,
    pub key: EncodeInputKey,
}

pub struct EncodeInputKey {
    pub a: u64,
    pub b: u64,
}

You could then just emit appropriate struct literals. This is most like what a human might write, and it's easiest to understand; the only problem is that it's a lot more complicated to get the code right to actually accomplish this. This is therefore a bonus objective.

If the input map contains another map, should it also be recursively unpacked into a tuple?

Yes, I think so. The tuples should not be flattened, but nesting tuples appropriately makes sense. The hard part is applying the semantic-detection heuristics appropriately and recursively. For affine-cipher, it makes sense to use type EncodeInput = (String, (u64, u64));; for circular-buffer, that wouldn't be possible.
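
To make the "nested, not flattened" point concrete, here is a small sketch that puts the struct encoding and the tuple encoding of the affine-cipher input side by side. The alias is named EncodeInputTuple only to avoid clashing with the struct name; to_tuple and the sample values are illustrative assumptions, not generator output.

```rust
pub struct EncodeInputKey {
    pub a: u64,
    pub b: u64,
}

pub struct EncodeInput {
    pub phrase: String,
    pub key: EncodeInputKey,
}

/// Tuple encoding of the same shape: the nested struct becomes a nested
/// tuple rather than being flattened into (String, u64, u64).
type EncodeInputTuple = (String, (u64, u64));

/// Illustrative conversion showing that the nesting is preserved.
fn to_tuple(input: EncodeInput) -> EncodeInputTuple {
    (input.phrase, (input.key.a, input.key.b))
}

fn main() {
    // Sample values chosen for illustration.
    let input = EncodeInput {
        phrase: "yes".to_string(),
        key: EncodeInputKey { a: 5, b: 7 },
    };
    // Destructuring mirrors the nesting of the tuple type.
    let (phrase, (a, b)) = to_tuple(input);
    println!("{} {} {}", phrase, a, b);
}
```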

Nov 13 '18 09:11 coriolinus

The exercise util has since been deleted; see this thread for more context.

Sep 11 '23 21:09 senekor