roc icon indicating copy to clipboard operation
roc copied to clipboard

AssocList for dictionaries

Open Qqwy opened this issue 3 years ago • 16 comments

Work-in-progress implementation of an association list, which can become the foundation of a 'pure Roc' Dict implementation, as well as being an example of a custom datastructure. Fixes (at least partially) #3245

  • [x] Basic implementation of most of the interface.
  • [x] ~~union, intersection, difference.~~ insertAll.
  • [x] A runnable example to manually test whether the code is correct.
  • [x] Actual tests.

Qqwy avatar Jun 17 '22 19:06 Qqwy

Question: Currently AssocList is an alias of a normal List [Pair k v]. Alternatively we could wrap it with an extra tag to hide that it in a list.

Maybe it makes more sense to do this when wrapping it as a Dict (because there for sure the internals need to be opaque as we want to change it in the future), which is why I have not done it here right now.

This does immediately beg the question: Should we have a separate e.g. len in the AssocList module or tell people to just use List.len?

Qqwy avatar Jun 17 '22 19:06 Qqwy

Should we have a separate e.g. len in the AssocList module or tell people to just use List.len?

Let's do the AssocList.len function. šŸ‘

Alternatively we could wrap it with an extra tag to hide that it in a list.

This is what opaque types are for! I haven't written a tutorial on them yet, but basically here's how to use the feature:

AssocList k v := List [Pair k v]

The := is similar to Haskell's newtype; this makes AssocList a nominal type, which means that unlike a type alias, if you try to compare an AssocList k v to a List [Pair k v], you'll get a type mismatch.

You can create instances of it by applying @AssocList, and you can destructure them by using @AssocList in a pattern match. For example:

insertFresh : AssocList k v, k, v -> AssocList k v
insertFresh = \@AssocList list, k, v ->
    List.append list (Pair k v)
        |> @AssocList

This @ syntax is only allowed in the same scope where the := was used to define the AssocList opaque type. So since you defined it at the top level of the AssocList module, other modules won't be able to destructure an AssocList with @AssocList or create new instances with it, even if you expose the AssocList type to them.

rtfeldman avatar Jun 17 '22 21:06 rtfeldman

FYI, the reason CI is failing is that we have a check that everything in the examples directory has to have a test verifying its behavior in some way; that way, we don't accidentally end up with examples that don't work, which is a frustrating experience for beginners. šŸ˜„

I can show you how to add that test when you're feeling this is ready to merge!

rtfeldman avatar Jun 17 '22 21:06 rtfeldman

I have a question about union, intersection and difference.

  • union as bs: For each key that exists in either as or bs, store it with its value in the result. On conflict, keep the value from bs.
  • intersection as bs: For each key that exists in both as and bs, store this key with the value from as (?) in the result.
  • difference as bs: For each key that exists in as but not in bs, store this key with the value from as in the result.

Are these the expected semantics?

I can show you how to add that test when you're feeling this is ready to merge!

I would love to have some guidance for this. :blush:

Qqwy avatar Jun 21 '22 09:06 Qqwy

I've hit a snag by the way: To support O(n * log(n)) union/intersection/difference it is necessary to sort the elements inside the AssocList.

This is however not possible unless Pair a b can be sorted. I presume that this problem would be solved once we introduce the Hash constraint. But what should we do until that part is ready?

I see two options:

  1. Do not implement union/intersection/difference for now
  2. Use different signatures for these functions, constrained on e.g. Num, which will change in the future.
  3. Use a naive O(n²) algorithm that works for all datatypes (that support ==).

Qqwy avatar Jun 21 '22 10:06 Qqwy

Are these the expected semantics?

Yes, exactly! šŸ’Æ

Use a naive O(n²) algorithm that works for all datatypes

This is fine for now - we can revisit when we get to the version that actually uses a hashmap under the hood. šŸ‘

Ping me on Zulip and we can chat about adding the test!

rtfeldman avatar Jun 21 '22 11:06 rtfeldman

Actually, here's a simplification: let's do what this package does here, namely:

  • Don't include intersection and diff in the API after all
  • Rename union to insertAll

rtfeldman avatar Jun 21 '22 11:06 rtfeldman

All right, I have implemented insertAll. I took the liberty to include a normalizeWith to allow people to compare AssocLists for equality if they really want/need to, requiring them to come up with a sorter function themselves (just like List.sortWith).

Qqwy avatar Jun 21 '22 13:06 Qqwy

@rtfeldman I am encountering a couple of weird bugs in Roc itself, which make it difficult to make the example more involved at this time.

The current example code wants you to build two AssocLists by entering pairs of lines. The Roc runtime is mismanaging the recursion, however.

Should I make a separate issue for this problem or not?


Example input:

1
a
2
b
3
c

4
d
5
e
6
f

Program invocation:

cat examples/assoc-list/example_input.txt | cargo run -- run examples/assoc-list/main.roc

Expected output:

    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/roc run examples/assoc-list/main.roc`
šŸ”Ø Rebuilding host...
This example program takes the AssocList interface for a spin.
Input pairs of lines.
 Each pair will become an association in the first AssocList.
Finish by inputting an empty line.
Fist association list:
1 => a
2 => b
3 => c
Input pairs of lines.
 Each pair will become an association in the second AssocList.
Finish by inputting an empty line.
Second association list:
4 => d
5 => e
6 => f

Actual output:

    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/roc run examples/assoc-list/main.roc`
šŸ”Ø Rebuilding host...
This example program takes the AssocList interface for a spin.
Input pairs of lines.
 Each pair will become an association in the first AssocList.
Finish by inputting an empty line.
Fist association list:
1 => a
Input pairs of lines.
 Each pair will become an association in the second AssocList.
Finish by inputting an empty line.
Second association list:
4 => d
5 => e
6 => f
2 => b
Input pairs of lines.
 Each pair will become an association in the second AssocList.
Finish by inputting an empty line.
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/lib.rs:116:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)

Qqwy avatar Jun 22 '22 07:06 Qqwy

A separate issue would be great! Ideally with a smaller reproduction case. šŸ˜„

rtfeldman avatar Jun 22 '22 15:06 rtfeldman

This PR is as good as finished. However, it is blocked by the bug #3351 right now. (The tests are failing because of this bug's symptoms)

Qqwy avatar Jun 27 '22 20:06 Qqwy

@rtfeldman This is currently still blocked by the List.walk memory corruption problem that breaks the test example.

The only function that is affected by this issue, is AssocList.remove.

Maybe it makes sense to comment out that function for now, allowing the rest of AssocList to already be used?

Qqwy avatar Jul 07 '22 20:07 Qqwy

Maybe it makes sense to comment out that function for now, allowing the rest of AssocList to already be used?

Yeah I like that idea! šŸ‘

rtfeldman avatar Jul 07 '22 20:07 rtfeldman

to replace current Set, we will need union and difference. These are used in our examples (e.g. AStar)

folkertdev avatar Jul 08 '22 12:07 folkertdev

to replace current Set, we will need union and difference. These are used in our examples (e.g. AStar)

union is implemented under the name insertAll (indicating that it has a bad time complexity; see discussion above). difference could be implemented similarly on top of remove as, say, removeAll (with similarly bad time complexity). However, this is blocked by #3351.

Qqwy avatar Jul 08 '22 15:07 Qqwy

PR has been unblocked as the issue was resolved!

Qqwy avatar Jul 12 '22 21:07 Qqwy

@Qqwy can we close this since your AssocList implementation landed in-line for the stdlib Dict?

ayazhafiz avatar Oct 08 '22 19:10 ayazhafiz

Yes! I would do it myself but I'm out of town right now and only have access to my PC tomorrow evening again, so feel free to do it earlier

On October 8, 2022 7:34:00 PM UTC, Ayaz @.> wrote: @. can we close this since your AssocList implementation landed in-line for the stdlib Dict?

-- Reply to this email directly or view it on GitHub: https://github.com/roc-lang/roc/pull/3258#issuecomment-1272384119 You are receiving this because you were mentioned.

Message ID: @.***> -- Met vriendelijke groet / Sincerely, ~W-M

Qqwy avatar Oct 08 '22 19:10 Qqwy

Have fun out of town!

ayazhafiz avatar Oct 08 '22 19:10 ayazhafiz