v icon indicating copy to clipboard operation
v copied to clipboard

Datatype `Set` implementation

Open vladimirmyshkovski opened this issue 2 years ago • 12 comments

Just simple implementation of Set.

I think part of the following implementations should be added in the future:

Core set-theoretical operations

  • [x] union(S,T): returns the union of sets S and T.
  • [x] intersection(S,T): returns the intersection of sets S and T.
  • [x] difference(S,T): returns the difference of sets S and T.
  • [x] subset(S,T): a predicate that tests whether the set S is a subset of set T.

Static sets

  • [x] is_element_of(x,S): checks whether the value x is in the set S. (In our case it's called exists)
  • [x] is_empty(S): checks whether the set S is empty.
  • [x] size(S): returns the number of elements in S.
  • [ ] iterate(S): returns a function that returns one more value of S at each call, in some arbitrary order.
  • [ ] enumerate(S): returns a list containing the elements of S in some arbitrary order.
  • [ ] build(x1,x2,…,xn,): creates a set structure with values x1,x2,...,xn.
  • [ ] create_from(collection): creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.

Dynamic sets

  • [x] create(): creates a new, initially empty set structure.
  • [ ] create_with_capacity(n): creates a new set structure, initially empty but capable of holding up to n elements.
  • [x] add(S,x): adds the element x to S, if it is not present already.
  • [x] remove(S, x): removes the element x from S, if it is present.
  • [ ] capacity(S): returns the maximum number of values that S can hold.

Additional operations

  • [x] pop(S): returns an arbitrary element of S, deleting it from S.
  • [x] pick(S): returns an arbitrary element of S. Functionally, the mutator pop can be interpreted as the pair of selectors (pick, rest), where rest returns the set consisting of all elements except for the arbitrary element. Can be interpreted in terms of iterate.
  • [ ] map(F,S): returns the set of distinct values resulting from applying function F to each element of S.
  • [ ] filter(P,S): returns the subset containing all elements of S that satisfy a given predicate P.
  • [ ] fold(A0,F,S): returns the value A|S| after applying Ai+1 := F(Ai, e) for each element e of S, for some binary operation F. F must be associative and commutative for this to be well-defined.
  • [x] clear(S): delete all elements of S.
  • [x] equal(S1', S2'): checks whether the two given sets are equal (i.e. contain all and only the same elements).
  • [ ] hash(S): returns a hash value for the static set S such that if equal(S1, S2) then hash(S1) = hash(S2)
Other operations can be defined for sets with elements of a special type:
  • [ ] sum(S): returns the sum of all elements of S for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, S).
  • [ ] collapse(S): given a set of sets, return the union. For example, collapse({{1}, {2, 3}}) == {1, 2, 3}. May be considered a kind of sum.
  • [ ] flatten(S): given a set consisting of sets and atomic elements (elements that are not sets), returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, remove a level of nesting – like collapse, but allow atoms. This can be done a single time, or recursively flattening to obtain a set of only atomic elements.For example, flatten({1, {2, 3}}) == {1, 2, 3}.
  • [ ] nearest(S,x): returns the element of S that is closest in value to x (by some metric).
  • [ ] min(S), max(S): returns the minimum/maximum element of S.

vladimirmyshkovski avatar Jun 26 '22 00:06 vladimirmyshkovski

Why not just use a map to make a Set? What is/are the use(s) of using a separate type?

Hunam6 avatar Jun 26 '22 16:06 Hunam6

Why not just use a map to make a Set? What is/are the use(s) of using a separate type?

I don't know for sure how map is implemented in V, but, in a number of other languages, map is less productive than array.

Also, a number of additional functions will make this type more convenient to use in specific cases.

In addition, I saw that in the README.md package datatypes, there is a type Set, which someone considered necessary to implement. I took it as welcome to contribution :-)

vladimirmyshkovski avatar Jun 26 '22 17:06 vladimirmyshkovski

Okay 👌 Your job looks really good

Hunam6 avatar Jun 26 '22 17:06 Hunam6

Why not just use a map to make a Set? What is/are the use(s) of using a separate type?

I don't know for sure how map is implemented in V, but, in a number of other languages, map is less productive than array.

Also, a number of additional functions will make this type more convenient to use in specific cases.

In addition, I saw that in the README.md package datatypes, there is a type Set, which someone considered necessary to implement. I took it as welcome to contribution :-)

Have you tried to benchmark your solution vs map solution?

penguindark avatar Jun 27 '22 01:06 penguindark

I was going to suggest the same as the rest did. Using a map solution will be better in terms of performance! Just for reference, here there is a Set implementation I did in the past in PHP 😅 https://github.com/MBHFramework/structures/blob/385e1e3d49d8b953fa2354a6b522a3cf6de4c24f/Mbh/Collection/Set.php#L26

just for reference, using map, the set should be something like this:

pub struct Set<T> {
mut:
	table map[T]int // or map[T]bool or any type in the value
}

then Set will be just a wrapper of map when the keys are the values of the Set and most of the operations will be really efficient

ulises-jeremias avatar Jun 27 '22 01:06 ulises-jeremias

I was going to suggest the same as the rest did. Using a map solution will be better in terms of performance! Just for reference, here there is a Set implementation I did in the past in PHP sweat_smile https://github.com/MBHFramework/structures/blob/385e1e3d49d8b953fa2354a6b522a3cf6de4c24f/Mbh/Collection/Set.php#L26

just for reference, using map, the set should be something like this:

pub struct Set<T> {
mut:
	table map[T]int // or map[T]bool or any type in the value
}

then Set will be just a wrapper of map when the keys are the values of the Set and most of the operations will be really efficient

Maybe voidptr better than int?

vladimirmyshkovski avatar Jun 27 '22 17:06 vladimirmyshkovski

Maybe voidptr better than int?

voidptr can be 64bit, int is 32 vbit in V, you can save half the memory for each element. otherwise you can use a simply u8 to reduce the memory footprint.

penguindark avatar Jun 27 '22 17:06 penguindark

voidptr can be 64bit, int is 32 vbit in V, you can save half the memory for each element. otherwise you can use a simply u8 to reduce the memory footprint.

Perhaps it makes sense to take not only the type of value, but also the type of key?

pub struct Set<K, V> {
	// maximum number of values that Set<K, V> can hold.
	capacity int
mut:
	elements map[V]K
}

vladimirmyshkovski avatar Jun 27 '22 18:06 vladimirmyshkovski

Perhaps it makes sense to take not only the type of value, but also the type of key?

If this is a pure set() I don't think that the type of the value of the map has a meaning. You can simply choice the smallest one like u8.

penguindark avatar Jun 27 '22 21:06 penguindark

@vladimirmyshkovski I think there is no need of having a capacity for the Set

ulises-jeremias avatar Jun 28 '22 00:06 ulises-jeremias

@ulises-jeremias

What would be a better name for the method of adding elements from an array to a set? For example, in Python it is called update. I'm thinking about the name of the method that returns a copy of the set: copy or clone?

Also, union is a reserved name in V therefore, the method must be called something else. Right now I'm calling it union_.

vladimirmyshkovski avatar Jun 28 '22 12:06 vladimirmyshkovski

Also, union is a reserved name in V therefore, the method must be called something else. Right now I'm calling it union_.

The existing convention is to use fn ... @union(...) ... { for names that are also keywords, for example:

vlib/sync/sync_default.c.v:93:pub fn (mut m Mutex) @lock() {
vlib/net/common.v:31:fn @select(handle int, test Select, timeout time.Duration) ?bool {
vlib/net/unix/stream_nix.v:55:fn (mut s StreamSocket) @select(test Select, timeout time.Duration) ?bool {

spytheman avatar Jun 30 '22 11:06 spytheman

All tests pass now.

Merged, thanks!

medvednikov avatar Aug 16 '22 17:08 medvednikov