stdlib Abstract base class for map types

Description This issue should track a potential addition of a map class to stdlib. I think it would be preferable to first define an abstract base class which covers most of our needs. Here are some points I found important when working on TOML Fortran:

data structure owns data
- memory safety
- easy deep copies
- no cyclic references
get / find an element based on a key
- returns pointer to value inside structure
add / push back an element for a key
- value must be allocatable
delete / drop an element at a given key
- should value (optionally) be returned on delete / drop?
get a list of all keys / provide an iterator over all keys
keys can be strings
- what about integer, real, ... values?

Examples

Abstract base class used in TOML Fortran to define maps

type, abstract :: map_class
contains
  !> Find a value based on its key
  procedure(find), deferred :: find
  !> Push back a value to the structure
  procedure(push_back), deferred :: push_back
  !> Get list of all keys in the structure
  procedure(get_keys), deferred :: get_keys
  !> Delete a value at a given key
  procedure(delete), deferred :: delete
  !> Destroy the data structure
  procedure(destroy), deferred :: destroy
end type

abstract interface
  !> Find a value based on its key
  subroutine find(self, key, ptr)
    import :: map_class, value_type
    !> Instance of the structure
    class(map_class), intent(inout), target :: self
    !> Key to the value
    character(len=*), intent(in) :: key
    !> Pointer to the stored value at given key
    type(value_type), pointer, intent(out) :: ptr
  end subroutine find
  !> Push back a value to the structure
  subroutine push_back(self, val)
    import :: map_class, value_type
    !> Instance of the structure
    class(map_class), intent(inout), target :: self
    !> Value to be stored
    type(value_type), allocatable, intent(inout) :: val
  end subroutine push_back
  !> Get list of all keys in the structure
  subroutine get_keys(self, list)
    import :: map_class, string_type
    !> Instance of the structure
    class(map_class), intent(inout), target :: self
    !> List of all keys
    type(string_type), allocatable, intent(out) :: list(:)
  end subroutine get_keys
  !> Delete a value at a given key
  subroutine delete(self, key)
     import :: map_class, value_type
     !> Instance of the structure
     class(map_class), intent(inout), target :: self
     !> Key to the value
     character(len=*), intent(in) :: key
  end subroutine delete
  !> Deconstructor for data structure
  subroutine destroy(self)
     import :: map_class
     !> Instance of the structure
     class(map_class), intent(inout), target :: self
  end subroutine destroy
end interface

Jul 30 '21 12:07 awvwgk

Is map the same as dictionary (e.g. Python's dict) or do they differ in some ways?

In Python I work with this a lot. In Fortran, not really. But this is one of the features that I think, even if not presently used a lot, would open up many possibilities for development of high level libraries. For example, databases (SQL-style or otherwise), various file formats, parsing HTML, SVG, and other kinds of XML, HTTP clients and servers.

Fortran-native approach is the way to go.

An alternative, perhaps for its own package rather than stdlib, would be Fortran bindings for Redis, which provides high performance in-memory dicts and lists.

Aug 25 '21 15:08 milancurcic

I have worked on a Fortran interface to Redis, but I ran into odd problems that I simply did not understand. I would have to look up my notes, but if there is interest I can revive it ;).

Op wo 25 aug. 2021 om 17:00 schreef Milan Curcic @.***>:

Is map the same as dictionary (e.g. Python's dict) or do they differ in some ways?

In Python I work with this a lot. In Fortran, not really. But this is one of the features that I think, even if not presently used a lot, would open up many possibilities for development of high level libraries. For example, databases (SQL-style or otherwise), various file formats, parsing HTML, SVG, and other kinds of XML, HTTP clients and servers.

Fortran-native approach is the way to go.

An alternative, perhaps for its own package rather than stdlib, would be Fortran bindings for Redis https://redis.io/, which provides high performance in-memory dicts and lists.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/479#issuecomment-905582297, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR7BPOKEAPHPCNWKYEDT6UAQFANCNFSM5BIOAAUQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

Aug 26 '21 06:08 arjenmarkus

I was thinking C++ ordered map, but Python dictionary is the same thing.

Having an abstract base class would allow to implement the map type with any backend, stdlib would only provide the base class and pure Fortran implementations. Other (fpm) projects could extend the base class and provide their own optimized implementation.

The advantage of a single well-designed abstract base class would be that projects like TOML Fortran don't have to implement their own, but you could simply load a TOML document into a stdlib map type and use it with any other library which supports stdlib's maps. Designed correctly a library could also load data directly into an optimized map implementation.

Aug 26 '21 08:08 awvwgk

For the record, are you aware of fhash?

Oct 04 '21 06:10 FObermaier

FWIW I have just released a proposed hash map API on the website with my proposed hash function API. The API defines three modules: stdlib_32_bit_key_data_wrapper, stdlib_chaining_hash_map and stdlib_open_hash_map.

The stdlib_32_bit_key_data_wrapper module does the following: defines an abstract interface for hash functions, provides a wrapper for some hash functions so they match that interface; defines a key datatype, key_type whose contents are currently not private, but can be made abstract with setters and getters that take INT8 vectors and character strings as arguments; defines another datatype, other_type, for data that supplements the key, whose contents are currently not private, but can be made abstract with setters and getters that take INT8 vectors and character strings as arguments.

The module stdlib_chaining_hash_map implements a datatype, chaining_hash_map_type, for a simple separate chaining hash map. The API provides most of the functionality @awvwgk requests except it doesn't provide a procedure for returning a list of all keys, but that procedure is easy to implement.

The module stdlib_open_hash_map implements a datatype, open_hash_map_type, for a simple open addressing hash map with linear insertion. The API provides most of the functionality @awvwgk requests except it doesn't provide for returning a list of all keys, but that is easy to implement, and it doesn't provide a procedure for deleting elements. It is possible to provide an element deletion procedure for an open addressing hash map, but it is tricky to implement and the runtime cost is significantly higher than for a chaining hash map.

The API's for the two data types are very similar. The init_map procedure for open_hash_map_type has an additional load_factor argument missing from the chaining_hash_map_type. The open_hash_map_type also has an additional load_factor inquiry function. The chaining_hash_map_type also has the remove_entry subroutine that open_hash_map_type lacks.

The two APIs also have some inquiry functions on the structure and history of the hash maps that could be eliminated. In particular I think the total_depth function is less useful to casual users.

Oct 04 '21 21:10 wclodius2

FWIW I have just changed the implementation of key_type and other_type to be opaque with getters and setters and have updated hash_maps.md accordingly.

Oct 05 '21 22:10 wclodius2

For the record, are you aware of fhash?

There's also my own implementation of the same name: LKedward/fhash. (My focus for that project was mostly on providing a nice simple API without preprocessing - the underlying implementation is nothing special.)

Oct 06 '21 07:10 LKedward

Would it make sense to introduce an abstract base class which both implementation inherit from? Except for the delete feature which is absent in the open addressing implementation the APIs seem pretty much similar.

Oct 06 '21 17:10 awvwgk

@FObermaier the flash you refer to is based on the gcc hash table and, as a result, is probably covered by the GNU LGPL license, which I believe is not compatible with the MIT license used by the Fortran Standard Library.

Oct 06 '21 23:10 wclodius2

@awvwgk I have tried to keep the APIs for the hash tables very similar. The init procedures differ, but that is because I provide additional features through optional arguments that most users would not use. I should probably reduce it to four arguments: map specifying the hash table to be initialized, hasher specifying the hash function to be used by the table, and the optional slots_bits specifying the initial size of the table, and status for reporting errors with the initialization. This drops the max_bits and load_factor arguments. If load_factor is a constant then I probably don't need the load_factor inquiry function for the open_hash_map_type. I also have figured out how to implement the remove_entry procedure for the open_hash_map_type and plan to implement it. With those changes the APIs would be identical, and giving them a common base class would be straight forward, but tedious.

Oct 07 '21 02:10 wclodius2

the fhash you refer to is based on the gcc hash table and, as a result, is probably covered by the GNU LGPL license, which I believe is not compatible with the MIT license used by the Fortran Standard Library.

No, it is not based on it, it is an implemention of the GCC hashmap structure in Fortran. Not a port of the GCC code. I think that is fine and it is MIT licensed, too.

Oct 07 '21 06:10 FObermaier

Any news on this?

Jun 15 '23 13:06 LecrisUT

@LecrisUT Hash maps have been implemented in stdlib. See here for more detail. Is it what you are looking for?

Jun 15 '23 15:06 jvdp1

The documentation is immense and hard to navigate. Is there a rosetta stone of how to do the python equivalent of:

dict = {}
dict.__contains__("key")
dict["key"] = 2
dict.pop("key")

Jun 15 '23 15:06 LecrisUT

@LecrisUT I agree with you that the specs are quite difficult to follow and is not written for users. Unfortunately, I never found the time to write a tutorial. Anyway, I just wrote a small example for you (see below). You can easily compile it with fpm (I used fpm v0.9.0 with the dependency stdlib="*"). Is such an example useful for you?

program main
 use stdlib_kinds, only: int8
 use stdlib_hashmap_wrappers, only: fnv_1_hasher, key_type, other_type, set, get
 use stdlib_hashmaps, only: chaining_hashmap_type, default_bits
 implicit none
 integer, parameter :: nentries = 50

 type dummy_type
  integer :: i
  real :: myval(4)
 end type dummy_type

 integer :: i
 logical :: conflict, exists
 type(dummy_type) :: dummy

 type(key_type) :: key
 type(other_type) :: other
 type(chaining_hashmap_type) :: map
 class(*), allocatable :: data

 call map%init(fnv_1_hasher)

 !Storing data
 do i = 1, nentries
  dummy%i = i
  call random_number(dummy%myval)
  call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
  call set(other, dummy)
  call map%map_entry(key, other, conflict)
  if(conflict) error stop "Unable to map entry because of a key conflict"
 enddo

 write(*, '(a)')'Querying table info...'
 write(*, '(a,t40,i0)')'Number of buckets allocated: ',map%num_slots()
 write(*, '(a,t40,i0)')'Number of key-value pairs stored: ',map%entries()
 write(*, '(a,t40,i0)')'The worst case bucket depth is ',map%total_depth()


 !Retrieving data
 do i = 1, nentries
  call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
  call map%get_other_data(key, other, exists)
  if(.not.exists)write(*,'(a)')'Warning: missing key other'

  call get(other, data)

  select type (data)
  type is (dummy_type)
   print *, 'Other data % i = ', data%i
   print *, 'Other data % myval = ', data%myval
  class default
   print *, 'Invalid data type in other'
  end select
 enddo

end program main

Jun 15 '23 17:06 jvdp1

So to get this straight:

other_type = value_type
you have to manually create values for the objects key and other using the set() function so that they can be used by the dictionary
- what's the [transfer ...] doing?
- why not at have set as a method of key_type/other_type?
map_entry == operator[]
and then just reverse the operation for get

That is quite a lot of boilerplate, but at least it manages to get me started. Will there be a simpler interface some day?

Jun 15 '23 18:06 LecrisUT

I think you got the idea.

So to get this straight:

other_type = value_type

you have to manually create values for the objects key and other using the set() function so that they can be used by the dictionary

what's the [transfer ...] doing?

set only supports int8 arrays or character. Therefore, transfer is used to copy the bitwise representation of the original key to the int8 array. The user can write her/his own code for her/his specific variable.

Note that in my example, my key is composed of both character and integer.

why not at have set as a method of key_type/other_type?

Actually the set is not required. For example, the following lines could be replaced by:

 call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
 call set(other, dummy)
 call map%map_entry(key, other, conflict)

by (with key aliasing key_type and other aliasing other_type):

  call map%map_entry(key([transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)]), other(dummy), conflict)

map_entry == operator[]

and then just reverse the operation for get

That is quite a lot of boilerplate, but at least it manages to get me started. Will there be a simpler interface some day?

I guess it is. In my opinion, it gives a lot of (too much) flexibility to the users. I don't think there are plans to simplify the interfaces. However, wrappers could be added, and we are open to feedbacks from users to improve it.

Jun 15 '23 18:06 jvdp1

Well, one thing I couldn't find is how to store and retrieve pointers instead of values

Jun 15 '23 20:06 LecrisUT

Well, one thing I couldn't find is how to store and retrieve pointers instead of values

Do you mean something like that:

 input = 'aaaarrrr'

 output => input

 call map%map_entry(key_type([transfer("pointer", 1_int8, len("pointer"))]), other_type(output), conflict)

 call map%get_other_data(key_type([transfer("pointer", 1_int8, len("pointer"))]), other, exists)

 call get(other, data)

 select type (data)
 type is (character(*))
  print *, 'character = ', data
 class default
  print *, 'Invalid data type in other'
 end select

Jun 15 '23 21:06 jvdp1

I think it is when using non-intrisic data types so that it can only use class(*), pointer which cannot upcast to class(my_type), pointer and then it complains that it can't find an implementation for get(). Probably I can just overload get() to dynamically cast the pointer.

Fortran language really needs templating to cut down all of the boilerplate.

Jun 15 '23 22:06 LecrisUT

I think it is when using non-intrisic data types so that it can only use class(*), pointer which cannot upcast to class(my_type), pointer and then it complains that it can't find an implementation for get(). Probably I can just overload get() to dynamically cast the pointer.

I think overloading it is indeed the right way.

Fortran language really needs templating to cut down all of the boilerplate.

I agree with you.

Jun 16 '23 08:06 jvdp1

Took me a bit of digging, but I found an issue with the previous example:

! This one only sets transfers "d"
call set(key, [transfer("dummy", 1_int8)])
! Should be
call set(key, [transfer("dummy", 1_int8, len("dummy")])

Jun 29 '23 15:06 LecrisUT

@LecrisUT Good catch! Thank you. I'll edit my posts. Sorry for the troubles!

Jun 30 '23 16:06 jvdp1

stdlib stdlib copied to clipboard

Abstract base class for map types

stdlib
stdlib copied to clipboard