stdlib
stdlib copied to clipboard
Abstract base class for map types
Description This issue should track a potential addition of a map class to stdlib. I think it would be preferable to first define an abstract base class which covers most of our needs. Here are some points I found important when working on TOML Fortran:
- data structure owns data
- memory safety
- easy deep copies
- no cyclic references
- get / find an element based on a key
- returns pointer to value inside structure
- add / push back an element for a key
- value must be allocatable
- delete / drop an element at a given key
- should value (optionally) be returned on delete / drop?
- get a list of all keys / provide an iterator over all keys
- keys can be strings
- what about integer, real, ... values?
Examples
Abstract base class used in TOML Fortran to define maps
type, abstract :: map_class
contains
!> Find a value based on its key
procedure(find), deferred :: find
!> Push back a value to the structure
procedure(push_back), deferred :: push_back
!> Get list of all keys in the structure
procedure(get_keys), deferred :: get_keys
!> Delete a value at a given key
procedure(delete), deferred :: delete
!> Destroy the data structure
procedure(destroy), deferred :: destroy
end type
abstract interface
!> Find a value based on its key
subroutine find(self, key, ptr)
import :: map_class, value_type
!> Instance of the structure
class(map_class), intent(inout), target :: self
!> Key to the value
character(len=*), intent(in) :: key
!> Pointer to the stored value at given key
type(value_type), pointer, intent(out) :: ptr
end subroutine find
!> Push back a value to the structure
subroutine push_back(self, val)
import :: map_class, value_type
!> Instance of the structure
class(map_class), intent(inout), target :: self
!> Value to be stored
type(value_type), allocatable, intent(inout) :: val
end subroutine push_back
!> Get list of all keys in the structure
subroutine get_keys(self, list)
import :: map_class, string_type
!> Instance of the structure
class(map_class), intent(inout), target :: self
!> List of all keys
type(string_type), allocatable, intent(out) :: list(:)
end subroutine get_keys
!> Delete a value at a given key
subroutine delete(self, key)
import :: map_class, value_type
!> Instance of the structure
class(map_class), intent(inout), target :: self
!> Key to the value
character(len=*), intent(in) :: key
end subroutine delete
!> Deconstructor for data structure
subroutine destroy(self)
import :: map_class
!> Instance of the structure
class(map_class), intent(inout), target :: self
end subroutine destroy
end interface
Is map the same as dictionary (e.g. Python's dict
) or do they differ in some ways?
In Python I work with this a lot. In Fortran, not really. But this is one of the features that I think, even if not presently used a lot, would open up many possibilities for development of high level libraries. For example, databases (SQL-style or otherwise), various file formats, parsing HTML, SVG, and other kinds of XML, HTTP clients and servers.
Fortran-native approach is the way to go.
An alternative, perhaps for its own package rather than stdlib, would be Fortran bindings for Redis, which provides high performance in-memory dicts and lists.
I have worked on a Fortran interface to Redis, but I ran into odd problems that I simply did not understand. I would have to look up my notes, but if there is interest I can revive it ;).
Op wo 25 aug. 2021 om 17:00 schreef Milan Curcic @.***>:
Is map the same as dictionary (e.g. Python's dict) or do they differ in some ways?
In Python I work with this a lot. In Fortran, not really. But this is one of the features that I think, even if not presently used a lot, would open up many possibilities for development of high level libraries. For example, databases (SQL-style or otherwise), various file formats, parsing HTML, SVG, and other kinds of XML, HTTP clients and servers.
Fortran-native approach is the way to go.
An alternative, perhaps for its own package rather than stdlib, would be Fortran bindings for Redis https://redis.io/, which provides high performance in-memory dicts and lists.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/479#issuecomment-905582297, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR7BPOKEAPHPCNWKYEDT6UAQFANCNFSM5BIOAAUQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
I was thinking C++ ordered map, but Python dictionary is the same thing.
Having an abstract base class would allow to implement the map type with any backend, stdlib would only provide the base class and pure Fortran implementations. Other (fpm) projects could extend the base class and provide their own optimized implementation.
The advantage of a single well-designed abstract base class would be that projects like TOML Fortran don't have to implement their own, but you could simply load a TOML document into a stdlib map type and use it with any other library which supports stdlib's maps. Designed correctly a library could also load data directly into an optimized map implementation.
For the record, are you aware of fhash?
FWIW I have just released a proposed hash map API on the website with my proposed hash function API. The API defines three modules: stdlib_32_bit_key_data_wrapper
, stdlib_chaining_hash_map
and stdlib_open_hash_map
.
The stdlib_32_bit_key_data_wrapper
module does the following: defines an abstract interface for hash functions, provides a wrapper for some hash functions so they match that interface; defines a key datatype, key_type
whose contents are currently not private, but can be made abstract with setters and getters that take INT8
vectors and character strings as arguments; defines another datatype, other_type
, for data that supplements the key, whose contents are currently not private, but can be made abstract with setters and getters that take INT8
vectors and character strings as arguments.
The module stdlib_chaining_hash_map
implements a datatype, chaining_hash_map_type
, for a simple separate chaining hash map. The API provides most of the functionality @awvwgk requests except it doesn't provide a procedure for returning a list of all keys, but that procedure is easy to implement.
The module stdlib_open_hash_map
implements a datatype, open_hash_map_type
, for a simple open addressing hash map with linear insertion. The API provides most of the functionality @awvwgk requests except it doesn't provide for returning a list of all keys, but that is easy to implement, and it doesn't provide a procedure for deleting elements. It is possible to provide an element deletion procedure for an open addressing hash map, but it is tricky to implement and the runtime cost is significantly higher than for a chaining hash map.
The API's for the two data types are very similar. The init_map
procedure for open_hash_map_type
has an additional load_factor
argument missing from the chaining_hash_map_type
. The open_hash_map_type
also has an additional load_factor
inquiry function. The chaining_hash_map_type
also has the remove_entry
subroutine that open_hash_map_type
lacks.
The two APIs also have some inquiry functions on the structure and history of the hash maps that could be eliminated. In particular I think the total_depth
function is less useful to casual users.
FWIW I have just changed the implementation of key_type
and other_type
to be opaque with getters and setters and have updated hash_maps.md
accordingly.
For the record, are you aware of fhash?
There's also my own implementation of the same name: LKedward/fhash. (My focus for that project was mostly on providing a nice simple API without preprocessing - the underlying implementation is nothing special.)
Would it make sense to introduce an abstract base class which both implementation inherit from? Except for the delete feature which is absent in the open addressing implementation the APIs seem pretty much similar.
@FObermaier the flash you refer to is based on the gcc hash table and, as a result, is probably covered by the GNU LGPL license, which I believe is not compatible with the MIT license used by the Fortran Standard Library.
@awvwgk I have tried to keep the APIs for the hash tables very similar. The init procedures differ, but that is because I provide additional features through optional arguments that most users would not use. I should probably reduce it to four arguments: map
specifying the hash table to be initialized, hasher
specifying the hash function to be used by the table, and the optional slots_bits
specifying the initial size of the table, and status
for reporting errors with the initialization. This drops the max_bits
and load_factor
arguments. If load_factor
is a constant then I probably don't need the load_factor
inquiry function for the open_hash_map_type
. I also have figured out how to implement the remove_entry
procedure for the open_hash_map_type
and plan to implement it. With those changes the APIs would be identical, and giving them a common base class would be straight forward, but tedious.
the fhash you refer to is based on the gcc hash table and, as a result, is probably covered by the GNU LGPL license, which I believe is not compatible with the MIT license used by the Fortran Standard Library.
No, it is not based on it, it is an implemention of the GCC hashmap structure in Fortran. Not a port of the GCC code. I think that is fine and it is MIT licensed, too.
Any news on this?
@LecrisUT Hash maps have been implemented in stdlib
. See here for more detail. Is it what you are looking for?
The documentation is immense and hard to navigate. Is there a rosetta stone of how to do the python equivalent of:
dict = {}
dict.__contains__("key")
dict["key"] = 2
dict.pop("key")
@LecrisUT I agree with you that the specs are quite difficult to follow and is not written for users. Unfortunately, I never found the time to write a tutorial.
Anyway, I just wrote a small example for you (see below). You can easily compile it with fpm
(I used fpm v0.9.0
with the dependency stdlib="*"
). Is such an example useful for you?
program main
use stdlib_kinds, only: int8
use stdlib_hashmap_wrappers, only: fnv_1_hasher, key_type, other_type, set, get
use stdlib_hashmaps, only: chaining_hashmap_type, default_bits
implicit none
integer, parameter :: nentries = 50
type dummy_type
integer :: i
real :: myval(4)
end type dummy_type
integer :: i
logical :: conflict, exists
type(dummy_type) :: dummy
type(key_type) :: key
type(other_type) :: other
type(chaining_hashmap_type) :: map
class(*), allocatable :: data
call map%init(fnv_1_hasher)
!Storing data
do i = 1, nentries
dummy%i = i
call random_number(dummy%myval)
call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
call set(other, dummy)
call map%map_entry(key, other, conflict)
if(conflict) error stop "Unable to map entry because of a key conflict"
enddo
write(*, '(a)')'Querying table info...'
write(*, '(a,t40,i0)')'Number of buckets allocated: ',map%num_slots()
write(*, '(a,t40,i0)')'Number of key-value pairs stored: ',map%entries()
write(*, '(a,t40,i0)')'The worst case bucket depth is ',map%total_depth()
!Retrieving data
do i = 1, nentries
call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
call map%get_other_data(key, other, exists)
if(.not.exists)write(*,'(a)')'Warning: missing key other'
call get(other, data)
select type (data)
type is (dummy_type)
print *, 'Other data % i = ', data%i
print *, 'Other data % myval = ', data%myval
class default
print *, 'Invalid data type in other'
end select
enddo
end program main
So to get this straight:
-
other_type = value_type
- you have to manually create values for the objects
key
andother
using theset()
function so that they can be used by the dictionary- what's the
[transfer ...]
doing? - why not at have
set
as a method ofkey_type
/other_type
?
- what's the
-
map_entry == operator[]
- and then just reverse the operation for
get
That is quite a lot of boilerplate, but at least it manages to get me started. Will there be a simpler interface some day?
I think you got the idea.
So to get this straight:
other_type = value_type
- you have to manually create values for the objects
key
andother
using theset()
function so that they can be used by the dictionary
- what's the
[transfer ...]
doing?
set
only supports int8
arrays or character
. Therefore, transfer
is used to copy the bitwise representation of the original key to the int8
array.
The user can write her/his own code for her/his specific variable.
Note that in my example, my key is composed of both character
and integer
.
- why not at have
set
as a method ofkey_type
/other_type
?
Actually the set
is not required. For example, the following lines could be replaced by:
call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
call set(other, dummy)
call map%map_entry(key, other, conflict)
by (with key aliasing key_type and other aliasing other_type):
call map%map_entry(key([transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)]), other(dummy), conflict)
map_entry == operator[]
- and then just reverse the operation for
get
That is quite a lot of boilerplate, but at least it manages to get me started. Will there be a simpler interface some day?
I guess it is. In my opinion, it gives a lot of (too much) flexibility to the users. I don't think there are plans to simplify the interfaces. However, wrappers could be added, and we are open to feedbacks from users to improve it.
Well, one thing I couldn't find is how to store and retrieve pointers instead of values
Well, one thing I couldn't find is how to store and retrieve pointers instead of values
Do you mean something like that:
input = 'aaaarrrr'
output => input
call map%map_entry(key_type([transfer("pointer", 1_int8, len("pointer"))]), other_type(output), conflict)
call map%get_other_data(key_type([transfer("pointer", 1_int8, len("pointer"))]), other, exists)
call get(other, data)
select type (data)
type is (character(*))
print *, 'character = ', data
class default
print *, 'Invalid data type in other'
end select
I think it is when using non-intrisic data types so that it can only use class(*), pointer
which cannot upcast to class(my_type), pointer
and then it complains that it can't find an implementation for get()
. Probably I can just overload get()
to dynamically cast the pointer.
Fortran language really needs templating to cut down all of the boilerplate.
I think it is when using non-intrisic data types so that it can only use
class(*), pointer
which cannot upcast toclass(my_type), pointer
and then it complains that it can't find an implementation forget()
. Probably I can just overloadget()
to dynamically cast the pointer.
I think overloading it is indeed the right way.
Fortran language really needs templating to cut down all of the boilerplate.
I agree with you.
Took me a bit of digging, but I found an issue with the previous example:
! This one only sets transfers "d"
call set(key, [transfer("dummy", 1_int8)])
! Should be
call set(key, [transfer("dummy", 1_int8, len("dummy")])
@LecrisUT Good catch! Thank you. I'll edit my posts. Sorry for the troubles!