core icon indicating copy to clipboard operation
core copied to clipboard

add `get_by_view` method to `Map`

Open Guest0x0 opened this issue 7 months ago • 5 comments

This PR adds a new method get_by_view to Map. The motivation is the need for querying a Map[String, V] with @string.View. Converting the view to String first requires a copy, storing @string.View directly in the map may result in memory leak. So a new API is necessary for this case.

The signature of the new API is as follows:

/// Get the value associated with the view of a key.
/// When searching the map, instead of comparing keys directly,
/// a view of keys, obtained by the parameter `view`, is used.
/// This is useful for e.g. querying a `Map` of `String` with `@string.View`.
///
/// The `view` function and the `Hash` implementation of `View` should satisfy:
/// ```
/// view(key).hash() == key.hash() 
/// ```
fn get_by_view[K, View : Hash + Eq, Value](
  self : Self[K, Value],
  key : View,
  view~ : (K) -> View
) -> Value?

The map will be searched with the hash code of key. When comparing equality, existing keys in the map will be first converted to View using view.

Note that for the @string.View case, a specialized method may be adequate. But there are other cases such as @bytes.View. So a generic implementation is used here. The view function will only be called when comparing keys with identical hash code, so it will not be invoked a lot. Therefore even if get_by_view is not inlined, the indirect call cost of view should be acceptable (since using get_by_view implies that the key is probably a large type with non-trivial comparison, such as String)

Guest0x0 avatar May 12 '25 03:05 Guest0x0

Missing hash code collision handling in get_by_view

Category Correctness Code Snippet if entry.hash == hash && view(entry.key) == key { Recommendation Add explicit checks for keys with same hash but different values, similar to the regular get method Reasoning The current implementation might return incorrect values when multiple keys have the same hash code but are different. It relies solely on the hash equality and view comparison.

Duplicated probing logic between get and get_by_view

Category Maintainability Code Snippet for i = 0, idx = hash & self.capacity_mask { ... } Recommendation Extract the common probing logic into a helper function that takes a comparison predicate as parameter Reasoning The probing logic is duplicated between get and get_by_view methods. Extracting it would reduce code duplication and make future modifications easier to maintain.

Test coverage could be improved for hash collisions

Category Correctness Code Snippet test "map::get_by_view::string_view" { ... } Recommendation Add test cases that verify behavior with hash collisions and different key types beyond just String Reasoning Current tests only verify basic functionality with String keys. Adding tests for hash collisions and other key types would ensure the implementation works correctly in all cases.

Pull Request Test Coverage Report for Build 6660

Details

  • 4 of 5 (80.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.009%) to 92.442%

Changes Missing Coverage Covered Lines Changed/Added Lines %
builtin/linked_hash_map.mbt 4 5 80.0%
<!-- Total: 4 5
Totals Coverage Status
Change from base Build 6659: -0.009%
Covered Lines: 6140
Relevant Lines: 6642

💛 - Coveralls

coveralls avatar May 12 '25 03:05 coveralls

This is one of the case which show generic params of traits is useful.

hackwaly avatar May 12 '25 08:05 hackwaly

this may be somthing easier to understand :

get_by_hashcode[K, View: Eq, V](self : Self[K,V], hashcode : Int , v : View, f~ : (K) -> View) -> Value? 

Then we can build something safe on top of it, is it a premature optimization?

bobzhang avatar May 13 '25 01:05 bobzhang

this may be somthing easier to understand :

get_by_hashcode[K, View: Eq, V](self : Self[K,V], hashcode : Int , v : View, f~ : (K) -> View) -> Value? 

Then we can build something safe on top of it, is it a premature optimization?

Is this a user-facing API, or just an internal helper? In terms of implementation, get_by_hashcode seems to be just get_by_view here with the hash code computation lifted out, is there any use of get_by_hashcode except for get_by_view?

Guest0x0 avatar May 13 '25 03:05 Guest0x0

How about:

pub fn[K, View : Hash, Value] Map::get_by(
  self : Map[K, Value],
  key : View,
  equal : (K, View) -> Bool
) -> Value? {
  ...
}

tonyfettes avatar May 28 '25 05:05 tonyfettes

/// The `view` function and the `Hash` implementation of `View` should satisfy:
/// ```
/// view(key).hash() == key.hash() 
/// ```
fn get_by_view[K, View : Hash + Eq, Value](
  self : Self[K, Value],
  key : View,
  view~ : (K) -> View
) -> Value?

It is hard to reason about if the variant is met or not. for String, we added an optimization that "abc"[:].to_string() does not allocate, maybe we can add a special function:

Map[String]::get_from_string(self : Self, StringView)
Map[Bytes]::get_from_bytes(self : Self, BytesView)

bobzhang avatar Oct 29 '25 03:10 bobzhang