Object listing backend interface

Open tilpner opened this issue 4 years ago • 2 comments

There is currently no way to list stored objects, either all of them or by range query. Any potential interface should probably allow limiting the number of returned results, either by total count or by key space, to avoid long query times for clients that only need to list a few objects.

tilpner avatar Feb 19 '21 13:02 tilpner

Object backends now support get_all, get_by_prefix and iterate functions. Let me know what you think.

michaelkuhn avatar Mar 04 '21 13:03 michaelkuhn

There are a few use cases this design doesn't support:

  • Filtering by range, instead of prefix. E.g. "give me all objects with names lexicographically between sa* and st*"

    • This would require a shared ordering definition between backends
    • Lexicographic ordering on byte strings breaks down in the presence of multi-byte characters (e.g. Unicode object keys)
    • If implemented, could replace get_by_prefix
      • Backend would only offer get_by_range
      • higher-level API maps get_by_prefix(key) onto get_by_range(key, key_with_last_place_incremented)
    • This would complicate backends which can't range-query easily
    • Probably shouldn't be supported?
  • Partial iteration: cancelling an iteration after finding the object the client was searching for

    • While the client can just ignore the iterator, the backend can only free the associated resources if the iterator is exhausted
    • Solution could be another function to close an iterator
      • What if the iterators of get_all and get_by_prefix require different destruction steps?
      • One function for each, or is the backend expected to tag its iterator?
  • Fulfillment errors during iteration

    • There's no way for backend_iterate to signal an error, because FALSE is interpreted to mean "no more elements"
    • Probably fine, as there's not really much information in a bool for the caller to react to
  • Metadata reporting along with key name

    • If the backend is walking its own metadata anyway, it might be uniquely cheap to also offer access to additional metadata during iteration
    • If a client wants to filter for e.g. recent modification time, it would have to call backend_status on each of the keys
      • Saves one call per key if the metadata is needed by the client
      • "Filter for recent modification" is not a fast use case even with mtime reporting during iteration
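
To make the get_by_prefix onto get_by_range mapping from the first point concrete, here is a minimal sketch in C. It assumes keys are compared byte-wise (as strcmp does) and that the range is half-open, i.e. lower <= key < upper. The names prefix_upper_bound and key_in_range are hypothetical and not part of any existing backend interface. The one subtlety is that "incrementing the last place" needs care when the prefix ends in 0xFF bytes:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing backend function): writes the
 * exclusive upper bound of the byte-wise range covering every key that
 * starts with `prefix` into `out`.
 *
 * Returns  1 if a bound was written,
 *          0 if no finite bound exists (empty prefix or only 0xFF bytes),
 *         -1 if `out` is too small.
 */
static int
prefix_upper_bound (const char *prefix, char *out, size_t out_size)
{
	size_t len = strlen(prefix);

	/* Trailing 0xFF bytes cannot be incremented, so drop them. */
	while (len > 0 && (unsigned char)prefix[len - 1] == 0xFF)
	{
		len--;
	}

	if (len == 0)
	{
		return 0;
	}

	if (out_size < len + 1)
	{
		return -1;
	}

	memcpy(out, prefix, len);
	/* Increment the last remaining byte as an unsigned value. */
	((unsigned char *)out)[len - 1]++;
	out[len] = '\0';

	return 1;
}

/* Byte-wise check for lower <= key < upper; upper == NULL means unbounded. */
static int
key_in_range (const char *key, const char *lower, const char *upper)
{
	return strcmp(key, lower) >= 0 && (upper == NULL || strcmp(key, upper) < 0);
}
```

With this, a prefix query for "sa" becomes the range from "sa" up to, but not including, "sb". This only settles the byte-wise case; it does not address the shared-ordering question for backends that sort keys differently.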

I'm not saying they all should be supported, just making sure they are considered.
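
For the partial-iteration point, one possible shape is a single close function that dispatches on a tag stored in the iterator, so callers never need to know which function created it. This is only a sketch: the Iter type, IterKind tag, iter_new and iter_close are made up for illustration and do not reflect the actual backend structures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag recording which listing function created the iterator. */
typedef enum
{
	ITER_ALL,
	ITER_PREFIX
} IterKind;

/* Hypothetical iterator; a real backend would hold cursors, handles, etc. */
typedef struct
{
	IterKind kind;
	char *prefix; /* owned copy, only set for ITER_PREFIX */
} Iter;

static Iter *
iter_new (IterKind kind, const char *prefix)
{
	Iter *it = calloc(1, sizeof(*it));

	if (it == NULL)
	{
		return NULL;
	}

	it->kind = kind;

	if (kind == ITER_PREFIX)
	{
		const char *src = (prefix != NULL) ? prefix : "";
		size_t n = strlen(src) + 1;

		it->prefix = malloc(n);

		if (it->prefix == NULL)
		{
			free(it);
			return NULL;
		}

		memcpy(it->prefix, src, n);
	}

	return it;
}

/* One close function for all iterator kinds; safe to call before exhaustion. */
static void
iter_close (Iter **it)
{
	if (it == NULL || *it == NULL)
	{
		return;
	}

	switch ((*it)->kind)
	{
		case ITER_PREFIX:
			/* Kind-specific cleanup. */
			free((*it)->prefix);
			break;
		case ITER_ALL:
			/* Nothing extra to release in this sketch. */
			break;
	}

	free(*it);
	*it = NULL;
}
```

A client that found its object early would call iter_close(&it) and move on. The cost is that every backend has to tag its iterators; the alternative of one close function per listing function pushes that bookkeeping onto the caller instead.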

tilpner avatar Mar 04 '21 14:03 tilpner