
Adding customization for standard types

Open the-moisrex opened this issue 1 year ago • 13 comments

With tag_invoke #2228, we can provide customizations for standard types.

I'm thinking of:

  • [ ] vector
  • [ ] map
  • [ ] deque
  • [ ] list
  • [ ] forward_list
  • [ ] set
  • [ ] multiset
  • [ ] multimap
  • [ ] unordered_set
  • [ ] unordered_map
  • [ ] unordered_multiset
  • [ ] unordered_multimap
  • [ ] stack
  • [ ] queue
  • [ ] priority_queue
  • [ ] flat_set
  • [ ] flat_map
  • [ ] flat_multiset
  • [ ] flat_multimap
  • [ ] span
  • [ ] unique_ptr
  • [ ] smart_ptr
  • [ ] atomic<T>
  • [ ] optional<T>
  • [ ] expected<T, E>
  • [ ] filesystem::path
  • [ ] chrono::duration
  • [ ] pair
  • [ ] tuple
  • [ ] ...

Honestly, it's the opposite of std::formatter<...> customizations.

Felt like half a day of work when I started typing, but now it sounds like a whole week of work :)

the-moisrex avatar Aug 07 '24 22:08 the-moisrex

@lemire Also, if we provide customizations for these types, they need to be fully supported, not half-in, half-out like map<std::string, T>.

Maybe the user wants to provide a customization for a username class and then use map<username, int>, for example.

The point of this is to make nested ideas work, because now we can.

Imagine map<username, vector<username>> for a list of friends for example.

the-moisrex avatar Aug 08 '24 01:08 the-moisrex

@the-moisrex And we need to write tests and generate some benchmarks. We want solid support.

C++26 introduces reflection (ping @FranciscoThiesen) and so we can automatically support custom classes... E.g.,

struct myclass {
  float x;
  float y;
  float z;
};

... can be supported without any effort on the part of the programmer. Of course, C++26 compilers are not yet around, but soon enough...

lemire avatar Aug 08 '24 03:08 lemire

@lemire I'm creating a std directory that puts the deserialize tag_invoke function for each of the types mentioned above in its own file, so that people don't have to include the whole standard library and can opt into only the ones they need.

Question: Which exception-safety behavior would you prefer?

Let me explain. Here's a deserialize CPO implementation for vector<T, AllocT>:

template <typename T, typename AllocT, typename ValT>
error_code tag_invoke(deserialize_tag, ValT &val, std::vector<T, AllocT>& out)
    noexcept(!SIMDJSON_EXCEPTIONS && std::is_nothrow_default_constructible_v<T> && std::is_nothrow_copy_assignable_v<T>)
{
  using SIMDJSON_IMPLEMENTATION::ondemand::array;

  // For better error messages, don't use these as constraints on the tag_invoke CPO.
  static_assert(deserializable<T, ValT>, "The specified type inside the vector must itself be deserializable");
  static_assert(std::is_default_constructible_v<T>, "The specified type inside the vector must be default constructible.");

  array arr;
  SIMDJSON_TRY(val.get_array().get(arr));
  for (auto v : arr) {
    T value;
    SIMDJSON_TRY(v.get<T>().get(value));
    SIMDJSON_CATCH(out.push_back(value), MEMALLOC);
  }
  return SUCCESS;
}

SIMDJSON_CATCH and SIMDJSON_IMMEDIATE_CATCH are defined as follows (in common_defs.h):

/// Catch exceptions immediately and convert them into returned error codes.
#define SIMDJSON_IMMEDIATE_CATCH(EXPR, ERR) try { (EXPR); } catch (...) { return ERR; }

#if SIMDJSON_EXCEPTIONS
#define SIMDJSON_CATCH(EXPR, ERR) (EXPR)
#else
#define SIMDJSON_CATCH(EXPR, ERR) SIMDJSON_IMMEDIATE_CATCH(EXPR, ERR)
#endif
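To make the immediate-catch pattern concrete, here is a minimal standalone sketch. It does not use simdjson: demo_error, DEMO_IMMEDIATE_CATCH, push_or_throw and append are hypothetical names made up for the illustration, and the allocation failure is simulated with a flag. It only compiles with exceptions enabled.

```cpp
#include <new>
#include <vector>

// Hypothetical stand-in for an error-code enum; not simdjson's error_code.
enum demo_error { DEMO_SUCCESS = 0, DEMO_MEMALLOC };

// Same shape as the SIMDJSON_IMMEDIATE_CATCH macro quoted above:
// run EXPR, and turn any exception into a returned error code.
#define DEMO_IMMEDIATE_CATCH(EXPR, ERR) try { (EXPR); } catch (...) { return ERR; }

// Appends `value`, or throws std::bad_alloc when `fail` is set
// (a simulated out-of-memory condition).
inline void push_or_throw(std::vector<int>& out, int value, bool fail) {
  if (fail) { throw std::bad_alloc(); }
  out.push_back(value);
}

// Converts the exception from push_or_throw into DEMO_MEMALLOC.
inline demo_error append(std::vector<int>& out, int value, bool fail) {
  DEMO_IMMEDIATE_CATCH(push_or_throw(out, value, fail), DEMO_MEMALLOC);
  return DEMO_SUCCESS;
}
```

Calling append(v, 1, false) returns DEMO_SUCCESS and grows the vector; calling it with fail set to true returns DEMO_MEMALLOC and leaves the vector unchanged.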

Of course, if T in vector<T> itself throws, then we should throw (it's not a good idea to force that, even though we can).

So, my question is: how much should we throw? When users have exceptions disabled, we can return MEMALLOC error code which I've shown in the code above, but should we also do that in the normal case where exceptions are allowed, since we already have MEMALLOC, or just let it throw?

P.S. The deserializable concept has been updated and now takes the builtin types into account as well.

the-moisrex avatar Sep 24 '24 20:09 the-moisrex

When users have exceptions disabled, we can return MEMALLOC error code which I've shown in the code above,

I am pretty sure that SIMDJSON_IMMEDIATE_CATCH is not valid when exceptions are disabled.

When exceptions are disabled, you are not allowed to catch exceptions (as exceptions are not generated in the first place).

In a system like Node.js, if you run out of memory, it will just abort. Period. (Mind you, that's what most C++ software will do since hardly anyone catches exceptions systematically when accessing strings or STL containers.)

So I would just write...

  array arr;
  SIMDJSON_TRY(val.get_array().get(arr));
  for (auto v : arr) {
    T value;
    SIMDJSON_TRY(v.get<T>().get(value));
    out.push_back(value);
  }

lemire avatar Sep 24 '24 20:09 lemire

@lemire Hmm. So, should I catch it directly with SIMDJSON_IMMEDIATE_CATCH when exceptions are enabled or just completely go with noexcept(false) and make it easy for ourselves?

Also, we probably should std::move the value as well.
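A small sketch of what moving buys here, assuming a hypothetical element type (counted) that records how it was constructed. None of these names come from simdjson; the point is only that push_back(std::move(value)) selects the move constructor where push_back(value) copies.

```cpp
#include <utility>
#include <vector>

// Global counters for the illustration only.
static int g_copies = 0;
static int g_moves = 0;

// Hypothetical element type that counts copy and move constructions.
struct counted {
  counted() = default;
  counted(const counted&) { ++g_copies; }
  counted(counted&&) noexcept { ++g_moves; }
  counted& operator=(const counted&) = default;
  counted& operator=(counted&&) noexcept = default;
};

// Mirrors the loop body as posted: the local value is copied into the vector.
inline void append_by_copy(std::vector<counted>& out) {
  counted value;
  out.push_back(value);
}

// Mirrors the suggested change: the local value is moved into the vector.
inline void append_by_move(std::vector<counted>& out) {
  counted value;
  out.push_back(std::move(value));
}
```

For trivially copyable T the two are equivalent, so the difference only matters for element types that own resources, such as std::string or nested containers.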

the-moisrex avatar Sep 24 '24 21:09 the-moisrex

So, should I catch it directly with SIMDJSON_IMMEDIATE_CATCH when exceptions are enabled or just completely go with noexcept(false) and make it easy for ourselves?

Here is my expectation:

  1. If exceptions are enabled, then I would expect that if the container throws an exception, then I (as the user) will receive an exception when tag_invoke is called. I would not expect the library (simdjson) to catch the exception for me.
  2. If exceptions are disabled, and the container would throw, then this should lead to an abort.

Does that sound reasonable?

lemire avatar Sep 24 '24 21:09 lemire

Great. I guess the existence of MEMALLOC had me confused.

the-moisrex avatar Sep 24 '24 21:09 the-moisrex

@the-moisrex

Internally, we use new(std::nothrow) to allocate memory. We then check for a null pointer, and if a null pointer is found, we return MEMALLOC.

https://en.cppreference.com/w/cpp/memory/new/nothrow

As you know, STL containers do not work this way (they don't have an 'exception-free mode').
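For illustration, the allocate-then-check pattern looks roughly like this. It is a sketch, not simdjson's actual code: alloc_status and allocate_buffer are hypothetical names, and only new (std::nothrow) itself is standard.

```cpp
#include <cstddef>
#include <new>

// Hypothetical status enum standing in for an error code such as MEMALLOC.
enum alloc_status { ALLOC_SUCCESS = 0, ALLOC_MEMALLOC };

// Allocates n bytes without throwing. On failure, new (std::nothrow)
// returns a null pointer, which is reported as an error code instead.
inline alloc_status allocate_buffer(std::size_t n, char*& out) noexcept {
  out = new (std::nothrow) char[n];
  if (out == nullptr) { return ALLOC_MEMALLOC; }
  return ALLOC_SUCCESS;
}
```

The caller owns the buffer on success and releases it with delete[]. A std::vector offers no equivalent hook: when its allocator fails, push_back throws std::bad_alloc.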

lemire avatar Sep 24 '24 21:09 lemire

@lemire I saw that usage, the confusing part is that all of these are noexcept(false) anyway, so what's the point of having a MEMALLOC if the user eventually has to catch exceptions from the library anyway?

And now that I checked, the weird thing is that .get_object() and the rest of them are themselves noexcept. Is this a mistake?

the-moisrex avatar Sep 24 '24 21:09 the-moisrex

@the-moisrex I have invited you to the simdjson org. Please consider accepting the invite.

lemire avatar Sep 24 '24 22:09 lemire

I saw that usage, the confusing part is that all of these are noexcept(false) anyway, so what's the point of having a MEMALLOC if the user eventually has to catch exceptions from the library anyway?

  1. Users who have enabled exceptions get an extended API with some throwing functions.
  2. Users who have disabled exceptions (e.g., Node.js) get a limited API with only non-throwing functions.

lemire avatar Sep 24 '24 22:09 lemire

@lemire thanks for the invite, but I plan to be a seasoned contributor to SIMDJSON; I have my hands full with Web++ which is getting ridiculously big and I hate to leave things half-cooked, that's why I'm trying to finish up tag_invoke and extract and get back to Web++ which I've been working on since 2019 at least.

  • I saw that usage, the confusing part is that all of these are noexcept(false) anyway, so what's the point of having a MEMALLOC if the user eventually has to catch exceptions from the library anyway?
  • Users who have enabled exceptions get an extended API with some throwing functions. Users who have disabled exceptions (e.g., Node.js) get a limited API with only non-throwing functions.

I understand that, but I don't understand why these are noexcept(false) even though their implementations are noexcept (though I think one of them is a mistake):

https://github.com/simdjson/simdjson/blob/49b9860899bda9c186e5ee811378d765c9f70ce3/include/simdjson/generic/ondemand/value-inl.h#L105-L134

the-moisrex avatar Sep 24 '24 22:09 the-moisrex

thanks for the invite, but I plan to be a seasoned contributor to SIMDJSON; I have my hands full with Web++ which is getting ridiculously big and I hate to leave things half-cooked, that's why I'm trying to finish up tag_invoke and extract and get back to Web++ which I've been working on since 2019 at least.

That's fine, but I'd still like you to consider the invite.

I understand that, but I don't understand why these are noexcept(false) even though their implementations are noexcept (though I think one of them is a mistake):

These functions will throw. At least, array(), object(), uint64_t() will throw.

Casting an ondemand::value in simdjson is a throwing operation. We have no other way to manage errors.
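As a toy model of why the cast has to throw: a conversion operator can only return the target type, so it has nowhere to put an error code. The sketch below is not simdjson's class; toy_result and toy_error are hypothetical names that only mimic the two access styles (an error-code getter and a throwing conversion).

```cpp
#include <stdexcept>

// Hypothetical error enum for the illustration.
enum toy_error { TOY_SUCCESS = 0, TOY_INCORRECT_TYPE };

// Toy result wrapper holding either a usable value or an error code.
template <typename T>
struct toy_result {
  T value{};
  toy_error error{TOY_SUCCESS};

  // Non-throwing access: the error travels through the return value,
  // and `out` is only written on success.
  toy_error get(T& out) const {
    if (error != TOY_SUCCESS) { return error; }
    out = value;
    return TOY_SUCCESS;
  }

  // Throwing access: a conversion must yield a T, so the only way
  // to report a failure is to throw.
  operator T() const noexcept(false) {
    if (error != TOY_SUCCESS) {
      throw std::runtime_error("toy_result holds an error");
    }
    return value;
  }
};
```

With a toy_result<int> that holds an error, int x = result; throws, while result.get(x) returns the error code and leaves x untouched.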

lemire avatar Sep 24 '24 22:09 lemire