
`parser.allocate` will reallocate buffers - call allocate only to change depth

Open TysonAndre opened this issue 2 years ago • 0 comments

https://github.com/simdjson/simdjson/blob/master/doc/dom.md#reusing-the-parser-for-maximum-efficiency

If you're using simdjson to parse multiple documents, or in a loop, you should make a parser once and reuse it. The simdjson library will allocate and retain internal buffers between parses, keeping buffers hot in cache and keeping memory allocation and initialization to a minimum. In this manner, you can parse terabytes of JSON data without doing any new allocation.

The class `simdjson::dom::parser` provides only `set_max_depth()` and `allocate()`, not `set_capacity()`. So to change just the max depth, call `allocate()` only if the depth actually changed, which should be infrequent.

  • parser::parse_into_document calls ensure_capacity already, and ensure_capacity calls allocate if needed
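A minimal sketch of the guard described above, written against a stand-in type so it compiles without simdjson. `MockParser`, `set_depth_if_changed`, and `demo_reallocations` are hypothetical names for illustration only. The mock copies just the early-return and reallocation behaviour of `allocate()`, and its buffer size is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical stand-in for simdjson::dom::parser; NOT the real class.
// It mirrors only the early return and the buffer reset of allocate().
struct MockParser {
    std::unique_ptr<uint8_t[]> string_buf;
    size_t capacity = 0;
    size_t max_depth = 0;
    int reallocations = 0;  // counts how often the buffer was recreated

    bool allocate(size_t new_capacity, size_t new_max_depth) {
        if (string_buf && new_capacity == capacity && new_max_depth == max_depth) {
            return true;  // nothing changed, keep the existing buffer
        }
        // Buffer size is arbitrary here; the real formula lives in simdjson.
        string_buf.reset(new (std::nothrow) uint8_t[new_capacity + 64]);
        if (!string_buf) { return false; }
        ++reallocations;
        capacity = new_capacity;
        max_depth = new_max_depth;
        return true;
    }
};

// Hypothetical helper: change only the depth and keep the current capacity,
// leaving capacity growth to the parse path (ensure_capacity in simdjson).
inline bool set_depth_if_changed(MockParser &parser, size_t requested_depth) {
    if (parser.string_buf && parser.max_depth == requested_depth) {
        return true;  // common case: skip allocate() entirely
    }
    return parser.allocate(parser.capacity, requested_depth);
}

// Usage sketch: three depth requests, only one of which differs.
inline int demo_reallocations() {
    MockParser parser;
    parser.allocate(4096, 512);           // initial allocation
    set_depth_if_changed(parser, 512);    // no-op
    set_depth_if_changed(parser, 512);    // no-op
    set_depth_if_changed(parser, 1024);   // reallocates once
    assert(parser.capacity == 4096);      // capacity is left untouched
    return parser.reallocations;
}
```

The helper passes the parser's current capacity back to `allocate()`, so a depth change never shrinks or grows the capacity by accident.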

Related to #73

Note that simdjson does not need capacities beyond the range of a uint32 and rejects requests for larger capacities:

/** The maximum document size supported by simdjson. */
constexpr size_t SIMDJSON_MAXSIZE_BYTES = 0xFFFFFFFF;
simdjson_warn_unused simdjson_inline error_code parser::allocate(size_t new_capacity, size_t new_max_depth) noexcept {
  if (new_capacity > max_capacity()) { return CAPACITY; }
  if (string_buf && new_capacity == capacity() && new_max_depth == max_depth()) { return SUCCESS; }

  // string_capacity copied from document::allocate
  _capacity = 0;
  size_t string_capacity = SIMDJSON_ROUNDUP_N(5 * new_capacity / 3 + SIMDJSON_PADDING, 64);
  string_buf.reset(new (std::nothrow) uint8_t[string_capacity]);
#if SIMDJSON_DEVELOPMENT_CHECKS
  start_positions.reset(new (std::nothrow) token_position[new_max_depth]);
#endif
  if (implementation) {
    SIMDJSON_TRY( implementation->set_capacity(new_capacity) );
    SIMDJSON_TRY( implementation->set_max_depth(new_max_depth) );
  } else {
    SIMDJSON_TRY( simdjson::get_active_implementation()->create_dom_parser_implementation(new_capacity, new_max_depth, implementation) );
  }
  _capacity = new_capacity;
  _max_depth = new_max_depth;
  return SUCCESS;
}
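To make the `string_capacity` arithmetic in `allocate()` concrete, here is a small self-contained sketch. It assumes `SIMDJSON_PADDING` is 64 and that `SIMDJSON_ROUNDUP_N` rounds up to a multiple of a power of two; check both against the simdjson version you build with. `kPadding`, `roundup_n`, and `string_capacity_for` are illustrative names, not simdjson identifiers.

```cpp
#include <cstddef>

// Assumed value of SIMDJSON_PADDING; verify against your simdjson version.
constexpr size_t kPadding = 64;

// Round `value` up to the next multiple of `n` (n must be a power of two).
constexpr size_t roundup_n(size_t value, size_t n) {
    return (value + (n - 1)) & ~(n - 1);
}

// Same shape as the string_capacity expression in parser::allocate.
constexpr size_t string_capacity_for(size_t capacity) {
    return roundup_n(5 * capacity / 3 + kPadding, 64);
}

// 5 * 1000 / 3 = 1666 (integer division), + 64 = 1730, rounded up to 1792.
static_assert(string_capacity_for(1000) == 1792, "worked example");
```

Under these assumptions the string buffer is roughly 5/3 of the requested capacity, so calling `allocate()` with a new capacity on every parse recreates a buffer larger than the document itself.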

TysonAndre · Oct 02 '22 01:10