Daniel Lemire
https://github.com/uniVocity/csv-parsers-comparison
This should be a bit better. My own scores are about +5%.
Terminology: Selecting a subset of columns is called a projection. Since extracting the indexes is expensive, you may want to only pick some of the indexes, never committing to memory...
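A minimal sketch of a projection, assuming a simplified CSV dialect (comma-separated, no quoting or escaping). The helper name `project_line` is hypothetical and is not taken from any of the parsers in the comparison. It returns `std::string_view` slices, so the fields that are not selected are never copied, and it stops scanning as soon as the last requested column has been found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: return views of the fields of `line` whose column
// index appears in `wanted`. Assumptions: `wanted` is sorted in ascending
// order without duplicates, and fields contain no quoted commas.
std::vector<std::string_view> project_line(std::string_view line,
                                           const std::vector<std::size_t>& wanted) {
  // Precondition check (sortedness only); compiled out with NDEBUG.
  assert(std::is_sorted(wanted.begin(), wanted.end()));
  std::vector<std::string_view> out;
  out.reserve(wanted.size());
  std::size_t column = 0;  // index of the field currently being scanned
  std::size_t next = 0;    // position in `wanted` of the next column to keep
  std::size_t start = 0;   // offset where the current field begins
  while (next < wanted.size() && start <= line.size()) {
    std::size_t end = line.find(',', start);
    if (end == std::string_view::npos) { end = line.size(); }
    if (column == wanted[next]) {
      // Keep a view into the original buffer: no copy is made.
      out.push_back(line.substr(start, end - start));
      ++next;
    }
    start = end + 1;
    ++column;
  }
  return out;
}
```

For instance, `project_line("a,b,c,d", {1, 3})` returns two views, `b` and `d`. The views remain valid only as long as the underlying line buffer does.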
Caveat
The caveat with Intel's random number generation instructions is that you need to trust Intel's implementation. That is, you must have faith that Intel did not collaborate with the NSA...
There is now a C/C++ implementation of Roaring that might be of interest... https://github.com/RoaringBitmap/CRoaring
https://github.com/FastFilter/fastfilter_cpp
We support JSON Pointer, but we should also support JSONPath. It is more work, but also more useful. Currently, only a limited subset of JSONPath is supported; see https://github.com/simdjson/simdjson/pull/2127
For better documentation of our interface, we should adopt concepts when C++20 is available. https://lemire.me/blog/2023/04/18/defining-interfaces-in-c-with-concepts-c20/
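As a rough illustration of how a concept can document an interface, here is a sketch that uses a constrained template when C++20 concepts are available and falls back to an unconstrained one otherwise. The names `char_buffer`, `CHAR_BUFFER` and `count_quotes` are hypothetical; they are not part of our API.

```cpp
#include <cstddef>
#include <string_view>

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
#include <concepts>

// Hypothetical concept: a read-only character buffer exposing data() and
// size(). The requirements are spelled out in code rather than in prose.
template <typename T>
concept char_buffer = requires(const T& t) {
  { t.data() } -> std::convertible_to<const char*>;
  { t.size() } -> std::convertible_to<std::size_t>;
};
#define CHAR_BUFFER char_buffer  // constrained template parameter (C++20)
#else
#define CHAR_BUFFER typename     // unconstrained fallback before C++20
#endif

// Hypothetical function: counts the '"' characters in the buffer.
// With C++20, passing a type without data()/size() is rejected at the
// call site with a message that names the unsatisfied concept.
template <CHAR_BUFFER B>
constexpr std::size_t count_quotes(const B& input) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input.data()[i] == '"') { ++count; }
  }
  return count;
}
```

With this in place, `count_quotes(std::string_view("a\"b\"c"))` evaluates to 2 under either language standard; only the quality of the interface documentation and of the error messages changes.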
Currently, our documentation does not cover raw_json(). Furthermore, it appears to be a consuming operation, which we may want to rethink.
The simdjson library has support for JSON Pointers. [JSON Path](https://goessner.net/articles/JsonPath/) is a much more powerful query language. It seems that it could be efficiently implemented with On Demand. cc @jkeiser