lucene-solr
SOLR-13260: Up to 128 bit integer point type - ByteString
An initial implementation of a 128-bit integer point field type. It would benefit from somebody with a bit more experience with the code base having a look at it to make sure everything is OK. I've called the field type "longlong", as the C standard definition of long long is "at least 64 bits".
Some issues to consider:
- Input/output: text-based formats (CSV, JSON, XML) are supported.
- Sorting: missing value support not implemented.
- Faceting and functions not implemented. Is facet functionality even valid without numeric fields?
- Transaction log support not implemented. JavaBinCodec currently not supported. Is this required for replication to work?
There seems to be a lot of noise in the PR from a pre-commit merge from master. Fixing and re-pushing
I would name it int128 instead of longlong. That way we can have int256 or int512 in the future. Names like int32 or int64 are C++ standard (https://en.cppreference.com/w/cpp/types/integer).
Hi, thanks for the feedback. I picked longlong from https://en.cppreference.com/w/cpp/language/types. It is also consistent with the existing Solr types int and long. I believe Solr and Lucene are unlikely to support anything longer than 128 bits, since the underlying implementation types only support a maximum of 128 bits. If people consider this an important issue I'm happy to change things, but I would like some consensus from the community before I do so.
> I believe solr and lucene is unlikely to support anything longer than 128 bits - The underlying implementation types only support a maximum of 128 bits.
The underlying implementation is not set in stone, so at some point there could be 256 bit support, or maybe more likely efficient fixed-bits at arbitrary length. Due to that, I am partial to int128.
My question is what we gain from a longlong/int128-type? It is very different from atomic numeric types, so it seems like it will require a lot of implementation and maintenance effort to support Solr functions?
Hi Toke, thanks for your comments. It's great to have someone with a bit more experience with the Solr code base involved. In answer to the question of why: 128-bit ints are required to support an IPv6 field type (https://issues.apache.org/jira/browse/SOLR-6741). I started off adding support for generic 128-bit field types as part of that JIRA, but ended up splitting it off into a separate commit, as the basic type was more than capable of standing on its own and somebody else may find the functionality useful.
So if we want IPv6, we need 128 bits. I guess it is a valid question whether we should expose a generic 128-bit type to anything else. I'm open to suggestions about what would be best here.
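To illustrate why IPv6 needs the full 128 bits, here is a minimal sketch that uses only the JDK. The class and method names (`Ipv6Bits`, `toBytes`, `toUnsigned`) are hypothetical and are not part of this PR or of Solr/Lucene.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not code from this PR.
final class Ipv6Bits {
    private Ipv6Bits() {}

    // Returns the 16 raw bytes (network byte order) of an IPv6 literal.
    static byte[] toBytes(String literal) {
        final byte[] raw;
        try {
            // For an address literal the JDK only validates the format.
            raw = InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an address literal: " + literal, e);
        }
        if (raw.length != 16) {
            // IPv4 and IPv4-mapped literals come back as 4 bytes.
            throw new IllegalArgumentException("not a 16-byte IPv6 address: " + literal);
        }
        return raw;
    }

    // Interprets the 16 bytes as an unsigned 128-bit integer.
    static BigInteger toUnsigned(String literal) {
        return new BigInteger(1, toBytes(literal));
    }
}
```

The highest address, `ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff`, needs all 128 bits as an unsigned value, so it cannot be represented in a 64-bit `long`.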
In regards to supporting Solr functions - I agree - this is an area of significant concern for me as well. Most (all?) field functions only support up to 64 bit types. The tradeoff between implementation effort and end-user functionality for 128 bit point types is unlikely to be justifiable (and would likely involve performance tradeoffs).
My concerns are primarily focussed on what is needed (and possibly useful) in supporting an IPv6 data type. If I can get away with doing something like a count aggregation and nothing more, I would call it a workable win. Only exposing that functionality in the IPv6 type may help to reduce end user confusion.
If we add a length parameter the type could support anywhere from 8 to 16 bytes at the moment. In that case maybe a better name would be bytearray or bytestring. The name change would also reduce any expectations of the type supporting standard number type functions, which would not be a bad thing.
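As a rough sketch of what a length-parameterised byte type could look like, the snippet below encodes a signed integer into a fixed number of bytes so that plain unsigned byte comparison matches numeric order. It uses only the JDK. `FixedWidthBytes` and its methods are hypothetical names, and the sign-bit flip is an assumption about how such a type might keep values sortable, not a description of what this PR does.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch; names and layout are assumptions, not code from this PR.
final class FixedWidthBytes {
    private FixedWidthBytes() {}

    // Encodes value as `width` big-endian bytes with the sign bit flipped,
    // so that unsigned byte-wise comparison follows numeric order.
    static byte[] encode(BigInteger value, int width) {
        byte[] twos = value.toByteArray(); // minimal two's complement, big-endian
        if (twos.length > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " bytes");
        }
        byte[] out = new byte[width];
        // Sign-extend into the leading bytes.
        Arrays.fill(out, 0, width - twos.length, (byte) (value.signum() < 0 ? 0xFF : 0x00));
        System.arraycopy(twos, 0, out, width - twos.length, twos.length);
        out[0] ^= (byte) 0x80; // flip the sign bit
        return out;
    }

    // Reverses encode().
    static BigInteger decode(byte[] encoded) {
        byte[] copy = encoded.clone();
        copy[0] ^= (byte) 0x80; // restore the sign bit
        return new BigInteger(copy);
    }

    // Unsigned lexicographic comparison of two equal-length arrays.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }
}
```

With `width` set anywhere from 8 to 16, the same code covers 64-bit through 128-bit values. A value that needs more bytes than `width` is rejected rather than silently truncated.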
Just finding this now, sorry that I'm late to the party. I really think "String" should be a word used for things that involve encodings like UTF-8 etc. Could we opt for the ByteArray name instead? The line for doTestNumberPointFunctionQuery seems to be commented out (the method is there, but the invocation is commented out). When I uncomment it, it fails.