lucenenet
Alternative for SetNextReader to return all strings
Is there an existing issue for this?
- [X] I have searched the existing issues
Describe the documentation issue
PaulVrugt was asking this question, but never got a response to it:
The FieldCache GetStrings method was replaced by GetTerms, but GetTerms requires an AtomicReader. We used to be able to pass an IndexReader into this method, and it returned a string array containing the values. How do I get the same kind of behavior from the GetTerms method?
Is there no way to have the same behavior that GetStrings did in version 3.0.3?
Additional context
Here is the link to that thread: https://github.com/apache/lucenenet/issues/398
The Migration Guide covers this very issue with an example:
LUCENE-2380: FieldCache.GetStrings/Index --> FieldCache.GetDocTerms/Index
- The field values returned when sorting by `SortField.STRING` are now `BytesRef`. You can call `value.Utf8ToString()` to convert back to string, if necessary.

- In `FieldCache`, `GetStrings` (returning `string[]`) has been replaced with `GetTerms` (returning a `BinaryDocValues` instance). `BinaryDocValues` provides a `Get` method, taking a `docID` and a `BytesRef` to fill (which must not be `null`), and it fills it in with the reference to the bytes for that term.

  If you had code like this before:

      string[] values = FieldCache.DEFAULT.GetStrings(reader, field);
      ...
      string aValue = values[docID];

  you can do this instead:

      BinaryDocValues values = FieldCache.DEFAULT.GetTerms(reader, field);
      ...
      BytesRef term = new BytesRef();
      values.Get(docID, term);
      string aValue = term.Utf8ToString();

  Note however that it can be costly to convert to `String`, so it's better to work directly with the `BytesRef`.

- Similarly, in `FieldCache`, `GetStringIndex` (returning a `StringIndex` instance, with direct arrays `int[]` order and `String[]` lookup) has been replaced with `GetTermsIndex` (returning a `SortedDocValues` instance). `SortedDocValues` provides the `GetOrd(int docID)` method to lookup the int order for a document, `LookupOrd(int ord, BytesRef result)` to lookup the term from a given order, and the sugar method `Get(int docID, BytesRef result)` which internally calls `GetOrd` and then `LookupOrd`.

  If you had code like this before:

      StringIndex idx = FieldCache.DEFAULT.GetStringIndex(reader, field);
      ...
      int ord = idx.order[docID];
      String aValue = idx.lookup[ord];

  you can do this instead:

      DocTermsIndex idx = FieldCache.DEFAULT.GetTermsIndex(reader, field);
      ...
      int ord = idx.GetOrd(docID);
      BytesRef term = new BytesRef();
      idx.LookupOrd(ord, term);
      string aValue = term.Utf8ToString();

  Note however that it can be costly to convert to `String`, so it's better to work directly with the `BytesRef`.

  `DocTermsIndex` also has a `GetTermsEnum()` method, which returns an iterator (`TermsEnum`) over the term values in the index (ie, iterates ord = 0..NumOrd-1).
Furthermore, if you drill down into the issue LUCENE-2380, there is an explanation for the change: primarily, this was done for performance reasons. There is no longer a `string[]` stored in the field cache. The underlying data is now a `byte[]`, so extra steps are required to get a UTF-8 string.
Do note that you are meant to reuse the `BytesRef` instance that is passed in to get better performance.
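
To get back something close to the old `GetStrings(IndexReader, field)` behavior, one option is to walk the reader's segments and fill a single array, offsetting each segment's doc IDs by its `DocBase`. The following is only a sketch, not an official replacement: `FieldCacheStrings` and its `GetStrings` method are hypothetical names, and I am assuming the 4.8 `GetTerms` overload that takes a `setDocsWithField` flag. It also treats an empty `BytesRef` as "no value" and stores `null`, which is an assumption about how you want missing values handled.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Util;

public static class FieldCacheStrings
{
    // Hypothetical helper (not part of Lucene.NET): rebuilds a string[]
    // indexed by top-level doc ID, similar to what GetStrings returned in 3.0.3.
    public static string[] GetStrings(IndexReader reader, string field)
    {
        var values = new string[reader.MaxDoc];
        var term = new BytesRef(); // one instance, reused for every lookup

        // Each leaf is one segment, exposed as an AtomicReader.
        foreach (AtomicReaderContext context in reader.Leaves)
        {
            AtomicReader segment = context.AtomicReader;
            BinaryDocValues terms = FieldCache.DEFAULT.GetTerms(segment, field, false);

            for (int docID = 0; docID < segment.MaxDoc; docID++)
            {
                terms.Get(docID, term);
                // Assumption: an empty result means the document has no value.
                values[context.DocBase + docID] =
                    term.Length == 0 ? null : term.Utf8ToString();
            }
        }
        return values;
    }
}
```

Usage would then look like `string[] values = FieldCacheStrings.GetStrings(reader, "myField");`. Keep in mind that this converts every value to a string up front, which is exactly the cost the change was meant to avoid, so it is probably best kept for migration code rather than hot paths. If you would rather have a single `AtomicReader` for the whole index, `SlowCompositeReaderWrapper.Wrap(reader)` should also work, at the cost its name suggests.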