luke
luke copied to clipboard
Question about luke-luke-4.10.4-field-reconstruction
Hi, I am opening an issue as I can't find another way to ask Dmitry a question - sorry if this is the wrong approach.
Before I realised that you had made this patch I wrote my own code to do the same thing, which I am still experimenting with. Then I found your patch and compared the code, specifically the part which reconstructs the data when there is no position information.
I note that (IIUC) your new code takes the first term and adds it to the GrowableStringArray without checking if it is for the current document. Is this correct? In the alternate case where position info is available, there is a loop which discards any terms which are not for the current document:
int num = dpe.advance(docNum); if (num != docNum) { // either greater than or NO_MORE_DOCS continue; // no data for this term in this doc }
Shouldn't both cases do the same thing?
In my code (as there is no DocsAndPositionsEnum) I use the DocsEnum instead to cross reference against the current document:
` DocsEnum docsEnum = termsEnum.docs(null, null);
if (docsEnum != null) {
int termDoc;
while ((termDoc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
if (termDoc != docNum) {
continue; // this term is not for this document
}
GrowableStringArray gsa = (GrowableStringArray)
res.getReconstructedFields().get(fld);
if (gsa == null) {
gsa = new GrowableStringArray();
res.getReconstructedFields().put(fld, gsa);
}
String term = termsEnum.term().utf8ToString();
System.out.print("(Cross-referenced term " + term + " via doc " + termDoc + ") ");
gsa.append(0, "|", term);
}
}
`
Do you see what I mean? Or have I missed something?
Thanks
- Chris
Hi Chris! Thanks for reporting and paying attention to this old little patch. I think you are right in that we do need to check whether the term belongs to the target document. So in the notation of the relevant Lucene version we'do:
// first use bytesRef to extract the indexed term value
String docTerm = bytesRef.utf8ToString();
DocsAndPositionsEnum newDpe = te.docsAndPositions(live, dpe, 0);
DocsEnum docs = te.docs(live, dpe);
int docId;
boolean docFound = false;
while( (docId = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
if (docId == docNum)
docFound = true;
}
if (docFound) {
// only then store the value in a GrowableStringArray
}
and I think your code should work. Can you test it for fields with and without positions? You can find a test case in #20 .
Hi Dmitry,
Great, I’ll take a look. Thanks
Chris
Hi Dmitry,
I have a question - in your original code when you wrote the term with no position info to the 0th slot in the growablestringarray, you then do this
// we are done. Move to the next field
break;
Why? Surely we can continue to loop through all of the terms in the field and also add them to the 0th slot? Why abandon at this stage?
Thanks,
- Chris
Hi Dmitry,
I attach a patch + test for your perusal. You'll have to remove the ".txt" suffix as it won't accept .patch files.
Unstored_Field_With_Missing_Posns_Reconstruction.patch.txt
Kind regards,
- Chris
Hi Dmitry,
I have an unrelated Luke question, hope you don’t mind.
Since upgrading my Mac to Mojave, the fonts in Luke are terrible and I don’t know how to fix it. Settings -> Display font ... reveals a huge list of equally horrible fonts.
I’m pretty sure it didn’t used to be this bad. Any ideas? I have googled a lot but have not found anything.
I have been using the 4.10.4 and 5.5.5 versions of Luke.
Thanks
Chris
Sent from my iPhone
On 16 Nov 2019, at 14:46, Dmitry Kan [email protected] wrote:
Hi Chris! Thanks for reporting and paying attention to this old little patch. I think you are right in that we do need to check whether the term belongs to the target document. So in the notation of the relevant Lucene version we'do:
// first use bytesRef to extract the indexed term value String docTerm = bytesRef.utf8ToString(); DocsAndPositionsEnum newDpe = te.docsAndPositions(live, dpe, 0); DocsEnum docs = te.docs(live, dpe); int docId; boolean docFound = false; while( (docId = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { if (docId == docNum) docFound = true; } if (docFound) { // only then store the value in a GrowableStringArray }and I think your code should work. Can you test it for fields with and without positions? You can find a test case in #20 .
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Hi Dmitry, I have a question - in your original code when you wrote the term with no position info to the 0th slot in the growablestringarray, you then do this
// we are done. Move to the next fieldbreak;Why? Surely we can continue to loop through all of the terms in the field and also add them to the 0th slot? Why abandon at this stage?Thanks,
- Chris
I wish I had left more detail in the comments as to why I did that way. May be it was optimizing for some special case. At this point, re-reading the code I would agree with you: iterate and append all. Only thing is, what if the field's length is beyond capacity of the UI to show it? May be some trimming is necessary.
Hi Dmitry,
I attach a patch + test for your perusal. You'll have to remove the ".txt" suffix as it won't accept .patch files.
Unstored_Field_With_Missing_Posns_Reconstruction.patch.txt
Kind regards,
- Chris
thanks, Chris! Actually, can you submit it as a pull request?
Hi Dmitry, I have an unrelated Luke question, hope you don’t mind. Since upgrading my Mac to Mojave, the fonts in Luke are terrible and I don’t know how to fix it. Settings -> Display font ... reveals a huge list of equally horrible fonts. I’m pretty sure it didn’t used to be this bad. Any ideas? I have googled a lot but have not found anything. I have been using the 4.10.4 and 5.5.5 versions of Luke. Thanks Chris
Is there a way for you to use the Java FX or, better, Swing based Luke?
Hi Dmitry
Please bear with me, I have not done this before (pull request). I have created my own repo, created a local branch and applied the patch to it, committed and pushed. Now I am unsure what to do:
https://github.com/chris-bamford/luke
Can you take a look and advise please?
Thanks
- Chris
I see you forked from this repo -- this is a good start. Next, go to the github ui, switch to your branch, and click New pull request. What was the base branch for your branch?
I clicked new pull request and it says "There isn’t anything to compare. DmitryKey:master and chris-bamford:issue/175-improve-reconstruction-unstored-fields are entirely different commit histories." It shows my diffs (unified or split), but that's it, no way to confirm or proceed.
In answer to your question, is this what you mean:

Sorry for my inexperience.
hi Chris,
no problem. I compared to different branches and may be you created your branch off of luke-thinlet branch?
https://github.com/DmitryKey/luke/compare/luke-thinlet...chris-bamford:issue/175-improve-reconstruction-unstored-fields
I honestly cannot remember. I think it would be better if you sent me step-by-step instructions so I can learn! Thanks
hey @chris-bamford , from what it looks, you branched from the luke-thinlet. So after you click New pull request, instead of master, choose luke-thinlet and view the differences. At that point you should only see your own code differences and no conflicts.
Hi, yes. You’re right. You should have the pull request now?
Sent from my iPhone
On 28 Nov 2019, at 08:55, Dmitry Kan [email protected] wrote:
hi Chris,
no problem. I compared to different branches and may be you created your branch off of luke-thinlet branch?
luke-thinlet...chris-bamford:issue/175-improve-reconstruction-unstored-fields
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Hi Dmitry,
Was the pull request ok?
Best, Chris
Sent from my iPhone
On 3 Dec 2019, at 12:29, Chris Bamford [email protected] wrote:
Hi, yes. You’re right. You should have the pull request now?
Sent from my iPhone
On 28 Nov 2019, at 08:55, Dmitry Kan [email protected] wrote:
hi Chris,
no problem. I compared to different branches and may be you created your branch off of luke-thinlet branch?
luke-thinlet...chris-bamford:issue/175-improve-reconstruction-unstored-fields
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Hi Chris,
Thanks for your PR. Been travelling, could not respond earlier.
I left a comment in https://github.com/DmitryKey/luke/pull/177