luke Question about luke-luke-4.10.4-field-reconstruction

Hi, I am opening an issue as I can't find another way to ask Dmitry a question - sorry if this is the wrong approach.

Before I realised that you had made this patch I wrote my own code to do the same thing, which I am still experimenting with. Then I found your patch and compared the code, specifically the part which reconstructs the data when there is no position information.

I note that (IIUC) your new code takes the first term and adds it to the GrowableStringArray without checking if it is for the current document. Is this correct? In the alternate case where position info is available, there is a loop which discards any terms which are not for the current document:

int num = dpe.advance(docNum); if (num != docNum) { // either greater than or NO_MORE_DOCS continue; // no data for this term in this doc }

Shouldn't both cases do the same thing?

In my code (as there is no DocsAndPositionsEnum) I use the DocsEnum instead to cross reference against the current document:

` DocsEnum docsEnum = termsEnum.docs(null, null);

if (docsEnum != null) {
    int termDoc;
    while ((termDoc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        if (termDoc != docNum) {
            continue;  // this term is not for this document
        }

        GrowableStringArray gsa = (GrowableStringArray)
                res.getReconstructedFields().get(fld);
        if (gsa == null) {
            gsa = new GrowableStringArray();
            res.getReconstructedFields().put(fld, gsa);
        }
        String term = termsEnum.term().utf8ToString();
        System.out.print("(Cross-referenced term " + term + " via doc " + termDoc + ") ");
        gsa.append(0, "|", term);
    }
}

`

Do you see what I mean? Or have I missed something?

Thanks

Chris

Nov 15 '19 17:11 chris-bamford

Hi Chris! Thanks for reporting and paying attention to this old little patch. I think you are right in that we do need to check whether the term belongs to the target document. So in the notation of the relevant Lucene version we'do:

      // first use bytesRef to extract the indexed term value
      String docTerm = bytesRef.utf8ToString();

      DocsAndPositionsEnum newDpe = te.docsAndPositions(live, dpe, 0);
      DocsEnum docs = te.docs(live, dpe);
      int docId;
      boolean docFound = false;
      while( (docId = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      if (docId == docNum)
          docFound = true;
      }
      if (docFound) {
          // only then store the value in a GrowableStringArray
      }

and I think your code should work. Can you test it for fields with and without positions? You can find a test case in #20 .

Nov 16 '19 14:11 DmitryKey

Hi Dmitry,

Great, I’ll take a look. Thanks

Chris

Nov 16 '19 21:11 chris-bamford

Hi Dmitry, I have a question - in your original code when you wrote the term with no position info to the 0th slot in the growablestringarray, you then do this // we are done. Move to the next field break; Why? Surely we can continue to loop through all of the terms in the field and also add them to the 0th slot? Why abandon at this stage?

Thanks,

Chris

Nov 18 '19 12:11 chris-bamford

Hi Dmitry,

I attach a patch + test for your perusal. You'll have to remove the ".txt" suffix as it won't accept .patch files.

Unstored_Field_With_Missing_Posns_Reconstruction.patch.txt

Kind regards,

Chris

Nov 19 '19 13:11 chris-bamford

Hi Dmitry,

I have an unrelated Luke question, hope you don’t mind.

Since upgrading my Mac to Mojave, the fonts in Luke are terrible and I don’t know how to fix it. Settings -> Display font ... reveals a huge list of equally horrible fonts.

I’m pretty sure it didn’t used to be this bad. Any ideas? I have googled a lot but have not found anything.

I have been using the 4.10.4 and 5.5.5 versions of Luke.

Thanks

Chris

Sent from my iPhone

On 16 Nov 2019, at 14:46, Dmitry Kan [email protected] wrote:

Hi Chris! Thanks for reporting and paying attention to this old little patch. I think you are right in that we do need to check whether the term belongs to the target document. So in the notation of the relevant Lucene version we'do:
  // first use bytesRef to extract the indexed term value
  String docTerm = bytesRef.utf8ToString();

  DocsAndPositionsEnum newDpe = te.docsAndPositions(live, dpe, 0);
  DocsEnum docs = te.docs(live, dpe);
  int docId;
  boolean docFound = false;
  while( (docId = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  if (docId == docNum)
      docFound = true;
  }
  if (docFound) {
      // only then store the value in a GrowableStringArray
  }
and I think your code should work. Can you test it for fields with and without positions? You can find a test case in #20 .

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Nov 21 '19 14:11 chris-bamford

Hi Dmitry, I have a question - in your original code when you wrote the term with no position info to the 0th slot in the growablestringarray, you then do this // we are done. Move to the next field break; Why? Surely we can continue to loop through all of the terms in the field and also add them to the 0th slot? Why abandon at this stage?

Thanks,

Chris

I wish I had left more detail in the comments as to why I did that way. May be it was optimizing for some special case. At this point, re-reading the code I would agree with you: iterate and append all. Only thing is, what if the field's length is beyond capacity of the UI to show it? May be some trimming is necessary.

Nov 25 '19 13:11 DmitryKey

Hi Dmitry,

I attach a patch + test for your perusal. You'll have to remove the ".txt" suffix as it won't accept .patch files.

Unstored_Field_With_Missing_Posns_Reconstruction.patch.txt

Kind regards,

Chris

thanks, Chris! Actually, can you submit it as a pull request?

Nov 25 '19 13:11 DmitryKey

Hi Dmitry, I have an unrelated Luke question, hope you don’t mind. Since upgrading my Mac to Mojave, the fonts in Luke are terrible and I don’t know how to fix it. Settings -> Display font ... reveals a huge list of equally horrible fonts. I’m pretty sure it didn’t used to be this bad. Any ideas? I have googled a lot but have not found anything. I have been using the 4.10.4 and 5.5.5 versions of Luke. Thanks Chris

Is there a way for you to use the Java FX or, better, Swing based Luke?

Nov 25 '19 13:11 DmitryKey

Hi Dmitry

Please bear with me, I have not done this before (pull request). I have created my own repo, created a local branch and applied the patch to it, committed and pushed. Now I am unsure what to do:

https://github.com/chris-bamford/luke

Can you take a look and advise please?

Thanks

Chris

Nov 25 '19 16:11 chris-bamford

I see you forked from this repo -- this is a good start. Next, go to the github ui, switch to your branch, and click New pull request. What was the base branch for your branch?

Nov 25 '19 20:11 DmitryKey

I clicked new pull request and it says "There isn’t anything to compare. DmitryKey:master and chris-bamford:issue/175-improve-reconstruction-unstored-fields are entirely different commit histories." It shows my diffs (unified or split), but that's it, no way to confirm or proceed.

In answer to your question, is this what you mean: Screenshot 2019-11-26 at 10 11 01

Sorry for my inexperience.

Nov 26 '19 10:11 chris-bamford

hi Chris,

no problem. I compared to different branches and may be you created your branch off of luke-thinlet branch?

https://github.com/DmitryKey/luke/compare/luke-thinlet...chris-bamford:issue/175-improve-reconstruction-unstored-fields

Nov 28 '19 08:11 DmitryKey

I honestly cannot remember. I think it would be better if you sent me step-by-step instructions so I can learn! Thanks

Nov 28 '19 09:11 chris-bamford

hey @chris-bamford , from what it looks, you branched from the luke-thinlet. So after you click New pull request, instead of master, choose luke-thinlet and view the differences. At that point you should only see your own code differences and no conflicts.

Nov 30 '19 08:11 DmitryKey

Hi, yes. You’re right. You should have the pull request now?

Sent from my iPhone

On 28 Nov 2019, at 08:55, Dmitry Kan [email protected] wrote:

hi Chris,

no problem. I compared to different branches and may be you created your branch off of luke-thinlet branch?

luke-thinlet...chris-bamford:issue/175-improve-reconstruction-unstored-fields

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Dec 03 '19 12:12 chris-bamford

Hi Dmitry,

Was the pull request ok?

Best, Chris

Sent from my iPhone

On 3 Dec 2019, at 12:29, Chris Bamford [email protected] wrote:

Hi, yes. You’re right. You should have the pull request now?

Sent from my iPhone

On 28 Nov 2019, at 08:55, Dmitry Kan [email protected] wrote:

hi Chris,

no problem. I compared to different branches and may be you created your branch off of luke-thinlet branch?

luke-thinlet...chris-bamford:issue/175-improve-reconstruction-unstored-fields

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Dec 11 '19 11:12 chris-bamford

Hi Chris,

Thanks for your PR. Been travelling, could not respond earlier.

I left a comment in https://github.com/DmitryKey/luke/pull/177

Dec 11 '19 12:12 DmitryKey

luke luke copied to clipboard

Question about luke-luke-4.10.4-field-reconstruction

luke
luke copied to clipboard