
citations are indexed invisibly — `<cite>` is dropped from context field

Open sydb opened this issue 9 months ago • 5 comments

[This may be a bug. At least, I do not think it is the result of a mistake I have made, but I have been wrong about that before. :-]

The content of `<html:cite>` elements is dropped from the context generated for each search term.

To reproduce:

  1. Download and expand staticSearch 1.4.7.
  2. Add the file `cite_me_not.html` (see the appendix of this post) to the `test/` directory.
  3. Run `ant`.
  4. Run `fgrep -h 'situational' test/ssTest/stems/*` (or otherwise inspect the results). The word “citation” does not occur in the `"context":` field of the output; it should be in the space before the comma.
  5. Run `cat test/ssTest/stems/citat*`. The word “citation” has no context around it.
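
For anyone who would rather script the check in steps 4 and 5 than read the `fgrep` output by eye, here is a minimal Python sketch. It assumes only what is stated above: that the files under `test/ssTest/stems/` are JSON and that contexts are stored as strings under a `"context"` key. It makes no other assumption about the file layout, so it simply walks every nested object. The function names `iter_contexts` and `contexts_missing` are made up for this example and are not part of staticSearch.

```python
import json
from pathlib import Path


def iter_contexts(node):
    """Yield every string stored under a "context" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "context" and isinstance(value, str):
                yield value
            else:
                yield from iter_contexts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_contexts(item)


def contexts_missing(stems_dir, anchor, expected):
    """Return (file name, context) pairs that contain `anchor` but not `expected`."""
    hits = []
    for path in sorted(Path(stems_dir).glob("*")):
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip anything in the directory that is not JSON
        for context in iter_contexts(data):
            if anchor in context and expected not in context:
                hits.append((path.name, context))
    return hits
```

Calling `contexts_missing("test/ssTest/stems", "situational", "citation")` from the directory where `ant` was run should list each context that mentions “situational” without “citation”. An empty list would mean the problem did not reproduce.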

Appendix — cite_me_not.html

```html
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <title>Theft 000382</title>
      <meta name="article type" class="staticSearch_desc" content="test" />
      <meta name="date of publication" class="staticSearch_date" content="2024-05-20" />
      <meta name="volume" class="staticSearch_num" content="18" />
      <meta name="issue" class="staticSearch_num" content="5" />
      <meta name="docTitle" class="staticSearch_docTitle" content="Theft 000382" />
      <script type="text/javascript" src="../../../uvepss/ssHighlight.js"></script>
   </head>
   <body>
     <div id="mainContent">
       This is a division with one firm
       <a href="https://bauman.zapto.org/~syd/temp/pics/some_nice_shots_with_50_mm/index.html">anchor</a>,
       one situational <cite>citation</cite>, one empirically
       <em>emphatic</em> phrase, and <span>22.86 cm</span> worth
       of nonsense.
       <p>
         This is a paragraph with a second firm
         <a href="https://bauman.zapto.org/~syd/temp/pics/2024-04-11_car_fire_4_press/index.html">anchor</a>,
         another situational <cite>citation</cite>, an even more empirically
         <em>emphatic</em> phrase, and <span>⅛ fathom</span> worth
         of nonsense.
       </p>
     </div>
   </body>
</html>
```
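
As a rough illustration of the symptom in step 4, the sketch below flattens the first sentence of the test file with Python's standard `xml.etree.ElementTree`, once keeping and once skipping the content of `cite`. This is not staticSearch's own code, and it says nothing about where in the build the content is actually lost. The helper names `flatten` and `normalize` are made up for this example, and `FRAGMENT` is an excerpt of the file above, not the whole document.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Excerpt of cite_me_not.html: the text directly inside div#mainContent.
FRAGMENT = """<div xmlns="http://www.w3.org/1999/xhtml" id="mainContent">
  This is a division with one firm
  <a href="https://bauman.zapto.org/~syd/temp/pics/some_nice_shots_with_50_mm/index.html">anchor</a>,
  one situational <cite>citation</cite>, one empirically
  <em>emphatic</em> phrase, and <span>22.86 cm</span> worth
  of nonsense.
</div>"""


def flatten(elem, skip=()):
    """Return the text inside elem, omitting the content (but keeping the
    trailing text) of any descendant whose local name is listed in skip."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag.replace(XHTML_NS, "") not in skip:
            parts.append(flatten(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)


def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


div = ET.fromstring(FRAGMENT)
print(normalize(flatten(div)))                  # cite content kept
print(normalize(flatten(div, skip=("cite",))))  # cite content dropped
```

The first line printed contains `situational citation, one`, which is what the context would be expected to show. The second contains `situational , one`, the same gap before the comma that appears in the `"context":` field.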

sydb, May 21 '24 02:05