staticSearch
citations are indexed invisibly — `<cite>` is dropped from context field
[This may be a bug. At least, I do not think it is the result of a mistake I have made, but I have been wrong about that before. :-]
The content of `<html:cite>` is dropped from the context created for each search term.
To reproduce:
- Download & expand 1.4.7.
- Add file cite_me_not.html to the test/ directory (file can be found in the appendix of this post).
- Issue `ant`.
- Issue `fgrep -h 'situational' test/ssTest/stems/*` (or otherwise look at the results), and you will notice that the word “citation” does not occur in the output `"context":` field (it should be in that space before the comma).
- Issue `cat test/ssTest/stems/citat*`, and notice that the word “citation” has no context around it.
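
For anyone who would rather script the last two checks than eyeball the `fgrep` output, here is a minimal sketch. It rests on assumptions I have not verified against the staticSearch source: that each file under `test/ssTest/stems/` is JSON, and that every context string sits under a key named `"context"` somewhere in the structure. The function names `collect_contexts` and `contexts_missing_word` are made up for this illustration and are not part of staticSearch.

```python
import json
from pathlib import Path


def collect_contexts(node):
    """Recursively gather every string stored under a "context" key.

    Assumption: context strings live under a key literally named "context";
    no other knowledge of the stem-file layout is relied on.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "context" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_contexts(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_contexts(item))
    return found


def contexts_missing_word(stems_dir, marker, expected):
    """Return (file name, context) pairs that contain `marker` but not `expected`."""
    hits = []
    for path in sorted(Path(stems_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except ValueError:
            # Not JSON (or not UTF-8): skip it rather than guess at its format.
            continue
        for context in collect_contexts(data):
            if marker in context and expected not in context:
                hits.append((path.name, context))
    return hits
```

Calling `contexts_missing_word("test/ssTest/stems", "situational", "citation")` after the `ant` build should list each context that mentions “situational” but not “citation”. If the assumptions hold and the behaviour described above is present, both occurrences from `cite_me_not.html` should show up in that list.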
Appendix — cite_me_not.html
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Theft 000382</title>
<meta name="article type" class="staticSearch_desc" content="test" />
<meta name="date of publication" class="staticSearch_date" content="2024-05-20" />
<meta name="volume" class="staticSearch_num" content="18" />
<meta name="issue" class="staticSearch_num" content="5" />
<meta name="docTitle" class="staticSearch_docTitle" content="Theft 000382" />
<script type="text/javascript" src="../../../uvepss/ssHighlight.js"></script>
</head>
<body>
<div id="mainContent">
This is a division with one firm
<a href="https://bauman.zapto.org/~syd/temp/pics/some_nice_shots_with_50_mm/index.html">anchor</a>,
one situational <cite>citation</cite>, one empirically
<em>emphatic</em> phrase, and <span>22.86 cm</span> worth
of nonsense.
<p>
This is a paragraph with a second firm
<a href="https://bauman.zapto.org/~syd/temp/pics/2024-04-11_car_fire_4_press/index.html">anchor</a>,
another situational <cite>citation</cite>, an even more empirically
<em>emphatic</em> phrase, and <span>⅛ fathom</span> worth
of nonsense.
</p>
</div>
</body>
</html>