Remove at-block source code from search index
It looks like, for the at-blocks, we just write the contents of the original code block to the search index.
A case in point: we can "find" a bunch of spurious DocTestSetups in the Julia manual (https://docs.julialang.org/en/v1.8.1/search/?q=doctestsetup) which do not appear in the HTML, due to at-meta blocks on those pages.
I see two potential approaches to fixing this issue: 1) completely exclude all at-blocks from the search index, or 2) exclude only the input code while keeping the output in the search index. The first is simpler to implement but might be too restrictive, while the second preserves potentially useful output content. Which approach would you prefer for this fix?
Option (2) would seem very much preferable
We also shouldn't remove example code, or repl block code
That's what I thought, because people often search for terms that appear in repl and example code. On it!
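One way to read the rule discussed above (index what shows up on the rendered page, drop at-block source that is never rendered) is sketched below. This is a toy model only: `ToyBlock` and `search_text` are hypothetical names, not Documenter's actual types or API, and the sketch assumes that at-meta blocks render nothing, at-eval blocks render only their result, and example/repl blocks render both input and output.

```julia
# Toy model, NOT Documenter's real types: every name here is hypothetical.
struct ToyBlock
    kind::Symbol             # :meta, :eval, :example, :repl or :text
    input::String            # source as written in the Markdown file
    outputs::Vector{String}  # what the block renders into the page
end

# Return the text that should reach the search index for one block.
function search_text(block::ToyBlock)
    if block.kind === :meta
        return ""                               # nothing is rendered, so index nothing
    elseif block.kind === :eval
        return join(block.outputs, " ")         # source is hidden, only the result is shown
    else
        # example/repl code and plain text are visible, so they stay searchable
        return join([block.input, block.outputs...], " ")
    end
end

blocks = [
    ToyBlock(:meta, "DocTestSetup = :(using Foo)", String[]),
    ToyBlock(:eval, "string(1 + 1)", ["2"]),
    ToyBlock(:example, "x = 1 + 1", ["2"]),
    ToyBlock(:text, "Some prose.", String[]),
]

# Empty texts are dropped so hidden blocks leave no record at all.
index = filter(!isempty, map(search_text, blocks))
```

In this sketch the `DocTestSetup` line never reaches `index`, while the example code `x = 1 + 1` is still searchable.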
I've been working on this issue. My approach is to add custom SearchRecord methods in HTMLWriter.jl for @repl, @eval and @example blocks, so that the first child (the input) is not indexed. However, it is not working when I test it locally. I am doing something like this:
# for @example blocks
function SearchRecord(ctx::HTMLContext, navnode::Documenter.NavNode, node::MarkdownAST.Node{Nothing}, element::Documenter.MultiOutput)
    @info "MultiOutput SearchRecord method called!"
    children_array = collect(node.children)
    # With fewer than two children there is no output after the input
    if length(children_array) < 2
        return SearchRecord(ctx, navnode; text = "")
    end
    output_text = ""
    # Skip the first child (the input code) and collect only the outputs
    for child in children_array[2:end]
        if isa(child.element, Documenter.MultiOutputElement) && isa(child.element.element, Dict)
            # Take the first available textual representation
            for mime in [MIME"text/plain"(), MIME"text/markdown"(), MIME"text/html"()]
                if haskey(child.element.element, mime)
                    output_text *= string(child.element.element[mime]) * " "
                    break
                end
            end
        else
            output_text *= mdflatten(child) * " "
        end
    end
    return SearchRecord(ctx, navnode; text = output_text)
end
# for @repl blocks
function SearchRecord(ctx::HTMLContext, navnode::Documenter.NavNode, node::MarkdownAST.Node{Nothing}, element::Documenter.MultiCodeBlock)
    @info "MultiCodeBlock SearchRecord method called!"
    children_array = collect(node.children)
    if length(children_array) < 2
        return SearchRecord(ctx, navnode; text = "")
    end
    output_text = ""
    # Take every second child starting at 2, treating those as the outputs
    for i in 2:2:length(children_array)
        output_text *= mdflatten(children_array[i]) * " "
    end
    return SearchRecord(ctx, navnode; text = output_text)
end
# for @eval blocks
function SearchRecord(ctx, navnode, node::Node, element::Documenter.EvalNode)
    @info "EvalNode SearchRecord method called!"
    if isnothing(element.result)
        return SearchRecord(ctx, navnode; text = "")
    else
        return SearchRecord(ctx, navnode; text = mdflatten(element.result))
    end
end
I also tried something like this, which is not working either:
function domify(dctx::DCtx)
    ctx, navnode = dctx.ctx, dctx.navnode
    return map(getpage(ctx, navnode).mdast.children) do node
        # Skip at-meta nodes when building the search index
        if !(node.element isa Documenter.MetaNode)
            rec = SearchRecord(ctx, navnode, node, node.element)
            push!(ctx.search_index, rec)
        end
        domify(dctx, node, node.element)
    end
end
At this point I am thinking that the search index might include the rendered HTML, and that this is why the code above is not working.
One more thing I tried is filtering records out just before the search index is written to disk, but this seems like too broad a filter:
open(joinpath(doc.user.build, ctx.search_index_js), "w") do io
    filtered_index = filter(ctx.search_index) do rec
        if contains(rec.text, "```julia") || contains(rec.text, "```@example") ||
                contains(rec.text, "```@repl") || contains(rec.text, "```@eval")
            return false
        end
        if contains(rec.text, r"julia>\s")
            return false
        end
        if contains(rec.text, "DocTestSetup") || contains(rec.text, "@meta")
            return false
        end
        return true
    end
    println(io, "var documenterSearchIndex = {\"docs\":")
    println(io, JSDependencies.json_jsescape(filtered_index), "\n}")
end
> search index might be including the rendered html and that is why these codes are not working
This is not the case, unless we are somehow pushing HTML into ctx.search_index as SearchRecords. As I think you have identified, the search index is just a JSON representation of ctx.search_index, which we write out here:
https://github.com/JuliaDocs/Documenter.jl/blob/dc0f43375ace0b307f138ccd36f376d3b88290e3/src/html/HTMLWriter.jl#L825
And the SearchRecord -> JSON is very straightforward and happens here:
https://github.com/JuliaDocs/Documenter.jl/blob/dc0f43375ace0b307f138ccd36f376d3b88290e3/src/html/HTMLWriter.jl#L716-L727
All in all, we definitely want to keep the rule that whatever is in ctx.search_index is what gets written into search_index.js, so the correct solution here would involve changing either the SearchRecord methods or maybe the domify methods, as you've done.
As for why those attempts are not working: could you open a draft PR, where your changes are incorporated into the source code, so I could easily run it?
This can still be reproduced at https://docs.julialang.org/en/v1.12/?q=DocTestSetups
But to my surprise, it is not present at https://docs.julialang.org/en/v1.13-dev/?q=DocTestSetups. Any idea why?
Turns out the Julia 1.12 docs were built with Documenter 1.8.1, while the 1.13 ones were built with 1.11.4.
So perhaps this issue is fixed?
Clearly we need Julia 1.12 to use a newer Documenter! See https://github.com/JuliaLang/julia/pull/60073