Remove at-block source code from search index

Open mortenpi opened this issue 3 years ago • 10 comments

It looks like, for the at-blocks, we just write the contents of the original code block to the search index.

A case in point: we can "find" a bunch of spurious DocTestSetups in the Julia manual (https://docs.julialang.org/en/v1.8.1/search/?q=doctestsetup) which do not appear in the HTML, due to at-meta blocks on those pages.

mortenpi avatar Sep 14 '22 03:09 mortenpi

I see two potential approaches to fixing this issue: 1) completely exclude all at-blocks from the search index, or 2) exclude only the input code while keeping the output in the search index. The first is simpler to implement but might be too restrictive, while the second preserves potentially useful output content. Which approach would you prefer for this fix?
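
To make the difference between the two options concrete, here is a minimal, self-contained sketch. The `ToyBlock` type and the `index_text` function are hypothetical names made up for illustration and are not part of Documenter; the sketch only assumes that an evaluated at-block can be modeled as input source plus rendered output.

```julia
# Hypothetical stand-in for an evaluated at-block: source code plus its output.
# Neither this type nor `index_text` exists in Documenter.
struct ToyBlock
    kind::Symbol      # e.g. :example, :repl, :eval, :meta
    input::String     # the source code written inside the at-block
    output::String    # what ends up rendered on the page ("" if nothing)
end

# Option (1): drop the block entirely. Option (2): keep only the output.
function index_text(block::ToyBlock; option::Int)
    option == 1 && return ""
    option == 2 && return block.output
    throw(ArgumentError("option must be 1 or 2"))
end

meta = ToyBlock(:meta, "DocTestSetup = :(using Foo)", "")
example = ToyBlock(:example, "x = 1 + 1", "2")

index_text(meta; option = 2)     # "" (nothing searchable in either option)
index_text(example; option = 1)  # "" (the output is lost as well)
index_text(example; option = 2)  # "2" (the output stays searchable)
```

In both options the at-meta source disappears from the index; the options only differ in whether rendered output survives.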

Rahban1 avatar Mar 29 '25 09:03 Rahban1

Option (2) would seem very much preferable

goerz avatar Mar 29 '25 10:03 goerz

We also shouldn't remove example code, or repl block code

asinghvi17 avatar Mar 29 '25 11:03 asinghvi17

> Option (2) would seem very much preferable

That's what I thought, because people often search for terms that appear in REPL and example code. On it!

Rahban1 avatar Mar 29 '25 11:03 Rahban1

I've been working on this issue. My approach is to add custom SearchRecord methods in HTMLWriter.jl for @repl, @eval and @example blocks, so that the first child, which is the input, does not get indexed. However, it is not working when I test it locally. I am doing something like this:


# for @example blocks
function SearchRecord(ctx::HTMLContext, navnode::Documenter.NavNode, node::MarkdownAST.Node{Nothing}, element::Documenter.MultiOutput)
    @info "MultiOutput SearchRecord method called!"
    children_array = collect(node.children)

    # Fewer than two children means there is no output to index
    if length(children_array) < 2
        return SearchRecord(ctx, navnode; text = "")
    end

    output_text = ""
    for child in children_array[2:end]
        if isa(child.element, Documenter.MultiOutputElement) && isa(child.element.element, Dict)
            for mime in [MIME"text/plain"(), MIME"text/markdown"(), MIME"text/html"()]
                if haskey(child.element.element, mime)
                    output_text *= string(child.element.element[mime]) * " "
                    break
                end
            end
        else 
            output_text *= mdflatten(child) * " "
        end
    end
    return SearchRecord(ctx, navnode; text = output_text)
end

# for @repl blocks

function SearchRecord(ctx::HTMLContext, navnode::Documenter.NavNode, node::MarkdownAST.Node{Nothing}, element::Documenter.MultiCodeBlock)
    @info "MultiCodeBlock SearchRecord method called!"
    children_array = collect(node.children)

    # Fewer than two children means there is no output to index
    if length(children_array) < 2
        return SearchRecord(ctx, navnode; text = "")
    end

    output_text = ""
    # Assumes children alternate input/output, so every second child is an output
    for i in 2:2:length(children_array)
        output_text *= mdflatten(children_array[i]) * " "
    end
    return SearchRecord(ctx, navnode; text = output_text)
end

# for @eval blocks
function SearchRecord(ctx, navnode, node::Node, element::Documenter.EvalNode)
    @info "EvalNode SearchRecord method called!"
    if isnothing(element.result)
        return SearchRecord(ctx, navnode; text = "")
    else
        return SearchRecord(ctx, navnode; text = mdflatten(element.result))
    end
end

Rahban1 avatar Mar 30 '25 18:03 Rahban1

I also tried something like this, which is not working either:

function domify(dctx::DCtx)
    ctx, navnode = dctx.ctx, dctx.navnode
    return map(getpage(ctx, navnode).mdast.children) do node
        # Skip at-meta nodes so their contents are never pushed to the search index
        if !(node.element isa Documenter.MetaNode)
            rec = SearchRecord(ctx, navnode, node, node.element)
            push!(ctx.search_index, rec)
        end

        domify(dctx, node, node.element)
    end
end

At this point I am thinking that the search index might be including the rendered HTML, and that is why this code is not working.

Rahban1 avatar Mar 30 '25 21:03 Rahban1

One more thing I tried is filtering records out just before the search index is written, but this seems like too broad a filter:

open(joinpath(doc.user.build, ctx.search_index_js), "w") do io
    # Drop every record whose text looks like at-block source code
    filtered_index = filter(ctx.search_index) do rec
        if contains(rec.text, "```julia") || contains(rec.text, "```@example") ||
                contains(rec.text, "```@repl") || contains(rec.text, "```@eval")
            return false
        end

        if contains(rec.text, r"julia>\s")
            return false
        end

        if contains(rec.text, "DocTestSetup") || contains(rec.text, "@meta")
            return false
        end

        return true
    end

    println(io, "var documenterSearchIndex = {\"docs\":")
    println(io, JSDependencies.json_jsescape(filtered_index), "\n}")
end
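
To illustrate why this feels too broad, here is a small self-contained sketch. `ToyRecord` and `keep` are hypothetical names for illustration only (the real SearchRecord has more fields); the rule mirrors the DocTestSetup / @meta check above. Because it matches on text alone, it cannot tell hidden at-block source apart from prose that merely mentions the same words.

```julia
# Hypothetical stand-in for a search record; not Documenter's SearchRecord.
struct ToyRecord
    title::String
    text::String
end

# Same text-based rule as the DocTestSetup / @meta check in the filter.
keep(rec::ToyRecord) = !(contains(rec.text, "DocTestSetup") || contains(rec.text, "@meta"))

records = [
    # Hidden at-meta source: this one should be dropped.
    ToyRecord("Hidden block", "DocTestSetup = :(using Foo)"),
    # Visible prose that documents the feature: this one should stay.
    ToyRecord("Doctest guide", "Set DocTestSetup inside an @meta block to load packages for doctests."),
]

kept = filter(keep, records)
isempty(kept)  # true: the legitimate prose paragraph was dropped as well
```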

Rahban1 avatar Mar 30 '25 22:03 Rahban1

> search index might be including the rendered html and that is why these codes are not working

This is not the case, unless we are somehow pushing HTML into ctx.search_index as SearchRecords. As I think you have identified, the search index is just a JSON representation of ctx.search_index, which we write out here:

https://github.com/JuliaDocs/Documenter.jl/blob/dc0f43375ace0b307f138ccd36f376d3b88290e3/src/html/HTMLWriter.jl#L825

And the SearchRecord -> JSON is very straightforward and happens here:

https://github.com/JuliaDocs/Documenter.jl/blob/dc0f43375ace0b307f138ccd36f376d3b88290e3/src/html/HTMLWriter.jl#L716-L727

All in all, we definitely want to keep the rule that whatever is in ctx.search_index is what gets written into search_index.js, so the correct solution here would involve changing either the SearchRecord methods or maybe the domify methods, as you've done.
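
As a rough illustration of that rule, here is a toy sketch in which the decision about what is searchable is made when the index is built, and the writer stays a plain dump. All names (`ToyNode`, `searchable_text`, `build_index`, `write_index`) are hypothetical and do not correspond to Documenter internals; the plain-text output also stands in for the real JSON serialization.

```julia
# Hypothetical stand-in for a page node; not a Documenter or MarkdownAST type.
struct ToyNode
    kind::Symbol
    text::String
end

# Decide per node what (if anything) becomes searchable.
searchable_text(node::ToyNode) = node.kind === :meta ? nothing : node.text

# Build the index; nodes without searchable text are skipped here.
function build_index(nodes)
    index = String[]
    for node in nodes
        text = searchable_text(node)
        text === nothing || push!(index, text)
    end
    return index
end

# The writer dumps the index as-is, with no filtering of its own.
write_index(io::IO, index) = foreach(text -> println(io, text), index)

nodes = [ToyNode(:paragraph, "Some prose"), ToyNode(:meta, "DocTestSetup = nothing")]
index = build_index(nodes)
write_index(stdout, index)
```

With this split, anything that shows up in the written file can be traced back to a single place where records are created.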

As for why those attempts are not working: could you open a draft PR, where your changes are incorporated into the source code, so I could easily run it?

mortenpi avatar Mar 30 '25 22:03 mortenpi

This can still be reproduced at https://docs.julialang.org/en/v1.12/?q=DocTestSetups

But to my surprise, it is not present at https://docs.julialang.org/en/v1.13-dev/?q=DocTestSetups. Any idea why?

fingolfin avatar Oct 28 '25 12:10 fingolfin

Turns out the Julia 1.12 docs were built with Documenter 1.8.1, while the 1.13 ones were built with 1.11.4.

So perhaps this issue is fixed?

Clearly we need Julia 1.12 to use a newer Documenter! See https://github.com/JuliaLang/julia/pull/60073

fingolfin avatar Nov 07 '25 19:11 fingolfin