julia icon indicating copy to clipboard operation
julia copied to clipboard

REPL: warn on non-owning qualified accesses

Open ericphanson opened this issue 1 year ago • 7 comments

  • Accessing names from other modules can be dangerous, because those names may only be in that module by happenstance, e.g. a name exported by Base
  • the keyword public encourages more qualified accesses, increasing the risk of accidentally accessing a name that isn't part of that module on purpose
  • ExplicitImports.jl can catch this in a package context, but this requires opting in. Folks who might not be writing packages or might not be using dev tooling like ExplicitImports can easily get caught off guard by these accesses (and might be less familiar with the issue than package developers)
  • using a REPL AST transform we can emit warnings when we notice this happening in the REPL
the AST transform
function print_qualified_access_warning(mod, owner, name)
    @warn string(name, " is defined in ", owner, " and is not public in ", mod) maxlog = 1 _id = string("repl_warning", mod, name) _line = nothing _file = nothing _module = nothing
end

function maybe_print_qualified_access_warning(mod, owner, name)
    mod isa Module || return
    Base.ispublic(mod, name) && return
    mod === Base && Base.ispublic(Core, name) && return
    print_qualified_access_warning(mod, owner, name)
    return
end

function has_ancestor(query, target)
    query == target && return true
    while true
        next = parentmodule(query)
        next == target && return true
        next == query && return false
        query = next
    end
end

function add_qualified_access_warning(ast)
    ast isa Expr || return ast

    # don't recurse through module definitions
    ast.head === :module && return ast

    if ast.head == Symbol(".") && length(ast.args) == 2
        mod_name = ast.args[1]
        name_being_accessed = ast.args[2]
        if name_being_accessed isa QuoteNode
            name_being_accessed = name_being_accessed.value
        end
        name_being_accessed isa Symbol || return ast
        mod = try
            getproperty(REPL.active_module(), mod_name)
        catch
            return ast
        end
        owner = try
            which(mod, name_being_accessed)
        catch
            return ast
        end
        has_ancestor(owner, mod) && return ast
        maybe_print_qualified_access_warning(mod, owner, name_being_accessed)
    else
        for arg in ast.args
            add_qualified_access_warning(arg)
        end
    end
    return ast
end

using REPL
pushfirst!(Base.active_repl_backend.ast_transforms, add_qualified_access_warning)

before:

julia> NNlib.VERSION
v"1.11.0-beta1"

after:

julia> NNlib.VERSION
[ Warning: VERSION is defined in Base and is not public in NNlib
v"1.11.0-beta1"

as pointed out by @adrhill on Slack, this particular access

[is] a "false friend" for Python programmers who are used to PEP 8's pythonpackage.__version__ convention

but there's plenty of other confusing accesses one can make too, e.g.

julia> JSON.parse
parse (generic function with 2 methods)

julia> JSON.tryparse
[ Warning: tryparse is defined in Base and is not public in JSON
tryparse (generic function with 17 methods)

Notes:

  • this could be made "stricter", e.g. warning on all non-public accesses. I am not sure those are worth a warning at this point; I think these non-owning access are the most problematic and we should avoid being too noisy.

  • this doesn't recurse into modules to avoid needing to deal with namespaces as much. That is, if you enter module X; using JSON; JSON.tryparse; end there is no warning. I think that's fine? It is a very big lift otherwise, since the module isn't even defined yet, so we have to parse it to understand what names are present etc

  • ~this doesn't recurse into ., ie. JSON.Parser.tryparse doesn't warn. I think this is likely worth supporting, but it could be a followup PR (or maybe I can add it here). It adds a bit of complexity since we need to trace through and keep track of what namespace we're accessing from.~ this is now done

  • I rely on maxlog=1 to avoid repeating the warning. The default logger supports this, but perhaps using a very barebones logger might not. To me that seems uncommon in an interactive REPL context. Additionally, since this works at "parse" time (i.e. we emit the warning when inspecting the AST, we don't inject it into the AST) it can emit at most 1 warning per written-down qualified access in the code (not once per iteration of a loop or such), so it is limited in how often it can warn.

  • this isn't perfect since we just pattern-match qualified accesses in the block of code entered in the REPL, so we can have false positives:

    julia> let JSON=(; tryparse=1)
               JSON.tryparse
           end
    [ Warning: tryparse is defined in Base and is not public in JSON
    1
    

    [edit: this case no longer warns]

    To avoid all false positives, we likely need to lower the code, then introspect after lowering, so that the scoping is right etc. This would probably be easy with JuliaLowering? I think it is probably hard now, but maybe an expert knows better. In ExplicitImports.jl I parse the code into a JuliaSyntax tree, then very hackily reimplement parts of lowering on top of that to attempt to resolve scope manually, but it isn't perfect either.

ericphanson avatar Jun 20 '24 22:06 ericphanson

I just pushed a commit that fixes some of the issues in the original implementation: now nested accesses like JSON.Parser.tryparse will correctly warn, and many cases of false positives like

julia> let JSON=(; tryparse=1)
           JSON.tryparse
       end

no longer warn. I also added tests and NEWS.

I also ran the transformation on all the code I could find in Base, to ensure it would not error and to see what I find. I got:


using REPL
function parse_all(file)
    text = read(file, String)
    pos = 1
    exs = []
    while true
    ex, pos = Meta.parse(text::AbstractString, pos;
        filename=basename(file), raise=false, depwarn=false)
        push!(exs, ex)
        Meta.isexpr(ex, :error) && return exs
        pos > ncodeunits(text) && return exs
    end
    return exs
end

function warn_all(path)
    n_exprs = 0
    for (root, dirs, filenames) in walkdir(path)
        for filename in filenames
            if endswith(filename, ".jl")
                exs = parse_all(joinpath(root, filename))
                n_exprs += length(exs)
                foreach(REPL.warn_on_non_owning_accesses, exs)
            end
        end
    end
    return n_exprs
end

yields

julia> warn_all(".")
[ Warning: Pairs is defined in Base and is not public in Base.Iterators
[ Warning: ReturnNode is defined in Core and is not public in Core.Compiler
[ Warning: not_int is defined in Core.Intrinsics and is not public in Core.Compiler
[ Warning: eval is defined in Core and is not public in Base.MainInclude
[ Warning: Const is defined in Core and is not public in Core.Compiler
[ Warning: @__doc__ is defined in Core and is not public in Base
[ Warning: ReentrantLock is defined in Base and is not public in Base.Threads
[ Warning: Binding is defined in Base.Docs and is not public in REPL
[ Warning: isvalid is defined in Base and is not public in Base.Unicode
[ Warning: IndexCartesian is defined in Base and is not public in Base.IteratorsMD
[ Warning: SimpleVector is defined in Core and is not public in Base
[ Warning: systemerror is defined in Base and is not public in Base.Libc
[ Warning: donotdelete is defined in Core and is not public in Base
[ Warning: not_int is defined in Core.Intrinsics and is not public in Base
[ Warning: @inline is defined in Base and is not public in Base.Iterators
[ Warning: HasEltype is defined in Base and is not public in Base.Iterators
[ Warning: IteratorEltype is defined in Base and is not public in Base.Iterators
[ Warning: llvmcall is defined in Core.Intrinsics and is not public in Base
[ Warning: have_fma is defined in Core.Intrinsics and is not public in Core.Compiler
[ Warning: fma_float is defined in Core.Intrinsics and is not public in Base
[ Warning: PrecompilableError is defined in Core and is not public in Base
[ Warning: significand_bits is defined in Base and is not public in Base.Math
[ Warning: xor_int is defined in Core.Intrinsics and is not public in Base
[ Warning: SSAValue is defined in Core and is not public in Core.Compiler
[ Warning: GotoIfNot is defined in Core and is not public in Core.Compiler
[ Warning: PhiNode is defined in Core and is not public in Core.Compiler
[ Warning: UpsilonNode is defined in Core and is not public in Core.Compiler
[ Warning: PhiCNode is defined in Core and is not public in Core.Compiler
[ Warning: Argument is defined in Core and is not public in Core.Compiler
[ Warning: lshr_int is defined in Core.Intrinsics and is not public in Base
[ Warning: ashr_int is defined in Core.Intrinsics and is not public in Base
[ Warning: shl_int is defined in Core.Intrinsics and is not public in Base
[ Warning: pointerset is defined in Core.Intrinsics and is not public in Base
[ Warning: InterConditional is defined in Core and is not public in Core.Compiler
[ Warning: PartialStruct is defined in Core and is not public in Core.Compiler
[ Warning: pointerref is defined in Core.Intrinsics and is not public in Base
[ Warning: sitofp is defined in Core.Intrinsics and is not public in Base
[ Warning: GotoNode is defined in Core and is not public in Core.Compiler
[ Warning: require_one_based_indexing is defined in Base and is not public in LinearAlgebra
[ Warning: donotdelete is defined in Core and is not public in Core.Compiler
[ Warning: or_int is defined in Core.Intrinsics and is not public in Base
[ Warning: memoryrefget is defined in Core and is not public in Base
120434

So we transformed 120k expressions and found several accesses. A lot of these seem mostly OK to me so I wonder if we should have a further relaxation about accessing names from Core or a submodule of Core in Base?

ericphanson avatar Jun 23 '24 16:06 ericphanson

BTW, there are cases where indirect references are made in order to avoid direct dependencies on upstream packages. E.g.;

julia> using ColorTypes

julia> using ColorTypes.FixedPointNumbers
[ Warning: FixedPointNumbers is defined in FixedPointNumbers and is not public in ColorTypes

It is a justified warning (in terms of "potential" compatibility risk), but the cases are usually intentional.

Another case where such references may be made is when applying monkey patches. Even though the method overwriting would have a greater impact, by default it is not warned.

kimikage avatar Jun 25 '24 05:06 kimikage

This is just in the REPL so getting a warning in those intentional cases is perhaps not the end of the world.

KristofferC avatar Jun 25 '24 05:06 KristofferC

That is right, but my concern is whether we need to be concerned about potential future changes in the REPL. To be honest, I do not know of any practical case other than the name exported by Base.

kimikage avatar Jun 25 '24 06:06 kimikage

my concern is whether we need to be concerned about potential future changes in the REPL

Can you explain your concern further? I don’t think I understand. Is it that this PR creates too many warnings when this access is performed intentionally? Or that this is a “slippery-slope” to future changes creating more warnings? Or something else?

ericphanson avatar Jun 25 '24 10:06 ericphanson

In the first place, I am not convinced of the dangers of:

Accessing names from other modules can be dangerous, because those names may only be in that module by happenstance, e.g. a name exported by Base

Using non-public functions or constants is problematic in that they are not robust to future changes, but there should be no danger in the code running in the REPL. The original problem addressed by this PR is that end-users may refer to functions or constants that they do not intend. However, that is essentially a separate issue from the qualified accesses.

I agree that exported names by Base could be a potential pitfall. However, I think it is due to the modules using Base implicitly. I do not know of any other practical cases.

If this PR was a simple code addition, I would not mind so much. I simply wondered if the warning was worth making the codes so complicated (i.e., costly to maintain).

I am not opposed to merging this PR. No need to use the comments here just to convince me.

kimikage avatar Jun 25 '24 11:06 kimikage

Yeah, that makes sense. I think we could limit it to only warn when the owner is Base or Core; that would capture the original gripe and some of the most obvious cases while limiting any annoyance. In principle the same issue can occur with any package though. Perhaps this PR is something we could try as-is, then scale back if there's too many complaints?

We could also document how to remove it, by adding the following code to your config/startup.jl:

import REPL
if isdefined(REPL, :warn_on_non_owning_accesses) && REPL.warn_on_non_owning_accesses in REPL.repl_ast_transforms
    filter!(!=(REPL.warn_on_non_owning_accesses), REPL.repl_ast_transforms)
    if isdefined(Base, :active_repl_backend)
        filter!(!=(REPL.warn_on_non_owning_accesses), Base.active_repl_backend.ast_transforms)
    end
end

In terms of the complexity, I agree the code is a bit complex. It would be nice if it was simpler, but it's not a very natural operation, introspecting yet-to-be-executed code at the AST level and the existing runtime state to decide whether or not to emit a warning. I think though it's important this be a REPL feature as opposed to something integrated more deeply, and it's nice that it uses the existing ast transforms api rather than introducing new hooks, so at least all the complexity is only at the surface of the stack. It also could be made simpler and more precise once lowering is ported to Julia so we can access scope information.

ericphanson avatar Jun 25 '24 21:06 ericphanson

Is anyone up for reviewing? Maybe @IanButterworth since you made the pkg install hook?

ericphanson avatar Jul 07 '24 14:07 ericphanson