otp icon indicating copy to clipboard operation
otp copied to clipboard

Feature: internal_export

Open MarkoMin opened this issue 1 year ago • 12 comments

Internal export

What it is

internal_export is an attribute that gives you more control over function exporting. In some cases, functions are only exported for internal usage, for testing purposes or because you must implement behavior callbacks. Those functions are not meant to be used by the end user of your application and they shouldn't exists in public interface of a module. Using them is currently considered bad practice, but with this feature it can be integrated into testing pipelines.

Best description of this problem I've found is from dr. sc. Richard A. O'Keefe: "there are many “applications” in the Erlang/OTP system which consist of a flotilla of modules. A module in such a flotilla cannot export it to another member or other members of the flotilla without exporting it to absolutely every module"

By internal usage I mean used by some module that belongs to the application. Module belongs to application if it is listed in modules property in application resource file. Most of Erlang programmers use LSP which by default suggest all exported functions - even undocumented ones, functions documented with %% testing only and callbacks. The user can, on purpose or not, use those functions it won't get noticed by any tool. Also, user can in the same way use all undocumented modules - a quick script showed that over 66% of modules in kernel, stdlib, mnesia, ssh, ssl, crypto and inets are undocumented and therefore considered internal.

By combining internal_export attribute and xref analysis, it is possible to detect static calls to functions marked as internal_export and decide whether they are legal or not.

Motivation

Current state is that a function is either exported or not. Although very simple, I think this mechanism lacks some control over scoping those exports because of already mentioned reasons. I eventually started a thread some time ago and I'd say it received a good feedback. EEP05 was the main source of inspiration, but I changed some things in design (explained in the next chapter).

This PR is related only for static analysis, but I also plan to work on dynamic checks if the static part receives a good feedback.

Design & implementation

How much control do we want?

In EEP05, it is proposed that the user should manually specify the scope of a function export. That would allow fine-grained control over exporting a function, but would potentially bloat the source file and produce some amount of boilerplate. It would also require maintenance each time new modules are added. Ut would require developers to know more stuff about the dependencies they're using - e.g. if you want to internally export handle_call/3 - do you export it to gen_server which defines it or gen which calls it dynamically? Do you really need to know that information?

I propose a mechanism where you only declare functions that are not meant to be used by the user - callbacks and internal functions. Those functions will be available to all modules in your application and its dependencies. Dependencies of an application are applications listed in applications, included_applications or optional_applications in the application resource file including their own dependencies (e.g. if A depends on B and B depends on C then A also depends on C).

I didn't find the use-case where you want to export a function to a module/application that is not captured by the mechanism above. Counterexamples are welcome.

Exporting to modules or to applications?

There is an open question whether to export functions to modules or to applications. Exporting to modules is currently proposed in EEP05.

If we don't allow user to declare additional scope - then it doesn't really matter (at least for static analysis), it is just the matter of implementation. It is easier to implement it by exporting to applications. If we do allow user to declare additional scope, specifying applications would be easier to maintain. Otherwise, if you want to export functions internally for your application, you'd have to enumerate all modules in the application, which would be pain. Another example is exporting a callback - if dynamic checks ever get implemented, that would require you to know exactly which module is calling your function (e.g. you'd need to export handle_call/3 to gen which shouldn't be directly used, not gen_server). That could also cause backward incompatibilities if some internal changes occur in gen_server. User shouldn't rely on internal implementation.

Notice that it is possible to convert current approach to module-wise by just replacing applications with value of modules property from application resource file.

Considering dynamic call checking, tradeoff must be made. Application-wise approach would be significantly less memory hungry, but would eventually introduce more overhead to dynamic call checks because you'd need to map caller and callee modules to the applications they belong to and then decide whether the call is legal or not. At the moment I can't really estimate how big this would be, but I guess pretty high / unacceptable. On the other side, module-wise approach would be very memory hungry. It would take O(1) per MFA as suggested in EEP05, but due to the current mechanism, there would be a lot of unused entries that wouldn't be used even once, but would clog the VM.

For now, my best idea is to combine both - store information about applications, but cache the results. That would make large part overhead disappear from every-but-first dynamic check, but would introduce moderate memory consumption. Any suggestions about dynamic checks are very welcome.

Xref modifications

New predefined variable IX added - list of all functions tagged with both internal_export and export. It is a subset of already existing variable X and is available in both xref_modes. Two new analysis types are defined: internal_use and illegal_internal_use - ONLY AVAILABLE IN FUNCTIONS MODE (because they depend on predefined variable E).

  • internal_use - all usages of functions from IX
  • illegal_internal_use - all usages of functions from IX which are not allowed

Syntax

Syntax is simple:

-internal_export([f/a,...])

I considered shorter name - internal, but find it kinda confusing because we refer to internal functions in another way. Also, original name export_to was the option, but it's also confusing since we don't explicitly state to what we're exporting.

Usage

To test your applications you need to add them to xref server. After that, you can call request illegal_internal_use analysis to return a list of all calls to internal functions that aren't allowed

Pros and cons

Pros Cons
Safer codebase - you can detect calls to functions that shouldn't be used Added complexity and responsibility for developers
Spawn problem - I'd like to cite dr. Richard A. O’Keefe here: "if we wish to spawn a call to a function, that function must be exported even if we have no intention whatsoever of making it available to other modules" Overhead (only for dynamic checks)
Documentation - no more need to mark functions with %%hidden and modules with only internal exports can be excluded from documentation
LSP integration - internal functions not shown when consuming dependencies
CI/CD pipeline integration
Shell integration - Autocomplete could separate exports and internal exports
Loader integration - do the analysis in code loader

Summary

Even without dynamic checks, I think this is quite good feature to have because you can force developers not to rely on something they shouldn't. If proven useful, I plan to extend this mechanism to support checks at runtime. Static calls can be checked in loader, as proposed in EEP05. Dynamic calls would be way harder to implement and it is questionable do we want to sacrifice performance and simplicity for those checks.

Every critic is very welcome

MarkoMin avatar Jun 15 '23 15:06 MarkoMin