tinygo icon indicating copy to clipboard operation
tinygo copied to clipboard

wasm: do not export malloc, calloc, realloc, free

Open aykevl opened this issue 1 year ago • 20 comments

These functions were exported by accident, because the compiler had no way of saying these functions shouldn't be exported. Unfortunately, they can easily be misused and bloat WebAssembly programs.

Not exporting them can be a big code size reduction for small programs. Before:

$ tinygo build -o test.wasm -target=wasi -no-debug -scheduler=none ./testdata/alias.go && ls -l test.wasm
-rwxrwxr-x 1 ayke ayke 2947  8 sep 13:47 test.wasm

After:

$ tinygo build -o test.wasm -target=wasi -no-debug -scheduler=none ./testdata/alias.go && ls -l test.wasm
-rwxrwxr-x 1 ayke ayke 968  8 sep 13:47 test.wasm

That's a 2kB reduction in binary size, because the GC isn't needed anymore.

This change also adds support for using //go:wasm-module to set the module name of an exported function (the default remains env).


I expect this can be a controversial change, because some people have been relying on the exported malloc function. While this function does work, it is not intended to be used like that (it's only to be used for C linked using CGo). Instead, memory should be allocated by exporting a specific function for it. For example:

var allocatedBytes *byte

//export allocateBytes
func allocateBytes(n int) *byte {
    buf := make([]byte, n)
    allocatedBytes = &buf[0]
    return &buf[0]
}

This is a safe way to export memory.

aykevl avatar Sep 08 '22 11:09 aykevl

I think this will break a significant amount of the TinyGo wasm community, so this should sit at least a few days to deal with timezones etc.

codefromthecrypt avatar Sep 08 '22 11:09 codefromthecrypt

Can we re-export them with a compiler flag but turn them off by default or other code patch?I think most people shouldn't need them.

dgryski avatar Sep 08 '22 14:09 dgryski

What's one project this would break? Maybe we could add one to our test corpus.

How about adding a flag to enable/disable malloc export first, and flip the default to disable after next release?

dankegel avatar Sep 08 '22 15:09 dankegel

Yeah, we can keep them in the form of a unsafe_malloc_export build tag for example. That way people get a bit more time to transition to a supported way of allocating memory. @codefromthecrypt would that work for you?

aykevl avatar Sep 08 '22 15:09 aykevl

Do we want to add a file with those build tags in this PR or another one?

dgryski avatar Sep 08 '22 16:09 dgryski

First, I would like to decouple this from me. I don't actually build end wasm, just I respond to many people who are using tinygo to various levels of success. Also, the "malloc" "free" functions have been in use before wasm was a glint in my eye, and they are commonly used in emscripten and other wasm toolchains. Main point is that regardless of my response, these functions are absolutely in use, so how to communicate is very important.

I would suggest the following:

  • some sort of flag so that people don't have to change their code to upgrade
    • maybe look at other toolchains and make an arbitrary, ideally similar decision. -Wl,--export=malloc -Wl,--export=free or -s EXPORTED_FUNCTIONS=_malloc,_free
  • make a library in tinygo org that is tested and obviates this via the recommended approach
    • this is required because copy/paste is error prone and too technical
    • if not here, I can create one in tetratelabs org
  • communicate clearly in the release notes a workaround (ex flag if one exists), a recommended approach (copy paste or include a lib that does the same), and benefits of this break (some people have 2k smaller code, some concrete example of exactly how these functions break .. eg not theoretical, a failing test)

my 2p!

codefromthecrypt avatar Sep 09 '22 02:09 codefromthecrypt

Thanks @codefromthecrypt - in my experience, functions to allocate memory within the GC are essential for being able to pass data in from the host to wasm, so losing them will make code harder to write. I think a short term of adding flags to export the functions to preserve existing code, and a longer term of a library that can be imported to provide implementations reusable among TinyGo programs, could be a good way to make this easier for users.

anuraaga avatar Sep 09 '22 03:09 anuraaga

#3148 looks like it would remove the safety issues in the built-in "malloc", a primary reason to discourage its use. effectively it changes it from unsafe to safe.

So, I suggest we wait on #3148 before making a firm outcome here. If "malloc" becomes safe, I believe and correct me if wrong, the rationale here boils down to not wanting to export built-in functions (even if they are safe).

codefromthecrypt avatar Sep 09 '22 04:09 codefromthecrypt

ps we created a lib which if used can decouple things. since this is integration tested it should be usable regardless of decision here https://github.com/tetratelabs/tinymem/pull/2

codefromthecrypt avatar Sep 09 '22 08:09 codefromthecrypt

I believe and correct me if wrong, the rationale here boils down to not wanting to export built-in functions (even if they are safe).

Correct, I'd like to remove them (by default at least) even if they are made safe. The reason is the reduction in wasm binary size, which many people will appreciate.

aykevl avatar Sep 14 '22 10:09 aykevl

I have made a small change to this PR, so that it is still possible to export malloc. Example:

// #cgo LDFLAGS: --export=malloc --export=free
import "C"

This exports the malloc and free functions, for people who need it. Would that be a usable workaround?

aykevl avatar Sep 14 '22 11:09 aykevl

@aykevl Thanks for progressing.

So if I understand correctly, users won't be able to export via flags to tinygo build, rather add this to their main.go before they build it.

// #cgo LDFLAGS: --export=malloc --export=free
import "C"

Instead of a blank import of something that exports 3rd party version of similar, like

import (
	_ "github.com/tetratelabs/tinymem"
)

The idea being that while this requires a source change, it avoids a 3rd party lib. Avoiding a 3rd party dep will allow code that currently has no dependencies to stay that way without copy/pasta.

Do I get it?

codefromthecrypt avatar Sep 14 '22 11:09 codefromthecrypt

The idea being that while this requires a source change, it avoids a 3rd party lib. Avoiding a 3rd party dep will allow code that currently has no dependencies to stay that way without copy/pasta.

Yes, that sounds correct. (But of course the #cgo line could also be in a 3rd party library). The main reason why I did it the way I did it is because it is a very small and simple change.

aykevl avatar Sep 14 '22 13:09 aykevl

As far as I'm concerned the workaround is helpful, if it is planned to stick around. If you think it will be removed later, than maybe still helpful but should mention that. In any case a "what to do" part in the release notes like written above will help folks.

And especially mention the rationale about the 2k savings. I'll do one last stab to put into context the savings. It might make you change your mind about defaulting to have these unavailable vs making it possible to exclude them instead.


Let's take this cat program which carefully avoids big packages like fmt.

package main

import (
	"os"
)

// main runs cat: concatenate and print files.
//
// Note: main becomes WASI's "_start" function.
func main() {
	// Start at arg[1] because args[0] is the program name.
	for i := 1; i < len(os.Args); i++ {
		bytes, err := os.ReadFile(os.Args[i])
		if err != nil {
			os.Exit(1)
		}

		// Use write to avoid needing to worry about Windows newlines.
		os.Stdout.Write(bytes)
	}
}

2KB is less than 1pct for a default build

$ tinygo build -o cat.wasm -target=wasi cat.go
$ du -k cat.wasm
260	cat.wasm

2KB is less than 4pct for a trimmed build

$ tinygo build -o cat.wasm -scheduler=none --no-debug -target=wasi cat.go
$ du -k cat.wasm
60	cat.wasm

codefromthecrypt avatar Sep 15 '22 00:09 codefromthecrypt

And especially mention the rationale about the 2k savings. I'll do one last stab to put into context the savings. It might make you change your mind about defaulting to have these unavailable vs making it possible to exclude them instead.

It's not just that. It was never intended this way, like a private API that accidentally got exported. I want to make the absolute minimum available in an API so that the maintenance burden is smaller. Admittedly malloc is a rather standard API but it should never have been exposed in the first place.

2KB is less than 4pct for a trimmed build

4% is enormous for a compiler developer. Even a 1% saving is really big to a compiler developer. Most changes to reduce code size are less than that. The thing is that there are many of these little changes and they add up. TinyGo would produce much larger binaries if I wasn't paying attention to all these small savings.

aykevl avatar Sep 15 '22 09:09 aykevl

I can understand where you are coming from. keep in mind that tinygo is for end users, not its developers.

Also, as I mentioned, a normal developer would probably use "fmt" instead of dancing around things, and the default wasm size would be 404KB.

I know you are going to remove this now, I'm mainly trying to provide you a perspective you can consider for the release notes which are for end users, some of whom will break, and some of whom may have megabytes or more wasm.

codefromthecrypt avatar Sep 15 '22 10:09 codefromthecrypt

I can understand where you are coming from. keep in mind that tinygo is for end users, not its developers.

Absolutely. But end users wouldn't be using TinyGo if it didn't produce small binaries.

Also, as I mentioned, a normal developer would probably use "fmt" instead of dancing around things, and the default wasm size would be 404KB.

I guess there are many different kinds of developers and use cases. I try to make TinyGo useful to as many as possible (which also means making malloc and free available for those who need it).

Anyway, I agree this will need some special mention in the release notes so that wasm people will know the next release might break their code.

aykevl avatar Sep 15 '22 10:09 aykevl

Last call for comment for interested WASM/WASI people!

I personally agree with the reasons for this PR. The #cgo LDFLAGS solution does not seem too burdensome for those who have ended up using this feature and need some time to modify their code, or start using tinymem, or whatever is the best option based on their scenario.

deadprogram avatar Sep 20 '22 19:09 deadprogram

@deadprogram and I suppose modifying code is in any case necessary meanwhile, as we have no other way to pass cgo LDFLAGS another way (such as CGO_LDFLAGS env variable). Correct?

codefromthecrypt avatar Sep 21 '22 06:09 codefromthecrypt

@k33g (wasm-university etc) @juntao (wasmedge examples) @knqyf263 (trivy, go-plugin) FYI the tinygo built-in exports for "malloc" and "free" will be removed by default and need a code change to put them back, starting next release.

You may need to change docs or code generators when this happens. Here's a synopsis of two ways to do it.

https://github.com/tinygo-org/tinygo/pull/3142#issuecomment-1246621492

Personally, I think the cgo LDFLAG thing is a bit scary for newcomers to look at, so will change wazero's docs to not suggest doing that. Up to you with your respective audiences!

codefromthecrypt avatar Sep 21 '22 06:09 codefromthecrypt

From what I can tell from this thread and the associated code, the fix for any third party packages that are currently using malloc() would be either:

  • add a file someplace in the third party package that includes the LDFLAGS mentioned above.
  • perform memory allocations in the package using code similar to that mentioned in the original comment for this PR https://github.com/tinygo-org/tinygo/pull/3142#issue-1366164801

The first option would be a "quick fix" if package maintainers need it. The second would be preferable, due to being safer, and also backward compatible.

For end-user applications that are using malloc() directly, the same applies. But if the app itself is not doing this, then the above changes to whatever package they are using should make it invisible to the end-developer.

@aykevl can you please confirm if my summary is correct?

deadprogram avatar Sep 26 '22 08:09 deadprogram

Let's wait a bit longer with this PR since it's a bit more contentious than I expected. The main problem with malloc and free has been fixed with #3148 so that alleviates one of my concerns with it. The other (code size increase) isn't fixed yet though, so I still would like to merge this PR eventually - especially since #3148 made the difference even bigger.

@deadprogram yes this sounds correct.

aykevl avatar Sep 29 '22 14:09 aykevl

Is the plan to merge this after the next release?

dgryski avatar Jan 06 '23 23:01 dgryski

I don't have a plan, but yeah we could perhaps do that.

aykevl avatar Jan 11 '23 22:01 aykevl

With 0.27 right around the corner, it seems like this will get cleaned up and merged shortly thereafter.

dgryski avatar Feb 07 '23 01:02 dgryski

@dgryski yes let's do it then. If it gives any issues, there is plenty of time to look for a solution.

aykevl avatar Feb 07 '23 01:02 aykevl

Hi @aykevl @dgryski just wanted to check, is this still planned to unexport malloc, etc by default? Since it didn't happen for 0.28, would we expect it for 0.29 if so?

anuraaga avatar Jun 14 '23 00:06 anuraaga