go-coreutils

Make tools importable

Open mvdan opened this issue 7 years ago • 24 comments

Hi Eric! I am developing a shell package - see https://github.com/mvdan/sh.

One of its components is an interpreter. That means I have to implement the shell builtins like echo and cd. One of the big wins of that library is that Go packages that used to need bash to be installed can simply drop that dependency, and use the shell package as a replacement, statically linked into their binary.

However, that breaks down quite easily on systems that don't have coreutils installed. Lots of shell scripts out in the wild depend on coreutils programs like cat, rm and wc. This is why I opened https://github.com/mvdan/sh/issues/93 - to add them to the interpreter as a sort of second layer of builtins.

However, as you probably very well know, adding even just some of them is a ton of work. Which is why I've been looking around for implementations of coreutils.

I could use upstream or the popular implementation in Rust, but that would mean somehow bundling the binaries into the final binary. Something nasty like including them at compile-time as assets and unpacking them into the filesystem at run-time.

But that's not the case with Go, since I can simply import Go packages. Then, the only roadblock that I see is that your tools (nice job, by the way!) are not importable - they are all main packages.

Have you given thought to adding a common interface for all the tools? For example, similar to what os/exec does:

type Ctx struct {
        Dir    string
        GetEnv func(string) string
        Stdin  io.Reader
        Stdout io.Writer
        Stderr io.Writer
}

func Run(c Ctx, name string, args ...string) error

Then one could do something like coreutils.Run(Ctx{...}, "wc", "somefile").

If you have any input, or would like any help to implement this, do let me know.

mvdan avatar Jan 12 '18 11:01 mvdan

Yes I have. I've started to turn some of them into libraries, but I've been more focused on my decimal library as of late. I plan to spend more time on this library once v3.0 of my decimal package drops, which should be whenever trig functions are added.

If you'd like to help in any way you're more than welcome. I'm down to finally make this library useful and help you out!

ericlagergren avatar Jan 12 '18 17:01 ericlagergren

Great to hear that! I won't submit a PR right away, as this would require quite a bit of design and refactoring, and I'm not familiar with this codebase. And it would likely save everyone time if you have a look at it first.

When you start working on this or have a design/prototype, do let me know and I'll be happy to help - be it reviews, testing, or coding.

mvdan avatar Jan 12 '18 17:01 mvdan

Sounds good. I might fiddle around with it a bit today. If you don't hear from me in a week or so, feel free to ping me. I don't mind being bothered. I'm glad somebody's getting use of this library!

ericlagergren avatar Jan 12 '18 18:01 ericlagergren

So, I spent a little while and sketched out an implementation using wc:

Example:

// +build ignore

package main

import (
	"os"

	"github.com/ericlagergren/go-coreutils/coreutils"

	_ "github.com/ericlagergren/go-coreutils/wc"
)

func main() {
	ctx := coreutils.Ctx{
		Stdin:  os.Stdin,
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	}
	coreutils.Run(ctx, "wc", "-l", "cmd.go")
}

ericlagergren avatar Jan 13 '18 05:01 ericlagergren

Did you forget to commit the coreutils package? I'm also not a huge fan of the coreutils/coreutils path :) Perhaps you could simply use the root package, or do something else like coreutils/exec.

I would also need Dir in the context struct, similar to what's in the os/exec package. Otherwise, the current dir from the process is forced, which is no good for my interpreter.

Otherwise looks good!

mvdan avatar Jan 13 '18 15:01 mvdan

Maybe, I was trying to watch Black Mirror at the same time. ¯_(ツ)_/¯

I was thinking of an API kinda like sql.Driver where each command registers itself. But I wasn’t quite sure what it should register.

And yeah, those bits (like the stuttering path) are easier to sort out. I just wanted to see how it’d work and if it was similar to what you were thinking.

ericlagergren avatar Jan 13 '18 20:01 ericlagergren

Yes, this is similar to what I was thinking. Registering the commands sounds fine. Ping me when there's a working version I can test out :)

mvdan avatar Jan 15 '18 11:01 mvdan

Ok, here's what I meant to commit the other day: https://github.com/ericlagergren/go-coreutils/commit/8b35c72a40961059a54adb5bf5db67e5ff457ea8

ericlagergren avatar Jan 15 '18 16:01 ericlagergren

Trying it out now, getting this build error on linux/amd64:

# github.com/ericlagergren/go-coreutils/wc/internal/sys
../../../../land/src/github.com/ericlagergren/go-coreutils/wc/internal/sys/fadv_unix.go:7:22: Fadvise redeclared in this block
        previous declaration at ../../../../land/src/github.com/ericlagergren/go-coreutils/wc/internal/sys/fadv.go:5:21

mvdan avatar Jan 15 '18 17:01 mvdan

Oh. Just a goofed-up build tag inside wc/internal/sys/fadv.go. It should be a comma, not a space. Fadv isn't a requirement, anyway. It just theoretically speeds up reading a file by letting the kernel know the desired read pattern.
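Why the comma matters: in the legacy `// +build` syntax, space-separated options are ORed together, while comma-separated terms are ANDed. The sketch below evaluates a constraint line against a set of tags to show the difference. The matches helper and the example constraints are made up for illustration; they are not the actual tags from fadv.go.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether a legacy "// +build" constraint line is
// satisfied by the given set of tags. Space-separated options are
// ORed; comma-separated terms inside an option are ANDed; a leading
// "!" negates a term.
func matches(constraint string, tags map[string]bool) bool {
	for _, option := range strings.Fields(constraint) {
		ok := true
		for _, term := range strings.Split(option, ",") {
			negated := strings.HasPrefix(term, "!")
			name := strings.TrimPrefix(term, "!")
			if tags[name] == negated {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	linux := map[string]bool{"linux": true, "amd64": true}
	// Hypothetical constraints, not the ones from fadv.go.
	fmt.Println(matches("!linux !darwin", linux)) // "not linux" OR "not darwin"
	fmt.Println(matches("!linux,!darwin", linux)) // "not linux" AND "not darwin"
}
```

With a space, a fallback file meant for "everything except these platforms" ends up matching every platform, which would explain a redeclaration error like the one above.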

ericlagergren avatar Jan 15 '18 18:01 ericlagergren

Thanks, now it builds. It behaves differently from GNU wc, though. For example, wc -c somefile gives \t<number>\n instead of just <number>\n. And prog | wc gives -\n instead of \t<number>\t<number>\t<number>\n.

Do you happen to have tests that check input/output of your implementations versus GNU's?

mvdan avatar Jan 15 '18 18:01 mvdan

Also, if you have more time, here's another suggestion to add to the common context - a context.Context. This has multiple advantages, such as setting a timeout or being able to cancel. For most programs that won't be very useful, but imagine sleep, cp, or dd.

mvdan avatar Jan 15 '18 18:01 mvdan

It behaves differently from GNU wc, though.

It does? What version of coreutils are you running? Mine's identical with coreutils 8.29.

$ go run m.go
25986317 /Users/ericlagergren/out2.s
0:1 /tmp $ gwc -c /Users/ericlagergren/out2.s
25986317 /Users/ericlagergren/out2.s
0:1 /tmp $ go run m.go > go.txt; gwc -c /Users/ericlagergren/out2.s > gnu.txt; diff go.txt gnu.txt
0:1 /tmp $

Do you happen to have tests that check input/output of your implementations versus GNU's?

For some, yeah. wc does.

I like the context.Context idea.

ericlagergren avatar Jan 15 '18 18:01 ericlagergren

Simpler example:

$ wc --version
wc (GNU coreutils) 8.28
$ wc /dev/null
      0       0       0 /dev/null
$ cat /dev/null | wc
      0       0       0
$ cat /dev/null | wc -c
0

Unless I got something very wrong in my prototype, your implementation seems to always include the filename (even if it reads from stdin) and when given no flags, it seems to not print those three numbers. That's what I meant by the examples above.

mvdan avatar Jan 15 '18 18:01 mvdan

Gotcha. One of the goals of this project is to have it be byte-for-byte exact with GNU, but sometimes there are good reasons for it not to be. For example, coreutils is meant to run on VAX and stuff, so there's lots of weird edge-case code and sometimes they go from A -> B -> C -> D to do something that Go (because it can abstract more and doesn't need to support machines from the '80s) can do simply by going from A -> D, if that makes sense.

For example, GNU wc uses 7 spaces minimum for all printing, unless it can't stat the input (i.e., it's not a regular file). Then it just dumps it with 0 spaces.

It should be easy enough to make byte-for-byte perfect.

ericlagergren avatar Jan 15 '18 19:01 ericlagergren

Thanks - your recent changes make sense. Now my tests almost pass - the only problem is that when reading from stdin it still prints a trailing space, like wc -c <somefile prints 8 \n. Other than that, all tests should now pass :)

mvdan avatar Jan 15 '18 20:01 mvdan

I guess writeCounts will have to wrap its final call to printf inside an fname != “” conditional, then. Basically there’s two things that need to be done: 1) utils need to be converted from main packages to libraries, and 2) any unwritten utils need to be written. Some are easier than others; it’s less daunting than it seems. Perhaps the most annoying part is synthesizing GNU’s source code. I’m fine with doing either.

ericlagergren avatar Jan 15 '18 20:01 ericlagergren

Sounds good. Note that I absolutely don't need all the tools at once. In particular, the original issue was just about some of the common ones. This will act as an overlay on top of a real os/exec call, so on most environments coreutils will be installed and available anyway.

Even if only one or a few tools are importable as libraries, that's plenty for the interpreter to start using them.

mvdan avatar Jan 15 '18 20:01 mvdan

Okay. Well, let me know which ones and what you prefer to do, then we can go from there. Fwiw, some of my tools provide much better performance than GNU’s do :-) For instance (on my machine, at least!) wc -lmcwL on a 25MB assembly file took 7 seconds for GNU, but something like 0.4 seconds for Go. (I think that benchmark has something to do with the default locale, though.)

ericlagergren avatar Jan 15 '18 20:01 ericlagergren

Basically any that would be used frequently in shell scripts - rm, cp, mv, mkdir, ls, touch, chmod are perhaps the most common ones.

mvdan avatar Jan 15 '18 20:01 mvdan

  • [ ] rm
  • [ ] cp
  • [ ] mv
  • [ ] mkdir
  • [ ] ls
  • [ ] touch
  • [ ] chmod
  • [x] wc

ericlagergren avatar Jan 15 '18 20:01 ericlagergren

For those of you who saw this thread, I'm trying to coordinate with a different project now :) https://github.com/u-root/u-root/issues/2527

mvdan avatar Oct 28 '22 15:10 mvdan

Sorry :)

ericlagergren avatar Oct 28 '22 15:10 ericlagergren

Certainly not trying to dig up old stuff or put blame - I also have some semi-abandoned projects due to lack of free time and energy :) Just want to point others who might still be interested towards more recent developments.

mvdan avatar Oct 28 '22 15:10 mvdan