
Get multiple columns

Open conotto opened this issue 3 years ago • 26 comments

Hi, it looks like Column() only supports getting a single column. Is there a way to simulate the command below? awk -F '\t' '{ print $5 "|" $6 }'

Thanks

conotto avatar Aug 10 '22 15:08 conotto

Thanks for the report @conotto! Can you add a few details about the problem you're trying to solve, and perhaps the Go code that you'd like to write using this feature?

bitfield avatar Aug 10 '22 15:08 bitfield

Hi, when working in the shell it's often necessary to extract and process more than one column, for example when parsing ps aux output. That's where awk comes in: its basic functionality includes splitting on a separator, set with the -F option (here '\t'). As all functions need to return *Pipe (rather than []string), we would also need a new delimiter to join the selected columns in the output.

FilterColumns(columns []int, separator, delimiter string) *Pipe

This way you could at the very minimum pass a separator (far from everything uses whitespace) and a slice of the columns that you wish to extract.

Here are a few examples

Input: alpha,bravo,charlie,delta

-> Filter and output only columns 1 and 3 with delimiter of ":"

Output: alpha:charlie

This is the awk command

echo "alpha,bravo,charlie,delta" | awk -F ',' '{ printf $1 ":" $3 }'
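As a rough illustration of the behaviour described here, the following standard-library sketch splits a line on a separator and joins the chosen columns with a new delimiter. The name filterColumns and its parameter order are hypothetical and simply mirror the signature proposed in this comment; nothing here is part of script itself.

```go
package main

import (
	"fmt"
	"strings"
)

// filterColumns is a hypothetical helper, not part of script. It splits line
// on separator and joins the requested 1-indexed columns with delimiter.
// Column numbers outside the line's range are skipped.
func filterColumns(line string, columns []int, separator, delimiter string) string {
	fields := strings.Split(line, separator)
	picked := make([]string, 0, len(columns))
	for _, c := range columns {
		if c >= 1 && c <= len(fields) {
			picked = append(picked, fields[c-1])
		}
	}
	return strings.Join(picked, delimiter)
}

func main() {
	fmt.Println(filterColumns("alpha,bravo,charlie,delta", []int{1, 3}, ",", ":"))
	// Output: alpha:charlie
}
```

Because the helper maps one string to another, it could probably be wrapped in a closure and applied per line with a method such as FilterLine, assuming the version of script in use provides one.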

What do you think ?

conotto avatar Aug 11 '22 15:08 conotto

What if the API were something like:

script.Stdin().Col(1, 3).Stdout()

bitfield avatar Aug 11 '22 16:08 bitfield

Well then you would lose the following:

  • Ability to split based on anything other than whitespace.
  • Limited to 2 columns ?
  • You also lose ability to further filter the data in cases where column data contains the separator that is hardcoded in your proposed function.

In that case you may as well leave everything as is, as adding 1 extra hardcoded column would be pointless

conotto avatar Aug 11 '22 17:08 conotto

Well, the use case you suggested was filtering ps aux output, which does use whitespace as a separator. I haven't seen any suggestions for use cases that involve some other separator, but perhaps you can think of one?

Hard-coding exactly two columns would, as you say, be pointless. The idea would be to choose one or more columns to filter on:

script.Exec("ps aux").Column(1, 2, 10).Stdout()

bitfield avatar Aug 12 '22 09:08 bitfield

Let me provide the use case I intended this for: I am trying to parse the output of the maxscale tool, which is tab-separated. This is the command:

maxctrl list servers --tsv

It produces the following output

nodeA  11.11.11.11    3311    62      Master, Running 0-1-237864414
nodeB   11.11.11.12    3311    35      Slave, Running  0-1-216126880

My goal is to retrieve the 5th and 6th columns ("Master, Running" and "0-1-237864414") of each line. Column 5 then needs to be split on "," to determine the role and status from "Master, Running" (fields 1 and 2 respectively). Column 6 needs to be split on "-" to retrieve the 3rd field, "237864414". Once this information is retrieved, the goal is to use a switch statement to check whether the three final values ("Master", "Running", "237864414") match certain conditions, and act on that.
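The steps above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the serverState type and parseServerLine function are made-up names, the input is assumed to have six tab-separated columns, column 5 is assumed to hold exactly two comma-separated values, and column 6 exactly three dash-separated parts.

```go
package main

import (
	"fmt"
	"strings"
)

// serverState holds the values this use case wants to extract.
// The type and its field names are illustrative only.
type serverState struct {
	Name, Role, Status, Transactions string
}

// parseServerLine parses one tab-separated line, assuming "role, status" in
// column 5 and a three-part, dash-separated value in column 6.
func parseServerLine(line string) (serverState, error) {
	cols := strings.Split(line, "\t")
	if len(cols) < 6 {
		return serverState{}, fmt.Errorf("want 6 columns, got %d", len(cols))
	}
	roleStatus := strings.Split(cols[4], ",")
	if len(roleStatus) != 2 {
		return serverState{}, fmt.Errorf("unexpected state %q", cols[4])
	}
	parts := strings.Split(strings.TrimSpace(cols[5]), "-")
	if len(parts) != 3 {
		return serverState{}, fmt.Errorf("unexpected value %q", cols[5])
	}
	return serverState{
		Name:         strings.TrimSpace(cols[0]),
		Role:         strings.TrimSpace(roleStatus[0]),
		Status:       strings.TrimSpace(roleStatus[1]),
		Transactions: parts[2],
	}, nil
}

func main() {
	line := "nodeA\t11.11.11.11\t3311\t62\tMaster, Running\t0-1-237864414"
	s, err := parseServerLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	switch {
	case s.Role == "Master" && s.Status == "Running":
		fmt.Println(s.Name, "is a running master at transaction", s.Transactions)
	default:
		fmt.Println(s.Name, "needs attention")
	}
}
```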

I understand that this library may not match all of my requirements; I suggested the change because I feel others may benefit from extended column filtering. Besides this use case, there is a wide variety of others that I run into on a daily basis, for example parsing CSV data. Sometimes the data you parse contains whitespace in the value itself, which makes it impossible to separate the fields properly without replacing the separator with another delimiter. I have already written the Go code to serve my purpose and was merely suggesting a feature.

conotto avatar Aug 12 '22 13:08 conotto

Your suggestion is most welcome! Don't be offended if I ask for more information—actually, that's a sign I'm taking it seriously.

To turn an issue into a proposal is not necessarily straightforward, because it requires a design. That's what I'm hoping you can help me work out. In other words, what code would you like to write using script that achieves what you want?

For example, if you want to pass a separator argument to Column, what would that look like? Or would it be a different method? Exercise your imagination and write the program that would be as simple and clear as you would like it to be, and we'll infer the necessary API from that code.

bitfield avatar Aug 12 '22 14:08 bitfield

No offence taken. I would probably go with something like this. To avoid breaking existing functionality, it would make sense to name this function something other than Column:

func ColumnFilter(separator, newDelimiter string, columns ...int) *Pipe

script.Exec("maxctrl list servers --tsv").ColumnFilter("\t", "|", 1, 2, 10).Stdout()

conotto avatar Aug 12 '22 15:08 conotto

So what would the output be in that example? Something like:

nodeA|11.11.11.11|0-1-237864414
nodeB|11.11.11.12|0-1-216126880

Presumably that's not the final output you actually want, so what else would we need to do here to process this result?

bitfield avatar Aug 12 '22 16:08 bitfield

Come to think of it, something like the pseudocode below would be a great way to cut down on the number of lines (of course, at the cost of error handling).

lines := script.Exec("maxctrl list servers --tsv").SplitByLine()        // Split input by lines

/*
   SplitByLine() returns slice of lines
  Eg:
  line0: nodeA  11.11.11.11    3311    62      Master, Running 0-1-237864414
  line1: nodeB   11.11.11.12    3311    35      Slave, Running  0-1-216126880
["nodeA  11.11.11.11    3311    62      Master, Running 0-1-237864414", "nodeB   11.11.11.12    3311    35      Slave, Running  0-1-216126880"]
*/

// Go over each line and split it by "|" separator into columns
for _, line := range lines {
     mainColumns := Echo(line).ColumnFilter("\t", "|", 1, 5, 6).SliceOfStrings()      // Split each line into columns based on "|" separator

// on the first iteration mainColumns now contains ["nodeA", "Master, Running", "0-1-237864414"]

    role := Echo(mainColumns[1]).ColumnFilter(",", "", 1).String()           // Output contains "Master"
    status := Echo(mainColumns[1]).ColumnFilter(",", "", 2).String()       // Output contains "Running"
    transactions := Echo(mainColumns[2]).ColumnFilter("-", "", 3).String()       // Output contains "237864414"

  /*
    Do switch/if logic here based on the above 3 values
  */


}

Food for thought.

conotto avatar Aug 12 '22 17:08 conotto

Nice!

SplitByLine already exists, by the way—that's called .Slice().

bitfield avatar Aug 13 '22 08:08 bitfield

Well, the use case you suggested was filtering ps aux output, which does use whitespace as a separator. I haven't seen any suggestions for use cases that involve some other separator, but perhaps you can think of one?

Hard-coding exactly two columns would, as you say, be pointless. The idea would be to choose one or more columns to filter on:

script.Exec("ps aux").Column(1, 2, 10).Stdout()

Isn't data read from a comma separated (CSV) files a clear example of a different separator than whitespace?

Plus, as far as how to specify the separator, I would take a hint from some of the GoLang strings routines, such as Trim which takes a cutset of multiple characters. For example, for some log files, I'd send in "\t :()" as the cut set and get all the columns separated out.
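A cutset-based split of this kind can be sketched with strings.FieldsFunc and strings.ContainsRune from the standard library. The helper name fieldsCutset and the sample log line are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldsCutset is a hypothetical helper. It splits line wherever a rune from
// cutset occurs, treating runs of separator runes as one and dropping empty
// fields, much as strings.Fields does for whitespace.
func fieldsCutset(line, cutset string) []string {
	return strings.FieldsFunc(line, func(r rune) bool {
		return strings.ContainsRune(cutset, r)
	})
}

func main() {
	fmt.Printf("%q\n", fieldsCutset("10:42 (warn)\tdisk full", "\t :()"))
	// Output: ["10" "42" "warn" "disk" "full"]
}
```

One consequence of this approach is that empty fields disappear, so it suits log-like input better than formats where an empty column is meaningful.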

As far as specifying multiple columns, I would add a new function called Columns instead and allow it to take an int slice. so Columns(cutset string, columns []int) []string

tjayrush avatar Sep 22 '22 23:09 tjayrush

Yes, CSV is a good example. If we wanted to parse CSV data with the minimum of user paperwork, we could write something like:

script.File("data.csv").CSV().Column(1).Stdout()

Specifying a cutset is possible, but I have a feeling that splitting on whitespace and commas already covers the vast majority of things that users would want to do. Adding a new API just for unusual edge cases may not be worth the extra complexity.
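For comparison, here is a sketch of CSV column extraction using the standard encoding/csv package, which also copes with quoted fields that contain the separator. The CSV() method in the line above is only a suggestion in this thread; csvColumn below is a separate, hypothetical helper and the sample data is made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// csvColumn is a hypothetical helper that returns the 1-indexed column col
// from every record in data. Records too short to have that column are skipped.
func csvColumn(data string, col int) ([]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // allow records of differing lengths
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	var out []string
	for _, rec := range records {
		if col >= 1 && col <= len(rec) {
			out = append(out, rec[col-1])
		}
	}
	return out, nil
}

func main() {
	data := "name,notes\nalpha,\"one, two\"\nbravo,three\n"
	vals, err := csvColumn(data, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", vals)
	// Output: ["notes" "one, two" "three"]
}
```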

bitfield avatar Sep 23 '22 08:09 bitfield

I would also make use of that feature (only for whitespace or comma), while trying to use something such as:

🕙 12:42:59 ❯ kubectl get deploy -A
appl-k8s-e2etests1-e1   hello-simple                       1/1     1            1           5h36m
appl-k8s-e2etests2-e1   hello-simple                       1/1     1            1           5h36m
appl-k8s-e2etests2-e1   hello-trident                      1/1     1            1           5h36m
kube-argocd-master      argocd-applicationset-controller   1/1     1            1           2d6h
kube-argocd-master      argocd-dex-server                  1/1     1            1           2d6h

if I could then use something like

ns, deploy := script.Exec("kubectl get deploy --all-namespaces --no-headers").Columns(0,1)

that would be perfect.

at the moment I'm doing .Slice() and then using strings.Fields on each line. not terrible but the above would be a nice-to-have.
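The workaround mentioned here might look roughly like the sketch below, which takes lines such as those returned by .Slice() and splits each with strings.Fields. The function name splitNamespaceDeploy is hypothetical, and the sample lines are copied from the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNamespaceDeploy is a hypothetical helper. It returns the first two
// whitespace-separated fields of each line as parallel slices, skipping
// lines that have fewer than two fields.
func splitNamespaceDeploy(lines []string) (namespaces, deployments []string) {
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		namespaces = append(namespaces, fields[0])
		deployments = append(deployments, fields[1])
	}
	return namespaces, deployments
}

func main() {
	lines := []string{
		"appl-k8s-e2etests1-e1   hello-simple                       1/1     1            1           5h36m",
		"kube-argocd-master      argocd-dex-server                  1/1     1            1           2d6h",
	}
	ns, deploy := splitNamespaceDeploy(lines)
	for i := range ns {
		fmt.Println(ns[i], deploy[i])
	}
}
```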

off-topic: writing shell scripts with bitfield/script is a lot of fun, thanks for the project @bitfield !

clementnuss avatar Sep 28 '22 10:09 clementnuss

@conotto I have a suggestion for a work-around you could do in the meantime (until a PR is accepted that provides a Columns method or whatever y'all decide to call it). See the tests for examples of how to use it. And to @bitfield I have a suggestion for the possible implementation. I think you'll see how easily this could be translated to a *Pipe method very similar to Column.

I also felt kinda torn about including the new/output delimiter, and feel it's a bit cleaner without it (and just assume a single-space), but I think it came out okay. Using strings.FieldsFunc seemed to me like a natural next step to your Column method implementation (and provides some ideas for further tests to write). You could also easily change this to remove the use of the newDelim string parameter, and/or change the fn func(rune) bool to a []rune or even a cutset string which ranging over will still get you a rune each time to use in strings.FieldsFunc(...) or whatever other tactic y'all want. So there are several possible changes that might be more palatable to y'all as well!

I'm open to any thoughts y'all have and if y'all like this, I'm willing to open a PR as well. Please let me know what you think. I got kinda lazy with only writing the below tests, I'm sure more scenarios and assertions would be good.

workaround

import (
	"fmt"
	"io"
	"strings"
)

// Columns is a package function that returns a function to be used with `Pipe.FilterScan`.
// Given fn, a function that reports which runes to split on, and one or more
// 1-indexed column numbers, the returned function writes the selected columns
// of each line, separated by newDelim.
func Columns(fn func(rune) bool, newDelim string, cols ...int) func(string, io.Writer) {
	return func(line string, w io.Writer) {
		if len(cols) < 1 {
			return
		}

		var b strings.Builder
		columns := strings.FieldsFunc(line, fn)
		hasWritten := false
		maxColumn := len(columns) // columns are 1-indexed, so the last valid column number is len(columns)
		finalDesiredColumn := len(cols) - 1

		for i, v := range cols {
			isSafeToUseColumn := v >= 1 && v <= maxColumn
			isFinalDesiredColumn := i == finalDesiredColumn

			if isSafeToUseColumn && !isFinalDesiredColumn {
				b.WriteString(fmt.Sprintf("%s%s", columns[v-1], newDelim))
				hasWritten = true
			} else if isSafeToUseColumn && isFinalDesiredColumn {
				b.WriteString(fmt.Sprintf("%s\n", columns[v-1]))
				hasWritten = true
			}
			if !isSafeToUseColumn && hasWritten {
				// TrimSuffix removes exactly one trailing delimiter; TrimRight would treat newDelim as a cutset.
				_, _ = fmt.Fprintf(w, "%s\n", strings.TrimSuffix(b.String(), newDelim))
				return
			}
		}
		_, _ = fmt.Fprint(w, b.String())
	}
}

some tests

func TestColumns(t *testing.T) {
	const test1 string = `1,2,3,4
alpha,bravo,charlie,delta`
	const test2 string = `1   2   3   4
alpha    bravo    charlie    delta`

	t.Run("works on CSV data", func(t *testing.T) {
		want := fmt.Sprintf("2 3\nbravo charlie\n")
		fn := func(c rune) bool {
			return c == ','
		}
		got, err := script.
			Echo(test1).
			FilterScan(Columns(fn, " ", 2, 3)).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}
	})

	t.Run("CSV data with bad columns", func(t *testing.T) {
		want := fmt.Sprintf("2 3\nbravo charlie\n")
		fn := func(c rune) bool {
			return c == ','
		}
		got, err := script.
			Echo(test1).
			FilterScan(Columns(fn, " ", 2, 3, 5)).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}
	})

	t.Run("doesn't explode with no desired column(s)", func(t *testing.T) {
		var want string
		fn := func(c rune) bool {
			return c == ','
		}
		got, err := script.
			Echo(test1).
			FilterScan(Columns(fn, " ")).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}

	})

	t.Run("doesn't fail with bad column index", func(t *testing.T) {
		var want string
		fn := func(c rune) bool {
			return c == ','
		}
		got, err := script.
			Echo(test1).
			FilterScan(Columns(fn, " ", 0)).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}

	})

	t.Run("works on TSV data", func(t *testing.T) {
		want := fmt.Sprintf("2\t3\nbravo\tcharlie\n")
		fn := func(c rune) bool {
			return unicode.IsSpace(c)
		}
		got, err := script.
			Echo(test2).
			FilterScan(Columns(fn, "\t", 2, 3)).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}
	})

	t.Run("TSV data with bad column(s)", func(t *testing.T) {
		want := fmt.Sprintf("2\t3\nbravo\tcharlie\n")
		fn := func(c rune) bool {
			return unicode.IsSpace(c)
		}
		got, err := script.
			Echo(test2).
			FilterScan(Columns(fn, "\t", 2, 3, 6)).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}
	})

	t.Run("use fun delimiter", func(t *testing.T) {
		want := fmt.Sprintf("1|3\nalpha|charlie\n")
		fn := func(c rune) bool {
			return unicode.IsSpace(c)
		}
		got, err := script.
			Echo(test2).
			FilterScan(Columns(fn, "|", 1, 3)).
			String()
		if err != nil {
			t.Errorf("expected nil error but got %v", err)
		}

		if got != want {
			t.Errorf("wanted '%v', but got '%v'", want, got)
		}
	})
}

P.S. I'm a huge fan of script! It's awesome! My team has adopted it along with mage to make our scripting in Go so much fun!

andrew-werdna avatar Nov 07 '22 16:11 andrew-werdna

Also, another potential idea, (I don't know if this is of any interest to you whatsoever), but what if you/we made a script-contrib repository, where people could contribute helper functions or things they've implemented with Pipe.Filter(), Pipe.FilterScan(), etc.? It's just a thought, you've been pretty diligent about not letting the scope of script creep or bloat up. It seems kinda overkill now that I think about it more.

andrew-werdna avatar Nov 09 '22 00:11 andrew-werdna

That's a great idea @andrew-werdna! If you'd like to create such a repo, do go ahead. I'm sure there are a lot of useful programs out there that could be contributed.

bitfield avatar Nov 09 '22 09:11 bitfield

@bitfield what do you think of the Columns function implementation above (and the comments/explanation about possible simplifications, etc)?

andrew-werdna avatar Nov 10 '22 09:11 andrew-werdna

I think it's a great thing to include in a contrib repo like the one you suggested!

bitfield avatar Nov 10 '22 09:11 bitfield

@bitfield I have created a contrib repo with just like 2 filters added with tests and whatnot. I'd love your blessing and any advice, opinions, etc. that you would have in any regards. I know the readme needs a lot of work, and I plan on shoring that up soon.

andrew-werdna avatar Nov 10 '22 10:11 andrew-werdna

Great job!

bitfield avatar Nov 10 '22 11:11 bitfield

Do you think that a contrib repo would draw more attention, and be more likely to be used by the community if the repository was yours? i.e. bitfield/script-contrib? I'd happily contribute either way.

andrew-werdna avatar Nov 10 '22 19:11 andrew-werdna

I'm sure the attachment of my name to it would make no kind of difference to anyone—why should it? 😄

bitfield avatar Nov 11 '22 10:11 bitfield

I disagree 😉 if I have to find some script-contrib repo I will start by looking at your username @bitfield

and if someone else maintains such a repo I will (except if that repo has already a good number of stargazers) not necessarily look in details at its content.

but that's just my humble opinion (if other people agree, simply upvote this comment)

clementnuss avatar Nov 11 '22 12:11 clementnuss

yeah I was thinking along the lines of otel-go and otel-go-contrib. I could swear I've seen more examples of this, but I can't think of any at the moment.

andrew-werdna avatar Nov 14 '22 16:11 andrew-werdna

Adding a new API just for unusual edge cases may not be worth the extra complexity.

I'm going to make the argument that the GoLang designers came down on a different side of this issue, and my guess is that they had a much larger group of people in the discussion.

If you wanted to remove complexity, I would remove a special-purpose function called Csv, which does not seem easily extensible. I imagine there is also already a function called Tab (or similar). Both Csv and Tab could easily be implemented as syntactic sugar on top of a more robust Column interface mimicking GoLang with cutsets. That would remove complexity and add functionality at the same time.

tjayrush avatar Nov 14 '22 19:11 tjayrush