Collection of handy text manipulation tools

datautils

The best toolbox for processing textual data.


Contents

  • Introduction
  • Installation
  • Tools
  • Usage
    • count
    • norm
    • rows
    • text
    • trim
  • Credits

Introduction

The Data Utilities are a collection of handy text manipulation tools, meant to make a data wrangler’s life on the command line easier.

Much of this functionality can be reproduced with standard command-line tools (awk, sed, cut, sort, uniq, …), but doing so quickly becomes tedious. Zealots of the Unix philosophy will probably skip these tools and build a set of sophisticated aliases instead.

On the other hand, some of the tools fix actual problems. They use UTF-8 by default, so one does not have to deal with the quirks of sort and uniq with respect to non-ASCII input.
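
The README does not say which quirks it has in mind. One common example, shown here only as an illustration, is that the ordering produced by sort depends on the active locale. In the C locale, comparison is byte-wise, so any non-ASCII character sorts after every ASCII one:

```shell
# Byte-wise (C locale) sorting places non-ASCII characters after all
# ASCII ones, because UTF-8 encodes them with bytes above 0x7F.
# printf is used instead of echo because echo's handling of \n
# differs between shells.
printf 'ä\nz\na\n' | LC_ALL=C sort
# prints: a, z, ä (one per line)
```

Under a UTF-8 locale the same input may sort differently, which is the kind of surprise a tool with a fixed UTF-8 default avoids.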

Installation

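go get -v github.com/sfischer13/datautils/...

With Go 1.18 or later, go get no longer installs binaries; the equivalent there should be:

go install github.com/sfischer13/datautils/...@latest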

Tools

These tools are part of the collection:

  • count
  • norm
  • rows
  • text
  • trim

Usage

count

$ echo "a\na\na\nb\nb\nc"
a
a
a
b
b
c
$ echo "a\na\na\nb\nb\nc" | count --keys
3	a
2	b
1	c
$ echo "a\na\na\nb\nb\nc" | count --counts
1	c
2	b
3	a
$ echo "a\na\na\nb\nb\nc" | count --flip
a	3
b	2
c	1
$ echo "a\na\na\nb\nb\nc" | count --threshold 2
3	a
2	b
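
For comparison, the tallies above can be approximated with standard tools, as the Introduction suggests. This is a rough sketch, not how count is implemented: tally is a hypothetical helper, and the reading of --threshold as "keep counts of at least N" is an assumption based on the sample output. printf is used instead of echo because echo's handling of \n differs between shells.

```shell
# Hypothetical helper (not part of datautils): tally identical lines
# and print "count<TAB>line", ordered by line.
tally() {
  LC_ALL=C sort | LC_ALL=C uniq -c |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}

# Ordered by key, like the --keys example.
printf 'a\na\na\nb\nb\nc\n' | tally

# Ordered by ascending count, like the --counts example.
printf 'a\na\na\nb\nb\nc\n' | tally | sort -n

# Keep only entries seen at least twice (assumed meaning of --threshold 2).
printf 'a\na\na\nb\nb\nc\n' | tally | awk -F '\t' '$1 >= 2'
```

The awk step is only there to turn the space-padded output of uniq -c into tab-separated columns, which is the sort of glue that makes the one-liner approach tedious.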

norm

$ echo "¹²³" | norm --nfc
¹²³
$ echo "¹²³" | norm --nfkc
123

rows

$ echo "a\nb\nc\nd\ne" | rows --rows 2:4
b
c
d
$ echo "a\nb\nc\nd\ne" | rows --rows 1,5
a
e
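
A rough standard-tool counterpart to the two selections above, using sed. This is only a sketch: row_range is a hypothetical helper, and it assumes that 2:4 denotes an inclusive, 1-based range, as the sample output suggests.

```shell
# Hypothetical helper (not part of datautils): print lines $1 through
# $2 of standard input, inclusive and 1-based.
row_range() {
  sed -n "${1},${2}p"
}

# Lines 2 to 4, like the 2:4 example.
printf 'a\nb\nc\nd\ne\n' | row_range 2 4

# Lines 1 and 5 only, like the 1,5 example.
printf 'a\nb\nc\nd\ne\n' | sed -n '1p;5p'
```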

text

$ echo abca | text chars
a
b
c
a
$ echo "This is a test." | text words
This
is
a
test.
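
The two splits above can be approximated with tr and grep. This sketch assumes that text words splits on whitespace only, which is what the sample output suggests since "test." keeps its period. Note that grep -o is a widely available extension rather than a POSIX option.

```shell
# One word per line: turn spaces into newlines and squeeze repeats.
printf 'This is a test.\n' | tr -s ' ' '\n'

# One character per line. grep matches whole characters only in a
# UTF-8 locale; in the C locale it operates on bytes, so non-ASCII
# input may be split incorrectly.
printf 'abca\n' | grep -o .
```

The locale caveat on the second command is an instance of the non-ASCII quirks mentioned in the Introduction.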

trim

$ echo "   abc" | trim --left
abc
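
A comparable sed one-liner, as a sketch only. trim_left is a hypothetical helper, and whether trim --left strips all whitespace or only spaces is not stated in this README, so the character class below is an assumption.

```shell
# Hypothetical helper (not part of datautils): strip leading
# whitespace from every input line.
trim_left() {
  sed 's/^[[:space:]]*//'
}

printf '   abc\n' | trim_left
# prints: abc
```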

Credits

This project is authored and maintained by Stefan Fischer.
The source code is available under the MIT License.
See LICENSE for further details.