jq
Add toUpper and toLower methods
Looking through the system, support for toUpper and toLower might make some folks' lives easier. In my case I have some keys that are upper case, some that are mixed case, and others that are lower case, and I'd like to use a single match on them by converting them all to the same case.
Yeah, this would be nice. Generic Unicode case conversion is a complex thing to do, and we don't really want to either write a Unicode library for jq or have a run-time dependency on an external library, unless we can link it statically for the standalone jq executable. There are several options though, and I agree that this is somewhat important. We might well end up having to have a Unicode library in jq.
@pasamio wrote:
I'd like to use a single match on them by being able to use the same case.
If you have access to a recent (github) version of jq, you can use regular expressions with the "ignore case" option, e.g.
$ jq -n '"abC"| test( "abc";"i" )'
true
$ jq -n ' {"abC":1} | keys[] | select( test("abc";"i")) '
"abC"
$ jq -n '{"abC":1} | keys[] | if test("abc";"i") then "abc" else empty end'
"abc"
There's also a regex filter named match.
:+1: I really need this as well!
@nikolay - ascii_upcase and ascii_downcase have been in "master" on github since Dec 27. Their definitions can also be used in jq 1.4, as explained in the FAQ.
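For readers on jq 1.4, here is a minimal sketch of what such a definition could look like, applied to the original question of normalizing object keys. The names `my_ascii_downcase` and `downcase_keys` are made up for this illustration, and the body is an assumption based on `explode`/`implode`, not a verbatim copy from the FAQ. It only touches the ASCII letters A-Z.

```shell
# Hypothetical helper: lower-case every top-level key of a JSON object.
# my_ascii_downcase is an illustrative name; on jq 1.5+ the builtin
# ascii_downcase can be used directly instead.
downcase_keys() {
  jq -c '
    # Map code points 65..90 (A-Z) to 97..122 (a-z); leave everything else alone.
    def my_ascii_downcase:
      explode | map(if 65 <= . and . <= 90 then . + 32 else . end) | implode;
    with_entries(.key |= my_ascii_downcase)'
}

echo '{"Abc":1,"XYZ":2}' | downcase_keys
# prints {"abc":1,"xyz":2}
```

Once the keys share one case, a single exact match such as `.abc` works regardless of how the key was originally spelled.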
Are there any plans to implement "upcase" and "downcase" functions with UTF-8 support, something similar to awk's "tolower"?
With jq 1.5: echo '"ÁgUA"' | jq '. | ascii_downcase'
Output: "Água"
With GNU Awk 4.1.3: echo '"ÁgUA"' | awk '{print(tolower($0))}'
Output: "água"
This :point_up: would be great!
Currently I have to create multiple jq queries and chain them, or use Linux commands with regular expressions, to lower-case only the target fields.
Waiting for this one.
Started today using jq and found myself needing this.
:+1:
My wish would be to support the regexp substitution escapes from perlre: \l \u \L \U \E
I found ascii_downcase from builtin.jq to do just this.
Yes, but as the name suggests, it works only for ASCII characters. For instance, accented letters in a string (e.g. "Á" or "Ñ") are simply left unchanged, so they stay in uppercase.
The current workaround, as stated in my 2016 comment in this issue, is to use awk's tolower, which properly supports accented letters.
I hope that something similar will be implemented in jq someday.
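In the meantime, a partial jq-only stopgap is possible for a narrow range of characters. The sketch below is an assumption for illustration, not something proposed in this thread: `latin1_downcase` is a made-up name, and it only handles ASCII A-Z plus the Latin-1 Supplement capitals U+00C0 to U+00DE (skipping U+00D7, the multiplication sign), where the lowercase form happens to sit 32 code points higher. It is not general Unicode case conversion.

```shell
# Hypothetical helper: lower-case ASCII and Latin-1 Supplement capitals only.
latin1_downcase() {
  jq '
    def latin1_downcase:
      explode
      | map(if (65 <= . and . <= 90)                      # A-Z
               or (192 <= . and . <= 222 and . != 215)    # U+00C0..U+00DE, skipping U+00D7
            then . + 32 else . end)
      | implode;
    latin1_downcase'
}

echo '"ÁgUA"' | latin1_downcase
# prints "água"
```

Letters outside those two ranges (for example "DŽ" or Greek and Cyrillic capitals) pass through unchanged, which is why a real Unicode-aware implementation is still needed.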
how do I use it?
seems that utf8 case folding should be fairly straightforward using utf8proc (https://juliastrings.github.io/utf8proc/doc/). We might give it a go, anyone interested to take a look if we do?
I've implemented this as a couple of built-ins, utf8tolower and utf8toupper. It works like so:
> echo '"DŽIDŽA"' | jq '.|utf8tolower'
"džidža"
The code changes are small and easily toggled. Is there any interest from this repo's maintainers (or others) to incorporate these changes? If so pls lmk and I will generate a pull request.
@nicowilliams @gabrielmagno @flaviotordini @eintr @paulochf @LordMike: given that it seems from your comments (years ago by now...) that you'd have an interest in a solution-- any thoughts on the above?
Hey @liquidaty. Well, in terms of functionality, it looks good to me. In terms of the implementation itself, I'm afraid I will not be able to judge, because I'm not familiar with jq's codebase. By the way, maybe you could point us to your repo, so that more people could see it?
@gabrielmagno, anyone else: here's a diff from the latest jq version pulled today: https://github.com/liquidaty/jq/commit/9f0bde5f5a560e021bc05be98906a0e555270f4e
Note that the autoconf includes settings that appear to support building utf8proc, but that part doesn't actually work-- rather, it just looks for utf8proc already being available and uses it if found
Pull request: https://github.com/stedolan/jq/pull/2547