splita/splitax error when field contains a single value
I have a CSV that has a field EAN, this is either a single value:
7622210953278
or a concatenated string of values:
"7622210953278#7622210953279#7622281623271"
When I split the field, it works when it's a case where there's multiple values, otherwise it ends up replacing the field with (error). I'm converting this CSV to JSON, for what it's worth.
The command I'm running is:
mlr --icsv --ojson --from products.csv put '$EAN = splita($EAN, "#")' then cut -f SKU,TITLE,LONG_DESC,EAN
Which outputs something like this (albeit longer):
{
"SKU": 0000000,
"LONG_DESC": "some description",
"EAN": (error),
"TITLE": "some title",
},
{
"SKU": 0000000,
"LONG_DESC": "some description",
"EAN": [5000159372343, 5000159470490],
"TITLE": "some title",
},
{
"SKU": 0000000,
"LONG_DESC": "some description",
"EAN": (error),
"TITLE": "some title",
},
{
"SKU": 0000000,
"LONG_DESC": "some description",
"EAN": (error),
"TITLE": "some title",
},
I've only just come across miller, so I may be missing something obvious, I've tried with both splita and splitax. Any advice or is this not something I can work around?
As soon as I finished typing this I realised a possible solution and turns it out it worked:
mlr --icsv --ojson --from products.csv put '$EAN = string($EAN)' then put '$EAN = splita($EAN, "#")' then cut -f SKU,TITLE,LONG_DESC,EAN | jq
Apologies for the short-lived issue, feel free to close!
Yes casting to string works since
7622210953278
type-infers as int but
7622210953278#7622210953279#7622281623271
type-infers to string.
~And, splita returns error when given a string.~
[Typo found 2024-08-25]: And, splita returns error when given a string.
Maybe splita ought to do to-string for the user, even when not asked to do so ...
I've not been super-consistent with this TBH.
On the one hand I never ever wanted Miller to be as loosey-goosey with string vs int as, say, JavaScript or PHP -- I always wanted "2"+3 to be error not 5. So most of the time I prefer to return error rather than doing implicit type-casts.
On the other hand, I do some implicit type-casts -- the dot operator for example, since I felt there was nothing else meaningful for . to do so I might as well stringify all inputs. So "3" . 4 is "34". Also substr will stringify its first argument: substr1(123456789, 3, 5) gives "345".
The split functions should probably do the same ...
The splita and splitax docs both say that they split a string into array, I think it could be fair to assume as a user that the argument would be converted to a string.
I'm happy to take a stab and raise a PR for this in the next few days, if you'd like?
The splita and splitax docs both say that they split a string into array, I think it could be fair to assume as a user that the argument would be converted to a string
Cool!
Yes, happy to take a PR if you want! 😎