teraslice data-mate string functions for discussion and data gathering specific to strings

Current data-mate string-related functions, listed to help focus the discussion on this data type.

String Related

Validation

isString
isEmail
isMACAddress
isURL
isUUID
contains
isAlpha
isASCII
isBase64
isFQDN
isHash
isCountryCode
isISSN
isRFC3339
isLength
isMIMEType
equals
isEmpty
isISDN
isAlphanumeric
isPostalCode

Transforms

toString
toUpperCase
toLowerCase
trim
truncate
decodeBase64
encodeBase64
decodeURL
encodeURL
decodeHex
encodeHex
encodeMD5
encodeSHA
encodeSHA1
parseJSON
extract
replaceRegex
replaceLiteral
trimStart
trimEnd
toCamelCase
toKebabCase
toPascalCase
toSnakeCase
toTitleCase
splitString

Mar 18 '21 21:03 ciorg

related to issues #2553 and #2242

Mar 18 '21 22:03 ciorg

notes on current functions:

replaceRegex and replaceLiteral - consider combining them into one replace function that accepts regex
isISDN needs country context to be accurate, should consider a simpler version, isPhoneLike
contains, maybe should be includes?

Examples below are meant to show inputs and types, not what the final function will look like

functions to add:

validations

isByteSize

check if the string's byte size falls within a range

  isByteLength(input: string, min: number, max: number): boolean
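
A minimal sketch of the byte-range check, assuming UTF-8 encoding and inclusive bounds (neither is stated above, so treat both as assumptions):

```typescript
// Hypothetical sketch: UTF-8 and inclusive min/max are assumptions.
function isByteLength(input: string, min: number, max: number): boolean {
    // TextEncoder always produces UTF-8, so multi-byte characters count fully.
    const bytes = new TextEncoder().encode(input).length;
    return bytes >= min && bytes <= max;
}
```

With this reading, "héllo" is 5 characters but 6 bytes, which is the case that separates this check from isLength.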

isHexadecimal

check if the string is a hexadecimal number.

  isHexadecimal(input: string): boolean
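
One possible shape for this check, as a sketch. Accepting an optional `0x` prefix is an assumption, not something decided above:

```typescript
// Hypothetical sketch: the optional "0x" prefix is an assumption.
function isHexadecimal(input: string): boolean {
    // At least one hex digit is required, so "" and a bare "0x" are rejected.
    return /^(0x)?[0-9a-f]+$/i.test(input);
}
```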

isIMEI

check if the string is a valid IMEI number. Format option to accept a hyphen-formatted IMEI.

  isIMEI(input: string, format_options?: string): boolean
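
A sketch of how this could work: 15 digits validated with the Luhn checksum. The boolean `allowHyphens` flag stands in for the format option and is hypothetical, as is the `NN-NNNNNN-NNNNNN-N` hyphen layout:

```typescript
// Hypothetical sketch: allowHyphens and the 2-6-6-1 hyphen layout are assumptions.
function isIMEI(input: string, allowHyphens = false): boolean {
    const plain = /^\d{15}$/;
    const hyphenated = /^\d{2}-\d{6}-\d{6}-\d$/;
    if (!plain.test(input) && !(allowHyphens && hyphenated.test(input))) {
        return false;
    }
    const digits = input.replace(/-/g, '');
    // Luhn checksum: walking from the right, double every second digit.
    let sum = 0;
    for (let i = 0; i < digits.length; i++) {
        let digit = Number(digits[digits.length - 1 - i]);
        if (i % 2 === 1) {
            digit *= 2;
            if (digit > 9) digit -= 9;
        }
        sum += digit;
    }
    return sum % 10 === 0;
}
```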

isIn

check if string is in an array of values

  isIn(input: string, value: string[]): boolean

isNumeric

check if the string is a valid number in string form, useful before using toNumber

  isNumeric(input: string): boolean
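
A sketch of a strict reading, assuming only plain decimal notation should pass. Whether exponents, hex, or surrounding whitespace should be accepted is an open question this example does not settle:

```typescript
// Hypothetical sketch: only signed decimal notation is accepted here.
function isNumeric(input: string): boolean {
    // Matches "42", "-3.14", "+7", ".5"; rejects "1e3", "0x10", "5.", "".
    return /^[+-]?(\d+(\.\d+)?|\.\d+)$/.test(input);
}
```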

isPort

check if string is a valid port number, this can be number or a string

  isPort(input: string | number): boolean
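
A sketch that handles both input types, assuming the valid range is 0 through 65535 (whether port 0 should count is an assumption):

```typescript
// Hypothetical sketch: the 0-65535 range, including 0, is an assumption.
function isPort(input: string | number): boolean {
    // Strings must be digits only, which rules out "", "80.5" and "-1".
    if (typeof input === 'string' && !/^\d+$/.test(input)) return false;
    const port = Number(input);
    return Number.isInteger(port) && port >= 0 && port <= 65535;
}
```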

isSemanticVersion

check if the string is a valid Semantic Versioning (SemVer) version string

  isSemVer(input: string): boolean

matches

check if string matches a regex

  matches(input: string, regex: string): boolean
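
Since the regex arrives as a string, it has to be compiled at call time. A sketch, where the optional `flags` parameter and the choice to return false for an invalid pattern are both assumptions:

```typescript
// Hypothetical sketch: flags and the invalid-pattern behavior are assumptions.
function matches(input: string, regex: string, flags = ''): boolean {
    try {
        return new RegExp(regex, flags).test(input);
    } catch {
        // new RegExp throws a SyntaxError on an invalid pattern or flag.
        return false;
    }
}
```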

endsWith

checks if the string ends with a substring, with an option for the string length to use. If no length is given, it defaults to string.length

  endsWith(input: string, substring: string, length?: number): boolean
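
A sketch built on the native `String.prototype.endsWith`, assuming the substring is passed as the second argument and the optional length as the third:

```typescript
// Hypothetical sketch: the parameter order is an assumption.
function endsWith(input: string, substring: string, length: number = input.length): boolean {
    // The native method treats `length` as the end position of the search.
    return input.endsWith(substring, length);
}
```

For example, `endsWith("teraslice", "tera", 4)` only looks at the first four characters.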

transforms

concat

combines a comma-separated list of string arguments. It does not take an array of strings; that functionality would be accommodated by a separate function for arrays. Should also consider an option to add a delimiter character.

  concat(...input: string[]): string

reverse

Returns string with the characters in reverse order.

  reverse(input: string): string;
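
One detail worth settling is what counts as a "character". This sketch reverses by Unicode code point, an assumption that keeps emoji and other surrogate pairs intact but does not keep combining marks attached to their base characters:

```typescript
// Hypothetical sketch: reverses by code point, not by UTF-16 code unit.
function reverse(input: string): string {
    // Array.from iterates a string by code point, so surrogate pairs stay together.
    return Array.from(input).reverse().join('');
}
```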

pad, lpad, rpad?

pads the string to a given size in characters. Options to specify left or right, or have 2 separate functions: lpad and rpad.

  pad(input: string, dir: 'left' | 'right', size: number): string
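
A sketch of the single-function variant using the native `padStart` and `padEnd`. The optional `char` parameter is an assumption, and the native methods measure size in UTF-16 code units rather than visible characters:

```typescript
// Hypothetical sketch: the char parameter (default: a space) is an assumption.
function pad(input: string, dir: 'left' | 'right', size: number, char = ' '): string {
    // Inputs already at or beyond `size` are returned unchanged.
    return dir === 'left' ? input.padStart(size, char) : input.padEnd(size, char);
}
```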

insert

Insert substring at specified position

  insert(input: string, substr: string, position: number): string
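
A sketch assuming a zero-based position that is clamped to the bounds of the string (the out-of-range behavior is not specified above):

```typescript
// Hypothetical sketch: zero-based position, clamped to [0, input.length].
function insert(input: string, substr: string, position: number): string {
    const index = Math.min(Math.max(position, 0), input.length);
    return input.slice(0, index) + substr + input.slice(index);
}
```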

Metadata Functions

length

returns the length of the string

  length(input: string): number

octetLength, size, bytes?

returns the number of bytes in the input string

  octetLength(input: string): number

position

returns the start position of a substring or character in a string, with an option to select the first, nth, or last instance of the substring.

  position(input: string, substring: string, instance: number | 'last'): number
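
A sketch of the instance option. It assumes a zero-based result, a one-based `instance` defaulting to 1, overlapping matches being counted, and -1 when the instance is not found; all of these are assumptions:

```typescript
// Hypothetical sketch: zero-based result, one-based instance, -1 when missing.
function position(input: string, substring: string, instance: number | 'last' = 1): number {
    if (instance === 'last') return input.lastIndexOf(substring);
    if (!Number.isInteger(instance) || instance < 1 || substring === '') return -1;
    let index = -1;
    // Search again from just past the previous hit until the nth one is reached.
    for (let found = 0; found < instance; found++) {
        index = input.indexOf(substring, index + 1);
        if (index === -1) return -1;
    }
    return index;
}
```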

Mar 19 '21 23:03 ciorg

Note: Next functions to implement:

concat (inside field)
join (multiple fields)
extract
extractAll => returns array, multiple matches
splitString
replace
reverse

Apr 26 '21 20:04 jsnoble

So I have some feedback on isLength: it seems impossible to check the length of a string inside an array, since if an array is given it will match against the array length. We may want to consider adding an isArrayOfSize or something similar that allows you to validate the array length.

Apr 29 '21 18:04 peterdemartini

We might want to consider adding icontains for a case-insensitive match

Apr 30 '21 14:04 peterdemartini

I am thinking we could shorten some of the string functions that have Case at the end. It doesn't seem common to include that in other query languages, and we definitely want to keep the names as short as possible without losing meaning.

toLowerCase => toLower
toUpperCase => toUpper
toCamelCase => toCamel
toSnakeCase => toSnake
toTitleCase => toTitle
toPascalCase => toPascal
toKebabCase => toKebab

Apr 30 '21 14:04 peterdemartini

Also, we have a lot of encode[Algorithm] / decode[Algorithm] functions, which doesn't seem to be a common practice. I personally think that it is easier to have an encode and a decode that take an argument algo: 'url'|'md5'|'base64'|'sha1'|'sha256'|...
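
To make the single-function idea concrete, here is a rough sketch of an encode that dispatches on algo. It assumes a Node.js runtime (`Buffer`, `node:crypto`), UTF-8 input, and hex output for the hash algorithms; the `EncodeAlgo` type name is hypothetical:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: EncodeAlgo and hex digests for hashes are assumptions.
type EncodeAlgo = 'url' | 'base64' | 'hex' | 'md5' | 'sha1' | 'sha256';

function encode(input: string, algo: EncodeAlgo): string {
    switch (algo) {
        case 'url':
            return encodeURIComponent(input);
        case 'base64':
            return Buffer.from(input, 'utf8').toString('base64');
        case 'hex':
            return Buffer.from(input, 'utf8').toString('hex');
        default:
            // md5, sha1 and sha256 are one-way hashes, returned as hex digests.
            return createHash(algo).update(input, 'utf8').digest('hex');
    }
}
```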

Apr 30 '21 14:04 peterdemartini

A couple other functions that could be shortened:

splitString -> split
isPostalCode -> isPostal

Apr 30 '21 14:04 peterdemartini

That may be possible, but I think it would be kind of confusing if you saw "toSnake", "toKebab" or "toTitle". What would that mean? What does it do? That could be a lot of things. I think keeping Case helps explain the intent and purpose of the function.

As for the encode/decode, I kind of agree that it expands things out a bit and it would be easier to just have one, but the reason we did that is because Kimbro prefers having directories with minimal or no configuration if possible for users. You can use the correct name and it's correctly configured for you, so you don't have to futz around with it.

Apr 30 '21 15:04 jsnoble

Ya I agree, the toSnake, toKebab, toPascal and toTitle are kind of odd, but it is important to keep them short

Apr 30 '21 15:04 peterdemartini

As for encode and decode, I think that there are enough different algorithms that this becomes easily justified. We should discuss with @kstaken

Apr 30 '21 15:04 peterdemartini

Also, I want to change my mind about using camel case for the function names. Every other data processing language uses snake case, and I think it is better and easier to read, especially with the heavy use of acronyms.

Apr 30 '21 15:04 peterdemartini

So to sum up our discussion:

[x] The only function renamed is splitString, to split
[ ] We should add an encode(algo)/decode(algo) but keep the existing ones
[x] The casing of the functions will remain in camel case (but they should be considered case insensitive - so no duplicate functions with varying cases)

Apr 30 '21 19:04 peterdemartini
