teraslice
data-mate string functions for discussion and data gathering specific to strings
Current data-mate string-related functions, listed to help focus the discussion on this data type.
String Related
Validation
- isString
- isEmail
- isMACAddress
- isURL
- isUUID
- contains
- isAlpha
- isASCII
- isBase64
- isFQDN
- isHash
- isCountryCode
- isISSN
- isRFC3339
- isLength
- isMIMEType
- equals
- isEmpty
- isISDN
- isAlphanumeric
- isPostalCode
Transforms
- toString
- toUpperCase
- toLowerCase
- trim
- truncate
- decodeBase64
- encodeBase64
- decodeURL
- encodeURL
- decodeHex
- encodeHex
- encodeMD5
- encodeSHA
- encodeSHA1
- parseJSON
- extract
- replaceRegex
- replaceLiteral
- trimStart
- trimEnd
- toCamelCase
- toKebabCase
- toPascalCase
- toSnakeCase
- toTitleCase
- splitString
related to issues #2553 and #2242
notes on current functions:
- replaceRegex and replaceLiteral - consider combining them into one replace function that accepts regex
- isISDN needs country context to be accurate, should consider a simpler version, isPhoneLike
- contains, maybe should be includes?
Examples below are meant to show inputs and types, not what the final function will look like
functions to add:
validations
isByteSize
check if the string's byte size falls within a range
isByteLength(input: string, min: number, max: number): boolean
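A minimal sketch of the byte-size check, assuming the size is measured in UTF-8 bytes and that both bounds are inclusive; neither assumption is stated above.

```typescript
// Sketch only. Assumptions: UTF-8 encoding, inclusive min and max.
function isByteLength(input: string, min: number, max: number): boolean {
    // TextEncoder always encodes to UTF-8, so multi-byte characters count fully.
    const bytes = new TextEncoder().encode(input).length;
    return bytes >= min && bytes <= max;
}
```

Note that 'héllo' has 5 characters but 6 bytes under this definition, which is the case that separates this check from isLength.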
isHexadecimal
check if the string is a hexadecimal number.
isHexadecimal(input: string): boolean
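One possible shape for this check, assuming an optional 0x prefix is allowed and that the empty string is rejected; both are assumptions for discussion.

```typescript
// Sketch only. Assumptions: optional "0x" prefix, case-insensitive digits,
// at least one hex digit required.
function isHexadecimal(input: string): boolean {
    return /^(0x)?[0-9a-f]+$/i.test(input);
}
```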
isIMEI
check if the string is a valid IMEI number. Format option to accept a hyphen-formatted IMEI.
isIMEI(input: string, format_options?: string): boolean
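A sketch of how the IMEI check could work: a shape check followed by the Luhn checksum over the 15 digits. The option value 'hyphenated' and the 2-6-6-1 hyphen grouping are assumptions made for illustration, not an agreed API.

```typescript
// Sketch only. 'hyphenated' is a hypothetical option value.
function isIMEI(input: string, formatOptions?: string): boolean {
    const pattern = formatOptions === 'hyphenated'
        ? /^\d{2}-\d{6}-\d{6}-\d$/ // assumed grouping: AA-BBBBBB-CCCCCC-D
        : /^\d{15}$/;
    if (!pattern.test(input)) return false;

    const digits = input.replace(/-/g, '');
    let sum = 0;
    // Luhn: walk from the rightmost digit, doubling every second one.
    for (let i = 0; i < digits.length; i++) {
        let digit = Number(digits.charAt(digits.length - 1 - i));
        if (i % 2 === 1) {
            digit *= 2;
            if (digit > 9) digit -= 9;
        }
        sum += digit;
    }
    return sum % 10 === 0;
}
```

In this sketch a hyphenated value is rejected unless the format option is passed, so the two forms are never accepted silently at the same time.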
isIn
check if string is in an array of values
isIn(input: string, value: string[]): boolean
isNumeric
check if the string is a valid number in string form, useful before using toNumber
isNumeric(input: string): boolean
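A minimal sketch, assuming "valid number" means an optionally signed decimal with an optional exponent. Hexadecimal, Infinity and NaN are deliberately rejected here, which is a choice open to discussion.

```typescript
// Sketch only. Assumption: decimal notation, optional sign and exponent.
function isNumeric(input: string): boolean {
    return /^[+-]?(\d+\.?\d*|\.\d+)(e[+-]?\d+)?$/i.test(input);
}
```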
isPort
check if string is a valid port number, this can be number or a string
isPort(input: string | number): boolean
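A sketch of the port check, assuming the valid range is 0 to 65535 inclusive and that only whole numbers written in plain digits count.

```typescript
// Sketch only. Assumption: valid ports are integers in 0..65535.
function isPort(input: string | number): boolean {
    const text = String(input);
    // Rejects signs, decimals, and anything non-numeric up front.
    if (!/^\d{1,5}$/.test(text)) return false;
    return Number(text) <= 65535;
}
```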
isSemanticVersion
check if the string conforms to the Semantic Versioning Specification
isSemVer(input: string): boolean
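A sketch using the regular expression suggested on semver.org for Semantic Versioning 2.0.0 (non-capturing variant aside). Whether a leading "v" should be tolerated is left open; this version rejects it.

```typescript
// Sketch only. Pattern follows the one suggested by the SemVer 2.0.0 spec.
const SEMVER_PATTERN = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/;

function isSemVer(input: string): boolean {
    return SEMVER_PATTERN.test(input);
}
```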
matches
check if string matches a regex
matches(input: string, regex: string): boolean
endsWith
checks if string ends with a substring, with an optional length that limits how much of the string is considered. If no length is given, it defaults to string.length
endsWith(input: string, substring: string, length?: number): boolean
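A sketch of the length option, assuming the function also takes the substring to look for (named substring here, a hypothetical parameter) and that length is non-negative.

```typescript
// Sketch only. `substring` is an assumed parameter; `length` defaults to the
// full string and is assumed to be non-negative.
function endsWith(input: string, substring: string, length: number = input.length): boolean {
    // Only the first `length` characters take part in the comparison.
    const considered = input.slice(0, length);
    return considered.slice(considered.length - substring.length) === substring;
}
```

The native String.prototype.endsWith accepts the same optional end position, so an implementation could simply delegate to it.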
transforms
concat
combines a list of comma-separated strings. Not an array of strings; that functionality would be accommodated by a separate function for arrays. Should also consider an option to add a delimiter character.
concat(...input: string[]): string
reverse
Returns string with the characters in reverse order.
reverse(input: string): string;
pad, lpad, rpad?
pads the string to the given size in characters. Options to specify left or right, or have 2 separate functions - lpad, rpad.
pad(input: string, dir: 'left' | 'right', size: number): string
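A sketch of the single-function variant. The fill parameter is hypothetical (the proposal does not name a padding character), and 'left' is assumed to mean that padding is added on the left side.

```typescript
// Sketch only. `fill` is an assumed extra parameter, defaulting to a space.
function pad(input: string, dir: 'left' | 'right', size: number, fill: string = ' '): string {
    const missing = size - input.length;
    // Never truncate: strings already at or past `size` are returned as-is.
    if (missing <= 0 || fill.length === 0) return input;

    let filler = '';
    while (filler.length < missing) filler += fill;
    filler = filler.slice(0, missing);

    return dir === 'left' ? filler + input : input + filler;
}
```

lpad and rpad could then be thin wrappers that fix the dir argument.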
insert
Insert substring at specified position
insert(input: string, substr: string, position: number): string
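A minimal sketch, assuming position is a zero-based character index and that out-of-range positions are clamped to the start or end of the string rather than rejected.

```typescript
// Sketch only. Assumptions: zero-based index, out-of-range values clamped.
function insert(input: string, substr: string, position: number): string {
    const index = Math.min(Math.max(position, 0), input.length);
    return input.slice(0, index) + substr + input.slice(index);
}
```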
Metadata Functions
length
returns the length of the string
length(input: string): number
octetLength, size, bytes?
returns the number of bytes in the input string
octetLength(input: string): number
position
start position of a substring or character in a string, with an option to pick the first instance, the nth instance, or the last instance of the substring.
position(input: string, substring: string, instance: number | 'last'): number
Note: Next functions to implement:
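A sketch of the instance option, assuming instance is one-based, defaults to the first match, counts overlapping matches, and that -1 signals "not found". All of these are assumptions for discussion.

```typescript
// Sketch only. Assumptions: one-based `instance`, default 1, -1 when missing.
function position(input: string, substring: string, instance: number | 'last' = 1): number {
    if (instance === 'last') return input.lastIndexOf(substring);
    if (instance < 1 || substring.length === 0) return -1;

    let index = -1;
    // Step from match to match until the requested instance is reached.
    for (let found = 0; found < instance; found++) {
        index = input.indexOf(substring, index + 1);
        if (index === -1) return -1;
    }
    return index;
}
```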
- concat (inside field)
- join (multiple fields)
- extract
- extractAll => returns array, multiple matches
- splitString
- replace
- reverse
So I have some feedback on isLength: it seems like it is impossible to check the length of a string inside an array, since if an array is given, it will match against the array length. We may want to consider adding an isArrayOfSize or something that allows you to validate the array length.
We might want to consider adding icontains for a case-insensitive match.
I am thinking we could shorten some of the string functions that have Case at the end. It doesn't seem common to include that in other query languages, and we definitely want to keep the names as short as possible without losing meaning.
- toLowerCase => toLower
- toUpperCase => toUpper
- toCamelCase => toCamel
- toSnakeCase => toSnake
- toTitleCase => toTitle
- toPascalCase => toPascal
- toKebabCase => toKebab
Also, we have a lot of encode[Algorithm] / decode[Algorithm] functions. This doesn't seem to be a common practice; I personally think that it is easier to have an encode and decode that takes an argument algo: 'url'|'md5'|'base64'|'sha1'|'sha256'|...
A couple of other functions that could be shortened:
- splitString -> split
- isPostalCode -> isPostal
That may be possible, but I would think it's kinda confusing if you saw "toSnake", "toKebab" or "toTitle". What would that mean? What does it do? That could be a lot of things; I think keeping case helps explain/describe what the intent and purpose of the function is.
As for the encode/decode, I kinda agree that it expands it out a bit and it would be easier to just have one, but the reason why we did that is because Kimbro prefers having directories with minimal or no configuration if possible for users. You can use the correct name and it's correctly configured for you so you don't have to futz around with it.
Ya I agree, the toSnake, toKebab, toPascal and toTitle are kind of odd, but it is important to keep them short
As for encode and decode, I think that there are enough different algorithms that this becomes easily justified; we should discuss with @kstaken
Also, I want to change my mind about using camel case for the function names. Every other data processing language uses snake case, and I think it is better and easier to read, especially with the heavy use of acronyms
So to sum up our discussion:
- [x] The only function renamed is splitString to split
- [ ] We should add an encode(algo)/decode(algo) but keep the existing ones
- [x] The casing of the functions will remain in camel case (but they should be considered case insensitive, so no duplicate functions with varying cases)
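To make the second item concrete, here is a rough sketch of a single encode(algo) that the existing per-algorithm functions could delegate to. It assumes a Node.js runtime (crypto and Buffer), hex output for the hash algorithms, and encodeURIComponent for 'url'; none of this reflects how the current data-mate functions are actually implemented.

```typescript
import { createHash } from 'crypto';

// Hypothetical union of supported algorithms, taken from the list in the thread.
type EncodeAlgo = 'url' | 'base64' | 'hex' | 'md5' | 'sha1' | 'sha256';

// Sketch only: one entry point that dispatches on the algorithm name.
function encode(input: string, algo: EncodeAlgo): string {
    switch (algo) {
        case 'url':
            return encodeURIComponent(input);
        case 'base64':
            return Buffer.from(input, 'utf8').toString('base64');
        case 'hex':
            return Buffer.from(input, 'utf8').toString('hex');
        case 'md5':
        case 'sha1':
        case 'sha256':
            // Assumption: hashes are returned as lowercase hex digests.
            return createHash(algo).update(input, 'utf8').digest('hex');
        default:
            throw new Error(`Unsupported algorithm: ${String(algo)}`);
    }
}

// The existing zero-configuration names can stay as thin wrappers.
const encodeMD5 = (input: string): string => encode(input, 'md5');
const encodeBase64 = (input: string): string => encode(input, 'base64');
```

Keeping the wrappers preserves the "use the right name and it is configured for you" behaviour, while encode(algo) covers algorithms that do not yet have a dedicated function.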