
data-mate string functions for discussion and data gathering specific to strings

Open ciorg opened this issue 3 years ago • 13 comments

Current data-mate string-related functions, listed here to help focus the discussion on this data type.

String Related

Validation

  • isString
  • isEmail
  • isMACAddress
  • isURL
  • isUUID
  • contains
  • isAlpha
  • isASCII
  • isBase64
  • isFQDN
  • isHash
  • isCountryCode
  • isISSN
  • isRFC3339
  • isLength
  • isMIMEType
  • equals
  • isEmpty
  • isISDN
  • isAlphanumeric
  • isPostalCode

Transforms

  • toString
  • toUpperCase
  • toLowerCase
  • trim
  • truncate
  • decodeBase64
  • encodeBase64
  • decodeURL
  • encodeURL
  • decodeHex
  • encodeHex
  • encodeMD5
  • encodeSHA
  • encodeSHA1
  • parseJSON
  • extract
  • replaceRegex
  • replaceLiteral
  • trimStart
  • trimEnd
  • toCamelCase
  • toKebabCase
  • toPascalCase
  • toSnakeCase
  • toTitleCase
  • splitString

ciorg avatar Mar 18 '21 21:03 ciorg

related to issues #2553 and #2242

ciorg avatar Mar 18 '21 22:03 ciorg

notes on current functions:

  • replaceRegex and replaceLiteral - consider combining them into one replace function that accepts regex
  • isISDN needs country context to be accurate, should consider a simpler version, isPhoneLike
  • contains, maybe should be includes?
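
A rough sketch of what a combined replace could look like, assuming it accepts either a literal string or a RegExp. The name, parameter order and "replace every occurrence" behaviour for literals are assumptions for discussion, not the current data-mate API:

```typescript
// Hypothetical combined `replace`: one function for both literal and regex searches.
function replace(input: string, search: string | RegExp, replacement: string): string {
  if (typeof search === 'string') {
    // Literal mode: split/join swaps every occurrence and needs no regex escaping.
    return input.split(search).join(replacement);
  }
  // Regex mode: the caller controls global matching through the `g` flag.
  return input.replace(search, replacement);
}
```

With this shape, replace('a.b.c', '.', '-') treats the dot literally, while replace('a.b.c', /[a-c]/g, 'x') uses it as a pattern.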

Examples below are meant to show inputs and types, not what the final function will look like.

functions to add:

validations

isByteSize

check if the string's byte size falls within a range

  isByteLength(input: string, min: number, max:number): boolean

isHexadecimal

check if the string is a hexadecimal number.

  isHexadecimal(input: string): boolean
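
A minimal sketch, assuming an optional 0x prefix is allowed and the empty string is rejected; both choices are assumptions that would need to be agreed on:

```typescript
// Hypothetical isHexadecimal: one or more hex digits, with an optional "0x" prefix.
function isHexadecimal(input: string): boolean {
  return /^(?:0x)?[0-9a-f]+$/i.test(input);
}
```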

isIMEI

check if the string is a valid IMEI number. A format option allows accepting a hyphen-formatted IMEI.

  isIMEI(input: string, format_options?: string): boolean
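
One possible shape, assuming a 15-digit IMEI whose last digit is a Luhn check digit, and assuming the format option is one of 'plain' or 'hyphenated' (NN-NNNNNN-NNNNNN-N). The option values are made up for illustration:

```typescript
// Hypothetical format option values; the real option names are still undecided.
type ImeiFormat = 'plain' | 'hyphenated';

function isIMEI(input: string, format: ImeiFormat = 'plain'): boolean {
  const pattern = format === 'hyphenated' ? /^\d{2}-\d{6}-\d{6}-\d$/ : /^\d{15}$/;
  if (!pattern.test(input)) return false;

  const digits = input.replace(/-/g, '');
  let sum = 0;
  // Luhn check: walking from the right, double every second digit.
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```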

isIn

check if string is in an array of values

  isIn(input: string, value: string[]): boolean

isNumeric

check if the string is a valid number in string form, useful before using toNumber

  isNumeric(input: string): boolean

isPort

check if string is a valid port number, this can be number or a string

  isPort(input: string | number): boolean
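
A sketch that accepts both input types, assuming the valid range is 0 to 65535 inclusive and that string input must consist of digits only:

```typescript
// Hypothetical isPort: integers in 0..65535, given as a number or a digit-only string.
function isPort(input: string | number): boolean {
  if (typeof input === 'string' && !/^\d+$/.test(input)) return false;
  const port = Number(input);
  return Number.isInteger(port) && port >= 0 && port <= 65535;
}
```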

isSemanticVersion

check if the string is a valid version according to the Semantic Versioning Specification

  isSemVer(input: string): boolean
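
A sketch using the regular expression suggested on semver.org for SemVer 2.0.0, without the named capture groups. Whether data-mate would use a regex or a library here is an open question:

```typescript
// Pattern: major.minor.patch, optional -prerelease, optional +build metadata.
const SEMVER_PATTERN = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/;

function isSemVer(input: string): boolean {
  return SEMVER_PATTERN.test(input);
}
```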

matches

check if string matches a regex

  matches(input: string, regex: string): boolean

endsWith

checks if the string ends with a substring, with an option for how much of the string to consider. If length is not given, it defaults to string.length

  endsWith(input: string, substring: string, length?: number): boolean
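
A sketch that assumes the substring is passed as its own argument and that length behaves like the end position of the native String.prototype.endsWith:

```typescript
// Hypothetical endsWith: only the first `length` characters of input are considered.
function endsWith(input: string, substring: string, length: number = input.length): boolean {
  return input.endsWith(substring, length);
}
```

For example, endsWith('teraslice', 'tera', 4) checks only 'tera', so it matches, while endsWith('teraslice', 'tera') does not.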

transforms

concat

combines a comma-separated list of string arguments. It does not take an array of strings; that functionality would be handled by a separate function for arrays. We should also consider an option to add a delimiter character.

  concat(...input: string[]): string

reverse

Returns string with the characters in reverse order.

  reverse(input: string): string;

pad, lpad, rpad?

pads the string to a given number of characters. Either take an option to specify left or right, or have 2 separate functions - lpad, rpad.

  pad(input: string, dir: 'left' | 'right', size: number): string
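
A sketch of the single-function variant built on the native padStart and padEnd. The fill parameter is an extra assumption that is not in the signature above:

```typescript
type PadDirection = 'left' | 'right';

// Hypothetical pad: `fill` defaults to a space; inputs already at `size` are returned unchanged.
function pad(input: string, dir: PadDirection, size: number, fill: string = ' '): string {
  return dir === 'left' ? input.padStart(size, fill) : input.padEnd(size, fill);
}
```

lpad and rpad would then just be thin wrappers that fix the dir argument.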

insert

Insert substring at specified position

  insert(input: string, substr: string, position: number): string
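
A sketch assuming position is a zero-based index and that out-of-range positions are clamped to the start or end of the string (clamping is an assumption, throwing would be another option):

```typescript
// Hypothetical insert: places substr before the character at `position`.
function insert(input: string, substr: string, position: number): string {
  const at = Math.min(Math.max(position, 0), input.length);
  return input.slice(0, at) + substr + input.slice(at);
}
```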

Metadata Functions

length

returns the length of the string

  length(input: string): number

octetLength, size, bytes?

returns the number of bytes in the input string

  octetLength(input: string): number
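
To show why this is separate from length, here is a sketch that assumes UTF-8 encoding and uses the standard TextEncoder:

```typescript
// Hypothetical octetLength: number of bytes when the string is encoded as UTF-8.
function octetLength(input: string): number {
  return new TextEncoder().encode(input).length;
}

// String length counts UTF-16 code units, so the two values can differ.
const sample = 'héllo';
const codeUnits = sample.length;      // 5
const octets = octetLength(sample);   // 6, because "é" takes two bytes in UTF-8
```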

position

returns the start position of a substring or character in a string, with an option to select the first instance, the nth instance, or the last instance of the substring.

  position(input: string, substring: string, instance: number | 'last'): number
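
A sketch of the nth-instance lookup. It assumes zero-based positions, a 1-based instance count that defaults to 1, overlapping matches being counted, and -1 when the instance does not exist; all of these are open design choices:

```typescript
// Hypothetical position: index of the nth (1-based) or last occurrence, or -1.
function position(input: string, substring: string, instance: number | 'last' = 1): number {
  if (instance === 'last') return input.lastIndexOf(substring);
  if (!Number.isInteger(instance) || instance < 1) return -1;

  let index = -1;
  for (let found = 0; found < instance; found++) {
    // Resume the search one character after the previous match.
    index = input.indexOf(substring, index + 1);
    if (index === -1) return -1;
  }
  return index;
}
```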

ciorg avatar Mar 19 '21 23:03 ciorg

Note: Next functions to implement:

  • concat (inside field)
  • join (multiple fields)
  • extract
  • extractAll => returns array, multiple matches
  • splitString
  • replace
  • reverse

jsnoble avatar Apr 26 '21 20:04 jsnoble

So I have some feedback on isLength: it seems like it is impossible to check the length of a string inside an array, since if an array is given, it will match against the array length. We may want to consider adding an isArrayOfSize or something that allows you to validate the array length.

peterdemartini avatar Apr 29 '21 18:04 peterdemartini

We might want to consider adding icontains for a case-insensitive match
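
Something like this, as a sketch. It assumes simple lowercasing is good enough for comparison, which ignores locale-specific case rules:

```typescript
// Hypothetical icontains: case-insensitive version of contains.
function icontains(input: string, substring: string): boolean {
  return input.toLowerCase().includes(substring.toLowerCase());
}
```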

peterdemartini avatar Apr 30 '21 14:04 peterdemartini

I am thinking we could shorten some of the string functions that have Case at the end. It doesn't seem common to include that in other query languages, and we definitely want to keep the names as short as possible without losing meaning.

  • toLowerCase => toLower
  • toUpperCase => toUpper
  • toCamelCase => toCamel
  • toSnakeCase => toSnake
  • toTitleCase => toTitle
  • toPascalCase => toPascal
  • toKebabCase => toKebab

peterdemartini avatar Apr 30 '21 14:04 peterdemartini

Also we have a lot of encode[Algorithm] / decode[Algorithm] functions. This doesn't seem to be a common practice; I personally think that it is easier to have an encode and a decode that take an argument algo: 'url'|'md5'|'base64'|'sha1'|'sha256'|...
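
A rough sketch of the encode side, assuming Node.js (Buffer and node:crypto), hex output for the hash algorithms, and encodeURIComponent semantics for 'url'. The algorithm list and output formats are assumptions, and the hashes are one-way, so a matching decode would only cover 'url', 'base64' and 'hex':

```typescript
import { createHash } from 'node:crypto';

// Hypothetical set of supported algorithms.
type EncodeAlgo = 'url' | 'base64' | 'hex' | 'md5' | 'sha1' | 'sha256';

function encode(input: string, algo: EncodeAlgo): string {
  switch (algo) {
    case 'url':
      return encodeURIComponent(input);
    case 'base64':
      return Buffer.from(input, 'utf8').toString('base64');
    case 'hex':
      return Buffer.from(input, 'utf8').toString('hex');
    case 'md5':
    case 'sha1':
    case 'sha256':
      // Hash digests are returned as lowercase hex strings.
      return createHash(algo).update(input, 'utf8').digest('hex');
  }
}
```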

peterdemartini avatar Apr 30 '21 14:04 peterdemartini

A couple other functions that could be shortened:

  • splitString -> split
  • isPostalCode -> isPostal

peterdemartini avatar Apr 30 '21 14:04 peterdemartini

That may be possible, but to me it's kinda confusing if you saw "toSnake", "toKebab" or "toTitle". What would that mean? What does it do? That could be a lot of things. I think keeping Case helps explain/describe what the intent and purpose of the function is.

As for the encode/decode, I kinda agree that it expands it out a bit and it would be easier to just have one, but the reason why we did that is because Kimbro prefers having directories with minimal or no configuration if possible for users. You can use the correct name and it's correctly configured for you, so you don't have to futz around with it.

jsnoble avatar Apr 30 '21 15:04 jsnoble

Ya I agree, the toSnake, toKebab, toPascal and toTitle are kind of odd, but it is important to keep them short

peterdemartini avatar Apr 30 '21 15:04 peterdemartini

As for encode and decode, I think that there are enough different algorithms that this becomes easily justified. We should discuss with @kstaken

peterdemartini avatar Apr 30 '21 15:04 peterdemartini

Also I want to change my mind about using camel case for the function names. Every other data processing language uses snake case, and I think it is better and easier to read, especially with the heavy use of acronyms

peterdemartini avatar Apr 30 '21 15:04 peterdemartini

So to sum up our discussion:

  • [x] The only function renamed is splitString to split
  • [ ] We should add an encode(algo)/decode(algo) but keep the existing ones
  • [x] The casing of the functions will remain in camel case (but they should be considered case insensitive - so no duplicate functions with varying cases)
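
To illustrate the case-insensitive point, here is a sketch of a name registry that normalizes names on registration and lookup, so two functions differing only by case cannot coexist. The FunctionRegistry class is hypothetical and not how data-mate currently stores its functions:

```typescript
type StringFn = (input: string) => string;

// Hypothetical registry keyed by lowercased function name.
class FunctionRegistry {
  private readonly fns = new Map<string, StringFn>();

  register(name: string, fn: StringFn): void {
    const key = name.toLowerCase();
    if (this.fns.has(key)) {
      throw new Error(`Function "${name}" conflicts with an existing name`);
    }
    this.fns.set(key, fn);
  }

  get(name: string): StringFn | undefined {
    return this.fns.get(name.toLowerCase());
  }
}
```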

peterdemartini avatar Apr 30 '21 19:04 peterdemartini