OpenMLDB
OpenMLDB copied to clipboard
built-in function: Support `locate(substr, str[, pos])` function
Is your feature request related to a problem? Please describe.
locate(substr, str[, pos])
Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.
Examples:
> SELECT locate('bar', 'foobarbar');
4
> SELECT locate('bar', 'foobarbar', 5);
7
Reference:
https://spark.apache.org/docs/2.3.0/api/sql/#locate
Describe the solution you'd like
We do provide a built-in function development guide for you. Please check OpenMLDB Build-In Function Develop Guide or the Chinese version for help.
Please make sure you:
- [ ] Add built-in C++ function in src/udf/udf.h and src/udf/udf.cc
- [ ] Register function to default library from function
void DefaultUdfLibrary::IniStringUdf()
in src/udf/default_udf_library.cc. - [ ] Add related unit test in src/codegen/udf_ir_builder_test.cc
- [ ] Documenting function
Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context Add any other context or screenshots about the feature request here.
Hello, I would like some clarification on the parameter and return types of this function. I've written the function as 1 but I am aware that it is likely something like 2 or 3. Please let me know so I can commit my changes.
- unsigned int locate(std::string substr, std::string str, unsigned int pos =0 ) {..}
- int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 ) {...}
- int32_t locate(string substr,string str,int32_t pos =0 ) {...}
Hi Mouray-Hutchinson, very glad that you would like to help.
Firstly, I would like to remove the SELECT POSITION('bar' IN 'foobarbar')
since it might be conflict with our IN
expression. Maybe we can simply support locate('bar', 'foobarbar')
and locate('bar', 'foobarbar', 5)
for this issue.
Secondly, we had better implement a function looks like int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 )
in udf.cc
Thank you again. Please feel free to reach out to me regarding any questions or updates.
@jingchen2222 two questions
- What should the function return if the sub string is not found? I have it returning -1
- If I used
t32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 )
then is the line following correct syntax for within the test? Test File Location: udf_ir_builder_test.cc
CheckUdf<Nullable<StringRef>, Nullable<StringRef>,Nullable<int32_t>>(udf_name, StringRef("all"), StringRef("helloAll"),int32_t(-1) );
CheckUdf<Nullable<StringRef>, Nullable<StringRef>,Nullable<int32_t>>(udf_name, StringRef("abcd"), StringRef("abcdefg"),int32_t(1) );
locate
@Mouray-Hutchinson Sorry for the late reply.
Locate(substr, str)
Returns the position of the first occurrence of substring substr in string str. This function can be regarded as a variant of
Locate(substr, str, position=1)
Locate(substr, str, position)
Locate(substr, str, position)
Returns the position of the first occurrence of substring substr in string str, starting at position pos.
The rules are:
- Returns 0 if substr is not in str.
- Returns NULL if any argument is NULL.
- The string is case-sensitive
Locate will be implemented in a way very similar to Mysql except that our string comparison is case-sensitive. Check https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_locate for more details.
> SELECT LOCATE('bar', 'foobarbar');
-- 4
> SELECT LOCATE('bar', 'foobarbar', 5);
-- 7
-- Returns 0 if substr is not in str.
> SELECT LOCATE('xbar', 'foobar');
-- 0
-- String is case-sensitive
> SELECT LOCATE('all', 'helloAll', 5);
-- 0
-- Returns NULL if any argument is NULL.
> SELECT LOCATE('bar', NULL);
-- NULL
> SELECT LOCATE(NULL,'foobar');
-- NULL
Hi Mouray-Hutchinson, very glad that you would like to help. Firstly, I would like to remove the
SELECT POSITION('bar' IN 'foobarbar')
since it might be conflict with ourIN
expression. Maybe we can simply supportlocate('bar', 'foobarbar')
andlocate('bar', 'foobarbar', 5)
for this issue. Secondly, we had better implement a function looks likeint32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =0 )
in udf.ccThank you again. Please feel free to reach out to me regarding any questions or updates.
@Mouray-Hutchinson
The position should be 1 by default.
Secondly, we had better implement a function looks like int32_t locate( hybridse::codec::StringRef substr, hybridse::codec::StringRef str,int32_t pos =1 )
in udf.cc
hi @Mouray-Hutchinson do you have any progress for this? we are looking forward to your contribution. thanks.
I'd like to tackle it, please assign to me~
I'd like to tackle it, please assign to me~
thank you !!
I tried but failed, if anyone others want to contribute, please pick it up