
wildcard search

Open • nickmorales opened this issue 7 years ago • 12 comments

Would be useful for germplasm and studies.

nickmorales • Dec 13 '17

Is this a wildcard on a single specified field, or a wildcard on all fields?

cpommier • Feb 07 '18

This issue seems related to #199.

guignonv • Mar 06 '18

What about something like this added to the POST search object:

{
	"paramOne": ["StRiNg*", "st*ng"],
	"paramTwo": ["_ring", "str*_"],
	"paramThree": ["StRiNg1", "string2"],

	"searchConfig": {
		"matchMethods": [
			{
				"searchParameter": "paramOne",
				"wildcard": true,
				"caseSensitive": false,
				"wildcardCharacter": "*"
			},
			{
				"searchParameter": "paramTwo",
				"wildcard": true,
				"caseSensitive": true,
				"wildcardCharacter": "_"
			},
			{
				"searchParameter": "paramThree",
				"wildcard": false,
				"caseSensitive": true,
				"wildcardCharacter": null
			}
		],
		"sortMethods": [
			{
				"searchParameter": "paramTwo",
				"sortPriority": 1,
				"sortAscending": true
			}
		]
	}
}

Adding something like this searchConfig object would allow a lot of flexibility without changing the existing calls too much. Those who do not need this level of flexibility could continue using the existing search calls with searchConfig set to null or omitted. I would only add this to calls where it is needed (germplasm-search, studies-search) to limit the complexity of the server-side code.

Another alternative is to keep the existing calls as they are, simple and without any wildcard or match method, and pursue a complex-search structure as described in #193. This will be more complex to implement, but much more flexible.

BrapiCoordinatorSelby • Jun 21 '18
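To make the searchConfig proposal above more concrete, here is a minimal Python sketch of how a server might evaluate one matchMethods entry in memory. It assumes the wildcard character stands for any run of characters (zero or more), that a value passes if it fully matches any of the listed strings, and that a missing caseSensitive defaults to true; compile_match and matches are hypothetical helper names, not part of the BrAPI spec.

```python
import re
from typing import Optional


def compile_match(pattern: str, wildcard: bool, case_sensitive: bool,
                  wildcard_character: Optional[str]) -> re.Pattern:
    """Compile one search value under one (hypothetical) matchMethods entry."""
    if wildcard and wildcard_character:
        # Escape each literal piece and join the pieces with ".*", so the
        # wildcard character stands for any run of characters (even none).
        pieces = pattern.split(wildcard_character)
        regex = ".*".join(re.escape(piece) for piece in pieces)
    else:
        # No wildcard: every character is literal (exact match).
        regex = re.escape(pattern)
    return re.compile(regex, 0 if case_sensitive else re.IGNORECASE)


def matches(value: str, patterns: list[str], method: dict) -> bool:
    """True if value fully matches at least one of the patterns."""
    return any(
        compile_match(p,
                      method.get("wildcard", False),
                      method.get("caseSensitive", True),
                      method.get("wildcardCharacter")).fullmatch(value)
        for p in patterns
    )


# The paramOne entry from the example request above.
param_one = {"searchParameter": "paramOne", "wildcard": True,
             "caseSensitive": False, "wildcardCharacter": "*"}
print(matches("STRING-42", ["StRiNg*", "st*ng"], param_one))  # True
```

Because each field carries its own method, "_" can be the wildcard for paramTwo while "*" stays literal there, which is exactly the per-parameter flexibility the proposal is after.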

What about simply supporting regex searches? Are there any cases we wouldn't be able to deal with?

GuilhemSempere • Jun 22 '18

Yes, regex is definitely a valid option, though in most cases I would caution against it for performance reasons. In general, regex matching in SQL is very slow. So if we did want to support regex, I would still want to add an explicit flag in the request to tell the server that it should use a REGEXP query. That way, if regex was not explicitly requested, the server could use a much faster version of the search query.

BrapiCoordinatorSelby • Jun 22 '18
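The explicit-flag idea can be sketched with Python's built-in sqlite3 module. SQLite ships no REGEXP implementation, so the sketch registers one; the germplasm table, the ALLOWED_COLUMNS whitelist and search_germplasm are all hypothetical. Without the flag the query is a plain equality test that can use the index on germplasmName; with it, SQLite calls the Python function for each candidate row.

```python
import re
import sqlite3

# Hypothetical whitelist: column names are interpolated into SQL, so they
# must never come straight from the request.
ALLOWED_COLUMNS = {"germplasmName"}


def _regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    return value is not None and re.search(pattern, value) is not None


def search_germplasm(conn, column, value, use_regex=False):
    """Exact match by default; REGEXP only when the client explicitly asks."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported filter column: {column}")
    op = "REGEXP" if use_regex else "="
    sql = (f"SELECT germplasmName FROM germplasm "
           f"WHERE {column} {op} ? ORDER BY germplasmName")
    return [row[0] for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, _regexp)
conn.execute("CREATE TABLE germplasm (germplasmName TEXT)")
conn.execute("CREATE INDEX idx_germplasm_name ON germplasm (germplasmName)")
conn.executemany("INSERT INTO germplasm VALUES (?)",
                 [("pisang_awak",), ("pisang_mas",), ("cavendish",)])

print(search_germplasm(conn, "germplasmName", "cavendish"))                # ['cavendish']
print(search_germplasm(conn, "germplasmName", "^pisang", use_regex=True))  # ['pisang_awak', 'pisang_mas']
```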

Following this morning's discussion, I propose a slightly different approach. Instead of replacing a simple text search with a regexp search in the specs, how about adding a (GET or POST) parameter called "operator" that would default to a simple text search? Implementations that support other types of operators would advertise the list somewhere (to be investigated: serverinfo? the calls call?). Operators we could define include:

  • "=": full case-sensitive match for text, and equality for numeric values
  • "!=": the negation of "="
  • "<", "<=", ">", ">=": for numeric values
  • "><": when a numeric value is between two others
  • "i=": full case-INsensitive match for text (DEFAULT for text)
  • "!i=": the negation of "i="
  • "contains": when the text contains the exact given text (case insensitive)
  • "has_word": when the text contains at least one word from a list of space-separated words
  • "has_all": when the text contains all the words of a list of space-separated words, in any order
  • "begins": when the text begins with the given input
  • "ends": when the text ends with the given input
  • "regex": when the text matches the given regex
  • "shorter_than" and "longer_than": when the text is shorter or longer than a given length
  • ...along with their negated versions.

So, for instance, if you want to find all germplasm whose name begins with "pisang", it could be handled by /v2/germplasm/?germplasmName=pisang&operator=begins or /v2/germplasm/?germplasmName=^pisang.*$&operator=regex

guignonv • Oct 23 '23
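To illustrate the "operator" parameter, here is a minimal Python sketch of a dispatch table for the text operators listed above (numeric, length and negated variants omitted for brevity). OPERATORS and filter_values are hypothetical names. Treating "begins" and "ends" as case sensitive, and splitting the stored value on whitespace for "has_word" and "has_all", are assumptions, since the proposal does not pin them down.

```python
import re
from typing import Callable

# Hypothetical operator table: each entry takes (stored value, query) -> bool.
OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "=": lambda v, q: v == q,
    "!=": lambda v, q: v != q,
    "i=": lambda v, q: v.casefold() == q.casefold(),
    "!i=": lambda v, q: v.casefold() != q.casefold(),
    "contains": lambda v, q: q.casefold() in v.casefold(),
    # At least one query word appears among the value's words.
    "has_word": lambda v, q: bool(set(q.casefold().split()) & set(v.casefold().split())),
    # Every query word appears among the value's words, in any order.
    "has_all": lambda v, q: set(q.casefold().split()) <= set(v.casefold().split()),
    "begins": lambda v, q: v.startswith(q),
    "ends": lambda v, q: v.endswith(q),
    "regex": lambda v, q: re.search(q, v) is not None,
}


def filter_values(values: list[str], query: str, operator: str = "i=") -> list[str]:
    """Keep values satisfying `operator` against `query` (default: "i=")."""
    try:
        test = OPERATORS[operator]
    except KeyError:
        raise ValueError(f"unsupported operator: {operator}") from None
    return [v for v in values if test(v, query)]


names = ["pisang awak", "Pisang Mas", "cavendish"]
print(filter_values(names, "pisang", "begins"))      # ['pisang awak']
print(filter_values(names, "^pisang.*$", "regex"))   # ['pisang awak']
```

The two calls mirror the two equivalent URLs in the example; a server would advertise which keys of such a table it supports.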

I am thinking about something even simpler. I think just a simple wildcard character would solve >80% of the use cases. We are thinking about really flexible and powerful tools to cover many imaginary scenarios, but there are simple, real, problem scenarios right now that aren't getting solved.

GET /germplasm?wildCardCharacter=*&germplasmName=exampl*
POST /search/germplasm { "wildCardCharacter": "*", "germplasmNames": ["exampl*", "*xample", "*xampl*"] }

The wildCardCharacter defaults to "", which indicates an exact match.

That should cover "beginsWith", "endsWith", "contains", and "exact" matches. Easy to add to the spec, easy to implement in most systems, and it is somewhat extendable later if we find a real need for more complex text matching.

BrapiCoordinatorSelby • Oct 23 '23
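One reason the single-wildcard approach is easy to implement is that it maps directly onto SQL LIKE. The sketch below, assuming a SQL backend (SQLite in memory here), turns the wildCardCharacter into "%" and escapes LIKE's own metacharacters so they stay literal. wildcard_to_like, search and the germplasm table are hypothetical.

```python
import sqlite3


def wildcard_to_like(pattern: str, wildcard: str = "") -> str:
    """Translate a wildCardCharacter-style search string into a LIKE pattern.

    An empty wildcard means exact match, so every character stays literal.
    LIKE's metacharacters (% and _) and the backslash are escaped so they
    match literally; the query must declare ESCAPE '\\' for this to work.
    """
    out = []
    for ch in pattern:
        if wildcard and ch == wildcard:
            out.append("%")        # wildcard -> any run of characters
        elif ch in "%_\\":
            out.append("\\" + ch)  # literal %, _ or backslash
        else:
            out.append(ch)
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE germplasm (germplasmName TEXT)")
conn.executemany("INSERT INTO germplasm VALUES (?)",
                 [("example",), ("examples",), ("counterexample",), ("ex_ample",)])


def search(pattern: str, wildcard: str = "") -> list[str]:
    sql = ("SELECT germplasmName FROM germplasm "
           "WHERE germplasmName LIKE ? ESCAPE '\\' ORDER BY germplasmName")
    return [row[0] for row in conn.execute(sql, (wildcard_to_like(pattern, wildcard),))]


print(search("exampl*", "*"))  # begins with: ['example', 'examples']
print(search("*xample", "*"))  # ends with:   ['counterexample', 'example']
print(search("*xampl*", "*"))  # contains:    ['counterexample', 'example', 'examples']
print(search("ex_ample"))      # exact:       ['ex_ample']
```

Note that SQLite's LIKE is case insensitive for ASCII by default while other databases differ, which is one concrete form of the case-sensitivity question raised in the thread.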

Fair enough. :) Question: how to choose between case sensitive and case insensitive match?

And it does solve the problem for text values, but NOT when you want to filter attribute (numeric) values that are above, below, or between some other value(s)... :-s

[edit] ...and I'm quite sure a filter for dates will also be needed somewhere!

guignonv • Oct 24 '23

The general consensus in the hackathon discussion was to keep the specification simple until there is a concrete use case that can't be solved with the existing filters. In this case, @jframi is dealing with millions of germplasm and needs to search by name. It is not practical to download a larger set and filter client-side. I believe a similar situation prompted this issue originally. I also raised the point that if you need a high level of complexity in your API search/filtering, then something like a GraphQL API might be a better fit. We are working on BrAPI in GraphQL, which provides that level of filtering out of the box, so we don't need to recreate it in the RESTful API.

To answer your questions more directly: I would leave it up to each server implementation to decide whether matching is case sensitive. If that becomes an issue, we can discuss adding it to the spec.

Regarding numbers and dates, the POST /search/images endpoint has some examples of using simple Max/Min parameters for searching numbers and RangeStart/RangeEnd parameters for searching date ranges.

{
	"imageTimeStampRangeStart": "2018-01-01T14:47:23Z",
	"imageTimeStampRangeEnd": "2018-12-12T23:47:23Z",
	"imageWidthMax": 1920,
	"imageWidthMin": 1280
}

If the need arises, we can add these types of search fields for simple number and date filtering.

BrapiCoordinatorSelby • Oct 24 '23
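A minimal sketch of how a server might apply the Min/Max and RangeStart/RangeEnd parameters from the /search/images example body above, using an in-memory list. The Image dataclass, in_range and search_images are hypothetical; treating both bounds as inclusive and leaving a missing bound open are assumptions, not something stated in the thread.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Image:
    imageDbId: str
    imageWidth: int
    imageTimeStamp: datetime


def parse_ts(text: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def in_range(value, low=None, high=None) -> bool:
    """Inclusive bounds (an assumption); a missing bound leaves that side open."""
    return (low is None or value >= low) and (high is None or value <= high)


def search_images(images: list[Image], body: dict) -> list[str]:
    start = body.get("imageTimeStampRangeStart")
    end = body.get("imageTimeStampRangeEnd")
    start = parse_ts(start) if start else None
    end = parse_ts(end) if end else None
    return [
        img.imageDbId for img in images
        if in_range(img.imageWidth, body.get("imageWidthMin"), body.get("imageWidthMax"))
        and in_range(img.imageTimeStamp, start, end)
    ]


images = [
    Image("img1", 1600, parse_ts("2018-06-01T10:00:00Z")),
    Image("img2", 640, parse_ts("2018-06-01T10:00:00Z")),   # too narrow
    Image("img3", 1920, parse_ts("2019-02-01T10:00:00Z")),  # too late
]
body = {
    "imageTimeStampRangeStart": "2018-01-01T14:47:23Z",
    "imageTimeStampRangeEnd": "2018-12-12T23:47:23Z",
    "imageWidthMax": 1920,
    "imageWidthMin": 1280,
}
print(search_images(images, body))  # ['img1']
```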

General consensus at BrAPI Hackathon 2024 was:

  • Wildcard: "*"
  • Escape character: ""

Gabriel-Besombes • Apr 18 '24

Thanks! The backslash escape character isn't displayed, though.

cpommier • Apr 18 '24

  • Wildcard: "*"
  • Escape character: "\"

cpommier • Apr 18 '24
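Under the hackathon consensus ("*" as wildcard, "\" as escape character), a search string has to be parsed rather than simply split, so that "\*" means a literal asterisk. Here is a minimal Python sketch that compiles such a string into an anchored regex. star_pattern_to_regex and matches_pattern are hypothetical names; treating a lone trailing backslash as a literal backslash, and defaulting to case-sensitive matching, are assumptions (the thread leaves case sensitivity to each server).

```python
import re


def star_pattern_to_regex(pattern: str, case_sensitive: bool = True) -> re.Pattern:
    """Compile a search string where "*" is the wildcard and "\\" escapes.

    "\\*" is a literal asterisk and "\\\\" a literal backslash; a lone
    trailing backslash is kept as a literal backslash (an assumption).
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # The escaped character is always literal, whatever it is.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        # Unescaped "*" matches any run of characters; anything else is literal.
        parts.append(".*" if ch == "*" else re.escape(ch))
        i += 1
    return re.compile("".join(parts), 0 if case_sensitive else re.IGNORECASE)


def matches_pattern(value: str, pattern: str, case_sensitive: bool = True) -> bool:
    return star_pattern_to_regex(pattern, case_sensitive).fullmatch(value) is not None


print(matches_pattern("IR64-1", "IR*"))      # True: prefix match
print(matches_pattern("IR64*", r"IR64\*"))   # True: escaped "*" is literal
print(matches_pattern("IR64-1", r"IR64\*"))  # False
```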