webstatus.dev icon indicating copy to clipboard operation
webstatus.dev copied to clipboard

Create new `id` term to do exact matching

Open atopal opened this issue 1 year ago • 7 comments

It should be possible to have an overview of features by filtering for them with something like "avif, popover, ..."

atopal avatar May 18 '24 01:05 atopal

I think all that's missing is searching by identifier. There's already OR to string them together.

foolip avatar May 18 '24 11:05 foolip

The name filter (with or without the name: prefix) does a partial string search on both the name and the id. (We call the feature identifier featurekey at the database level of our system due to the database tables already having columns called ID)

But we don't make it explicitly clear that the name filter can also search by ID. We can modify the description of that filter.

@foolip is correct that the OR operator can help with this.

avif OR popover

jcscottiii avatar May 18 '24 12:05 jcscottiii

The substring matching means that the matched features could change, for example "avif" could start matching "avif2" if there's ever a future format named as such.

A way to enumerate a list of exact identifiers would solve this, but there's still a problem. How do you maintain such a query and add/remove features? Any links previously shared will always point to that snapshot, after all.

Some kind of named saved queries? Or should we have tags and the ability to filter by them?

foolip avatar May 18 '24 14:05 foolip

https://webstatus.dev/?q=cap is a concrete case of the substring problem. I only want to match https://webstatus.dev/features/cap but 3 other features are included too.

foolip avatar May 21 '24 15:05 foolip

Today I tried to create a view of 100+ "rendering" features and dealing with the problem created by substring matches proved quite hard. Lots of things are substring matches for other things, and I had to edit the query quite a bit and count the number of matches to build confidence in it.

A new id search atom that is always matched exactly would help here I think. Ideally the name atom then matches only on the name, not the identifier.

Supporting something like tag:my-favorite-features would be a nice addition, but we'd need to maintain that data somewhere, and the question of ownership of a tag would arise. It seems best to first ensure that an exact set of features can be pulled out using very long query strings, and thinking about ways to shorten it later.

foolip avatar May 21 '24 21:05 foolip

Ah, the OR operator is case sensitive, that tripped me up.

Let's make this issue about the problem that Philip mentioned instead. Name doing a substring match is great, but yes, let's add another option that gives exact matches.

Re saved searches: That would be possible for logged in users and then you could namespace it "kadirtopal/top-issues" or similar.

atopal avatar May 22 '24 10:05 atopal

Steps:

  1. Modify the FeatureSearch.g4 to add a new term id.
  2. Run make gen -B to regenerate all the generated code
  3. Add a new Identifier, called IdentifierID.
    • Background: When we build our search tree from the query, each node has an Identifier
  4. Open the generated file lib/gen/featuresearch/parser/antlr/featuresearch_visitor.go to find the signature of your new method for your new term. Should be something like VisitID_term
  5. Copy this method and attach it to our implementation of the Visitor in features_search_visitor.go. We will come back to writing code for it in step 7.
  6. In the same file, create a reusable helper called createIDNode. It should simply call the existing createSimpleNode. Look at other createXxx methods.
  7. Back to the VisitID_term method, call your new helper now.
    • Retrospect: Since this particular Visit method is so short, you probably could just call createSimpleNode here.
  8. Now it's time for some tests. Both tests that 1) assert that the query can be parsed (parse tests) and build a search tree (database tests) , and 2) that the database can use the search tree and find the data. Luckily, our database tests already have features inserted. So we can pull IDs from there. Then we can use those IDs in the parse tests to build the expected tree. Then we can use that exact tree in the new database tests. Examine the database tests below. In the Spanner SQL database, we actually call ID, FeatureKey because ID is reserved for the primary key for the row (which is an ID). Remember the FeatureKeys you have available: https://github.com/GoogleChrome/webstatus.dev/blob/684a2857d5acd7c15911510c991d5281414f80b5/lib/gcpspanner/feature_search_test.go#L45-L66
  9. Go to the parse test in features_search_parse_test.go. Copy an existing simple test case. Like this one: https://github.com/GoogleChrome/webstatus.dev/blob/684a2857d5acd7c15911510c991d5281414f80b5/lib/gcpspanner/searchtypes/features_search_parse_test.go#L672-L688. Just change the InputQuery to id:FeatureKeyYouSelected. run the tests by scrolling to the top of the method in vscode and select Run Test. The test should fail. But show you the difference. Fix the difference then re-run the tests to succeed. You will use this fixed tree later on in step X
  10. Now, we need to have the database use the query feature_search_query.go.
    • Create a new filter method attached to FeatureSearchFilterBuilder called idFilter. Take a look at snapshotFilter for inspiration: https://github.com/GoogleChrome/webstatus.dev/blob/684a2857d5acd7c15911510c991d5281414f80b5/lib/gcpspanner/feature_search_query.go#L311-L331
    • Normalize the input to lower case with something like snapshotFilter
    • Create the paramName that will be used in the query and point to the normalized value.
    • Use opStr := searchOperatorToSpannerBinaryOperator(op) to get the operator (since we only do = or !=)
    • build a query string. We can go more into this later on but you can assume that you have access to WebFeatures table via an alias called wf https://github.com/GoogleChrome/webstatus.dev/blob/684a2857d5acd7c15911510c991d5281414f80b5/infra/storage/spanner/migrations/000001.sql#L16-L23. Something along the lines: `fmt.Sprintf("( wf.FeatureKey_Lowercase %s @%s)", opStr, paramName). Use the computed Lowercase.
  11. Now we need to add a database test. Follow this example from the snapshot database test. (you do not need to insert features. And you are already using a search tree using IDs/FeatureKeys from the inserted features.

For reference, take a look at how we added support for the snapshot term in this PR: https://github.com/GoogleChrome/webstatus.dev/pull/601

jcscottiii avatar Oct 04 '24 21:10 jcscottiii

Two more things:

Thing 1

we should add a playwright test that always checks that we can support 50 IDs in one query.

The test should input the query and assert that the dashboard returns 50 features

Thing 2

We need to add the new term to the frontend's vocabulary:

https://github.com/GoogleChrome/webstatus.dev/blob/e2ee4b34698c9882813f59f0f040d0ba727dcd02/frontend/src/static/js/components/webstatus-overview-filters.ts#L61-L134

jcscottiii avatar Oct 16 '24 16:10 jcscottiii

The work for this has been split up into #786 #787 #788

jcscottiii avatar Oct 21 '24 16:10 jcscottiii

https://github.com/GoogleChrome/webstatus.dev/issues/786 https://github.com/GoogleChrome/webstatus.dev/issues/787 https://github.com/GoogleChrome/webstatus.dev/issues/788 have been resolved

KyleJu avatar Nov 22 '24 00:11 KyleJu