User Behavior Insights implementation for Apache Solr
Description
I am working with other folks, especially Stavros Macrakis ([email protected]), to come up with a solution for understanding what users do in response to search results. We have great visibility into an incoming query, what we do with it, and which docs are sent back. We do NOT have a way of tying that search to what the user does next, or of telling whether the following query is connected to the original one.
Many teams lean on GA, Snowplow, or custom code to track click-through, add-to-cart, and similar signals, but there is nothing that is drop dead simple to use and open source.
Solution
User Behavior Insights is a shared schema for tracking search related activities. There is a basic implementation for OpenSearch and this is a version for Apache Solr.
Tasks to be done:
- [x] Demonstrate providing a `.expr` file and using it to write to another Solr collection.
- [ ] Look at the performance implications of every query generating a streaming expression.
- [ ] Check we only record on the main node, not the replicas when sharding.
- [ ] How can I load test this?
- [ ] Write up Ref Guide Docs
- [ ] Can we add it to `techproducts` as an example?
- [x] Add UBI to Admin UI as flag
- [ ] Add UBI to SolrJ basic client
- [ ] Add UBI to SolrJ JSON Query client
Tests
Bats test to demonstrate the end to end use of UBI.
Checklist
Please review the following and check all that apply:
- [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the Reference Guide
I'd love to review here, but I think I need some more starting information either in a ref guide page or a JIRA, I'm kind of lost right now...
Yeah... I'll go ahead and write up some ref guide docs! And finish the demo .bats script ;-)
Usually, features like these are discussed in the dev@ list, or in JIRA or a SIP. Most important question I have in mind is whether this needs to be in the core search engine? If not, can this not be a plugin/package, shipped outside of solr-core?
This is definitely draft mode code... I opened it as a PR just to be able to track the work, and once it gets a bit further, I plan on opening a proper discussion about it. Module? Solr Sandbox? A Component? A full blown package? So many fun options...
I figured out how to parse and run a streaming expression that is used to write the query analytics data to, well, anywhere we want ;-). The next area to look at is actually integrating the streaming expression INTO the component as more than just a one-off. I gotta figure out how to take the data and pass it into the streaming expression... input() maybe??? I also need to think about how to avoid rebuilding and destroying the streaming expression for every query.
Then more Ref Guide docs, a BATS integration test maybe, and then a discussion about who wants to use it first! Plus of course the ever critical, "where does the code live" conversation.
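On the point about not rebuilding the streaming expression for every query, one option might be to parse the expression text once and reuse the parsed form. The following is only a minimal sketch of that caching idea, not the PR's code: `ParsedExpressionCache` is a hypothetical name, and the parser is passed in as a plain function so the example does not depend on any Solr streaming classes. Whether a parsed expression is actually safe to share across requests would need to be checked against Solr's streaming API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: parses each distinct expression text once and reuses the result. */
public class ParsedExpressionCache<T> {
    private final ConcurrentHashMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;

    public ParsedExpressionCache(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Returns the cached parse for exprText, invoking the parser only on first use. */
    public T get(String exprText) {
        return cache.computeIfAbsent(exprText, parser);
    }

    public static void main(String[] args) {
        AtomicInteger parses = new AtomicInteger();
        // Stand-in "parser" that just trims the text and counts how often it runs.
        ParsedExpressionCache<String> cache = new ParsedExpressionCache<>(text -> {
            parses.incrementAndGet();
            return text.trim();
        });
        cache.get(" placeholder-expression ");
        cache.get(" placeholder-expression ");
        System.out.println("parser invocations: " + parses.get());
    }
}
```

The cache only covers the parse step; any per-query data would still have to be supplied to the expression each time it runs.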
Oh, and of course, we now have a machine readable schema via JSON Schema available here https://github.com/o19s/ubi
I am having some second thoughts about the idea of logging ubi queries to disk... Why? In any real use case you want them to go somewhere. Plus log4j is a pain to touch... So... May just rip that part out. You want to log to disk? Just write a streaming expression ;-)
Making progress.... I've added some tasks to do, and then I think I'm going to flip it from Draft to Ready for Review and email the community. I'd like to demo it at an upcoming community meeting.
BTW, UBI now has an actual website! https://www.ubisearch.dev/
Something like this would be great to discuss at a Solr meetup/conf/whatever. Feel free to take half of a community meetup to show & tell when you're ready.
Hey, we now have a website! http://ubisearch.dev for more info!
A question for the smarter folks than me. Should the classes UBIQuery and UBIQueryStream be added to the UBIComponent.java? UBIQuery is just a pojo... And UBIQueryStream wires the use of the component up to a streaming expression. I don't see either ever being used elsewhere....
Just stubbed my toe on "Distributed processing is harder than single core processing"! With a two node setup, I discovered that I am logging to a SINGLE userfiles/ubi_queries.jsonl file, and I log once for each shard instead of just logging on the collector step:
{"query_id":"c4e40af6-67b7-4824-8b63-5aae70a485f6","timestamp":"2024-11-27T13:42:19.121Z"}
{"query_id":"5dfedf02-fd89-4e40-b3aa-7700c162800b","timestamp":"2024-11-27T13:42:19.121Z"}
Sigh.
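One way to avoid the per-shard duplicates might be to skip recording on shard sub-requests. Solr marks those with the `isShard=true` parameter (`ShardParams.IS_SHARD`), so a guard along these lines could work. This is a minimal sketch under assumptions, not the PR's code: `UbiShardGuard` and `shouldRecord` are hypothetical names, and a plain `Map` stands in for the request's `SolrParams`.

```java
import java.util.Map;

/** Hypothetical guard: record UBI data only on the top-level (collector) request. */
public class UbiShardGuard {
    // Name of the parameter Solr adds to per-shard sub-requests (ShardParams.IS_SHARD).
    static final String IS_SHARD = "isShard";

    /** Returns true when the request is NOT a shard sub-request. */
    static boolean shouldRecord(Map<String, String> params) {
        return !Boolean.parseBoolean(params.getOrDefault(IS_SHARD, "false"));
    }

    public static void main(String[] args) {
        // Top-level request as sent by the client.
        System.out.println(shouldRecord(Map.of("q", "*:*")));
        // Sub-request fanned out to a shard by the coordinating node.
        System.out.println(shouldRecord(Map.of("q", "*:*", "isShard", "true")));
    }
}
```

Inside a real component the equivalent check would presumably read the flag from the request parameters, for example via `SolrParams.getBool`, before writing anything.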
Argh, a bit stuck. I can't figure out how to have the UBIComponent, during a distributed query, look up the final doc IDs and record them before sending them back to the user. With a single node and a single shard it works great, but not in a distributed fashion.
I keep getting:
2024-11-28 12:36:31.368 ERROR (qtp428039780-40-localhost-11) [c:twoshard s:shard1 r:core_node4 x:twoshard_shard1_replica_n2 t:localhost-11] o.a.s.s.HttpSolrCall 500 Exception => java.lang.NullPointerException: Cannot read field "docList" because the return value of "org.apache.solr.handler.component.ResponseBuilder.getResults()" is null
at org.apache.solr.handler.component.UBIComponent.doStuff(UBIComponent.java:315)
java.lang.NullPointerException: Cannot read field "docList" because the return value of "org.apache.solr.handler.component.ResponseBuilder.getResults()" is null
at org.apache.solr.handler.component.UBIComponent.doStuff(UBIComponent.java:315) ~[?:?]
at org.apache.solr.handler.component.UBIComponent.distributedProcess(UBIComponent.java:252) ~[?:?]
at org.apache.solr.handler.component.SearchHandler.processComponents(SearchHandler.java:552) ~[?:?]
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:429) ~[?:?]
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:238) ~[?:?]
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2875) ~[?:?]
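The trace shows `getResults()` returning null inside `distributedProcess`, which is consistent with the coordinating node not having run a local search of its own. A possible direction is to null-check the local results and fall back to the merged shard response once it exists. The sketch below only illustrates that branching with stand-in types: `Results`, `Builder`, `mergedDocIds`, and `docIds` are all hypothetical and are not Solr's classes. In Solr itself the merged documents are, as far as I know, exposed through `ResponseBuilder.getResponseDocs()`, and they may only be populated in a later stage such as `finishStage`; both points are assumptions to verify.

```java
import java.util.List;

/** Hypothetical sketch of a null-safe doc ID lookup for single-shard and distributed queries. */
public class UbiDocIdLookup {
    /** Stand-in for the local result holder; not Solr's real class. */
    static final class Results {
        final List<String> docList;
        Results(List<String> docList) { this.docList = docList; }
    }

    /** Stand-in for the response builder; not Solr's real class. */
    static final class Builder {
        final Results results;           // null when no local search ran on this node
        final List<String> mergedDocIds; // null until shard responses have been merged
        Builder(Results results, List<String> mergedDocIds) {
            this.results = results;
            this.mergedDocIds = mergedDocIds;
        }
        Results getResults() { return results; }
    }

    /** Picks local results if present, else merged shard results, else an empty list. */
    static List<String> docIds(Builder rb) {
        Results local = rb.getResults();
        if (local != null && local.docList != null) {
            return local.docList;       // single node, single shard path
        }
        if (rb.mergedDocIds != null) {
            return rb.mergedDocIds;     // coordinator path, after the merge
        }
        return List.of();               // too early in the request: nothing to record yet
    }

    public static void main(String[] args) {
        System.out.println(docIds(new Builder(new Results(List.of("1", "2")), null)));
        System.out.println(docIds(new Builder(null, List.of("7", "9"))));
        System.out.println(docIds(new Builder(null, null)));
    }
}
```

Returning an empty list in the last case avoids the NullPointerException, but the recording step would still need to run at a point in the request where the merged documents are available.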
I think I'm going to flip it from Draft to Ready for Review and email the community. I'd like to demo it at an upcoming community meeting.
@epugh did you ever demo this? I don't remember it, and think it'd be an awesome thing to highlight (and a great way to garner interest and get feedback)