SOLR-17635: unmarshalling Map to SimpleOrderedMap if key i of type St…
https://issues.apache.org/jira/browse/SOLR-17635
Description
Please provide a short description of the changes you're making with this pull request.
Solution
Please provide a short description of the approach taken to implement your solution.
Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.
Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [ ] I have developed this patch against the
mainbranch. - [ ] I have run
./gradlew check. - [ ] I have added tests for my changes.
- [ ] I have added documentation for the Reference Guide
FYI I'm very eager to examine this in detail but am on vacation this week.
FYI I'm very eager to examine this in detail but am on vacation this week. No worries, enjoy your vacation, I will off next week and limited to my phone
Let's get #3214 merged first, thus narrowing this PR to just what it needs to do.
@dsmiley All tests are passing now, are we able to merge?
I have a some doubts that gnaw at me pertaining to the performance risks of O(1) maps suddenly becoming O(N). It doesn't sit well with me. You had suggested changing SimpleOrderedMap to be LinkedHashMap based. That's definitely safer, and whatever extra overhead is involved there should be fairly minor. Maybe we should pause merging this until pursuing that. Pursuing that probably means deprecating/removing indexed access to NamedList, requiring instead callers use iterators. But if SMO is backed by a LinkedHashMap, wouldn't then these changes become obsolete? What is the point of unmarshalling to SMO (backed by a LinkedHashMap) instead of a directly to a LinkedHashMap?
But if SMO is backed by a LinkedHashMap, wouldn't then these changes become obsolete? What is the point of unmarshalling to SMO (backed by a LinkedHashMap) instead of a directly to a LinkedHashMap?
Only an SMO is also a NamedList. A LinkedHashMap itself is not.
I was thinking about the need for this a bit more. The primary concern is a SolrJ 9 reading data sent from Solr 10, which will see more expanded use of SimpleOrderedMap that was previously a NamedList. Other circumstances can leave this disabled by default. In Solr 9 specifically, we could change BinaryResponseParser to detect the version of the writer in some way TBD and then act accordingly. A separate issue/PR.
Only an SMO is also a NamedList. A LinkedHashMap itself is not.
Just for my understand: What exactly would improve by unmarshalling to a SMO backed by LinkedHashMap compared to just a LinkedHashMap?
Map.get, Map.remove, Map.put -- basic operations are O(1) in a hash based Map, but not in a dense array List that SOM is today.
Map.get, Map.remove, Map.put -- basic operations are O(1) in a hash based Map, but not in a dense array List that SOM is today.
I do understand that part, but today, we are umarshalling to a LinkedHashMap, with the current changes we would unmarshall to a SMO (backed by an ArrayList), I do understand the potential performance issues of that change. But if we now changed the SMO to a LinkedHashMap, we would then unmarshall to an LinkedHashMap in a SMO, compared to unmarshall to a LinkedHashmap, as we do today. So what is the point of changing from a LinkedHashMap to LinkedHashMap in a SMO?
So what is the point of changing from a LinkedHashMap to LinkedHashMap in a SMO?
A LinkedHashMap doesn't implement NamedList. An SMO does. Thus a Solr 10 server could start passing maps and MapWriter things (e.g. SolrParams) to the response in situations that a SolrJ 9 is still anticipating reading a NamedList. With SOM; it's both things in one. The choice of LinkedHashMap vs ArrayList would be a regression for cases where the client code already expected a Map; was assuming O(1) performance of the map if lots of stuff is in it. Because NamedList/Map is nestable, at the time of unmarshalling we have no idea what the actual code looking at the object is casting it as. In a fantasy world if there was just exactly one thing to unmarshall, then there would be maybe readMap vs readNamedList and we wouldn't have concerns that this PR tries to solve.
That said, let's not actually change SMO to be based on a LinkedHashMap; at least not soon. I proposed only reading maps as SMO for a more narrow circumstance (9.x talking to 10.x) that would be somewhat temporary / limited, thus ameliorating a performance risk.
This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!
I merged with main and tweaked the naming a little as I asked. But most notably, made the setting false by default, thus is an opt-in compatibility mechanism that can be enabled as we see a need for it. Given the performance risk, I felt this approach is safer.
I think this is ready to merge.
@dsmiley Thanks