solr
solr copied to clipboard
SOLR-16328 : intern() strings in DocCollection to reduce memory footp…
A lot of strings are repeated in state.json . interning those strings can dramatically reduce the heap used
The heap used by the DocCollection Object for varying shard sizes is as follows
shards | w/o optimization(MB) | w/ optimization
512 1.36 1.06
1024 2.63 2.029
2048 5.21 3.98
4096 10.34 7.89
Which version of Java were you testing this with?
I'm likely -1 on this patch based solely on the venerable Aleksey Shipilev post https://shipilev.net/jvm/anatomy-quarks/10-string-intern/
I suspect that we want to de-duplicate the keys but not the values? Do the values repeat across shards?
Maybe we can identify a known set of duplicated strings and load them into a set.
Was this measured using PRS or "traditional" state?
Which version of Java were you testing this with?
java 11
I suspect that we want to de-duplicate the keys but not the values? Do the values repeat across shards?
state.json has most Strings repeated, even values. Take a look at the sample below. Other than range almost all values can be safely interned
{"prs1":{
"pullReplicas":"0",
"replicationFactor":"1",
"router":{"name":"compositeId"},
"maxShardsPerNode":"2048",
"autoAddReplicas":"false",
"nrtReplicas":"1",
"tlogReplicas":"0",
"shards":{
"shard1":{
"range":"80000000-80ffffff",
"state":"active",
"replicas":{"core_node3":{
"core":"prs1_",
"node_name":"10.0.0.49:8986_solr",
"base_url":"http://10.0.0.49:8986/solr",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"}}}}
Was this measured using PRS or "traditional" state?
I tested with both and they're not very different
In case of PRS we get more benefits because state.json is loaded once and not reloaded for a very long time.
We have very large collections and very large state.json files (~10K replicas ). The state.json runs into multiple MBs and the in memory footprint is 10-20MB per collection and we have several 100 collections in each VM. a 30% savings can mean multiple GBs shaved off the heap
You are correct, a lot of the values are repeated too, whether they are booleans or enums, a lot of them are basically part of a closed set. Still, I don't think JVM interning is the way to go, and I would rather see us implement a HashMap or CHM interner as described in the linked blog post.
Can you try running with the JVM option -XX:+UseStringDeduplication - that might handle this issue for you without the need for code change.
@madrob I'm testing with a modified code that uses a weakmap interner and the results are better
@madrob any idea why the gradle precommit is failing ? it doesn't fail for me locally.
Precommit passes for me locally.
ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $ ../gradlew check -x test
Downloading https://services.gradle.org/distributions/gradle-7.2-all.zip
..............10%...............20%...............30%...............40%...............50%...............60%...............70%...............80%...............90%...............100%
To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/7.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
Daemon will be stopped at the end of the build
> Task :localSettings
IMPORTANT. This is the first time you ran the build. I wrote some sane defaults (for this machine) to 'gradle.properties', they will be picked up on consecutive gradle invocations (not this one).
Run gradlew :helpLocalSettings for more information.
> Task :errorProneSkipped
WARNING: errorprone disabled (skipped on builds not running inside CI environments, pass -Pvalidation.errorprone=true to enable)
> Task :solr:solrj:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:solrj-zookeeper:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:core:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:test-framework:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analysis-extras:compileJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:sql:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:core:analyzeClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- org.apache.lucene:lucene-backward-codecs:9.3.0@jar
> Task :solr:core:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:prometheus-exporter:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:prometheus-exporter:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:prometheus-exporter:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- org.apache.lucene:lucene-test-framework:9.3.0@jar
> Task :solr:solr-ref-guide:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:solr-ref-guide:downloadAntoraCli
added 60 packages, changed 1 package, and audited 62 packages in 17s
7 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
npm notice
npm notice New minor version of npm available! 8.11.0 -> 8.19.1
npm notice Changelog: <https://github.com/npm/cli/releases/tag/v8.19.1>
npm notice Run `npm install -g [email protected]` to update!
npm notice
> Task :solr:solr-ref-guide:downloadAntoraLunrExtension
added 18 packages, changed 1 package, and audited 81 packages in 7s
18 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
> Task :solr:solr-ref-guide:downloadAntoraSiteGenerator
added 153 packages, changed 1 package, and audited 235 packages in 23s
31 packages are looking for funding
run `npm fund` for details
9 high severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
> Task :solr:solr-ref-guide:downloadAsciidoctorMathjaxExtension
added 8 packages, changed 1 package, and audited 244 packages in 8s
31 packages are looking for funding
run `npm fund` for details
9 high severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
> Task :solr:solr-ref-guide:buildLocalSite
The generated local ref-guide can be found at:
/home/ishan/code/fullstorydev-solr/solr/solr-ref-guide/build/site/index.html
> Task :solr:solr-ref-guide:downloadLinkValidator
npm WARN deprecated [email protected]: Please upgrade to latest, formidable@v2 or formidable@v3! Check these notes: https://bit.ly/2ZEqIau
npm WARN deprecated [email protected]: Please upgrade to v7.0.2+ of superagent. We have fixed numerous issues with streams, form-data, attach(), filesystem errors not bubbling up (ENOENT on attach()), and all tests are now passing. See the releases tab for more information at <https://github.com/visionmedia/superagent/releases>.
added 49 packages, changed 1 package, and audited 294 packages in 5s
37 packages are looking for funding
run `npm fund` for details
9 high severity vulnerabilities
To address issues that do not require attention, run:
npm audit fix
To address all issues (including breaking changes), run:
npm audit fix --force
Run `npm audit` for details.
> Task :solr:solr-ref-guide:checkSiteLinks
{
"stats": {
"errors": [],
"warnings": [],
"parsedFiles": 814,
"localLinks": 231,
"localAnchorLinks": 3053,
"parentLinks": 195,
"parentAnchorLinks": 1,
"remoteLinks": 0,
"remoteAnchorLinks": 0
}
}
> Task :solr:solrj:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:test-framework:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analysis-extras:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/analysis-extras/src/test/org/apache/solr/schema/TestICUCollationFieldUDVAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analytics:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:analytics:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:clustering:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- org.apache.lucene:lucene-test-framework:9.3.0@jar
> Task :solr:modules:extraction:compileJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:extraction:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:gcs-repository:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/gcs-repository/src/test/org/apache/solr/gcs/ConcurrentDelegatingStorage.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hadoop-auth:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hadoop-auth:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hadoop-auth:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- com.github.spotbugs:spotbugs-annotations:4.5.3@jar
> Task :solr:modules:hdfs:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:hdfs:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:jwt-auth:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/jwt-auth/src/test/org/apache/solr/security/jwt/JWTAuthPluginIntegrationTest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:jwt-auth:analyzeTestClassesDependencies
Dependency analysis found issues.
unusedDeclaredArtifacts
- com.fasterxml.jackson.core:jackson-databind:2.13.3@jar
> Task :solr:modules:langid:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:ltr:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/ltr/src/test/org/apache/solr/ltr/model/TestMultipleAdditiveTreesModel.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :solr:modules:scripting:compileTestJava
Note: /home/ishan/code/fullstorydev-solr/solr/modules/scripting/src/test/org/apache/solr/scripting/update/TestBadScriptingUpdateProcessorConfig.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.2/userguide/command_line_interface.html#sec:command_line_warnings
BUILD SUCCESSFUL in 13m 33s
463 actionable tasks: 463 executed
ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $ java -version
openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment 18.9 (build 11.0.15+10)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.15+10, mixed mode, sharing)
ishan@x1extreme ~/code/fullstorydev-solr/solr (jira/SOLR-16328) $
The failure is
/home/runner/work/solr/solr/solr/solrj/src/java/org/apache/solr/common/cloud/CompositeIdRouter.java:238: warning: [NarrowCalculation] This product of integers could overflow before being implicitly cast to a long.
> Task :solr:solrj:compileJava
boolean round = rangeStep >= (1 << bits) * 16;
^
(see https://errorprone.info/bugpattern/NarrowCalculation)
Did you mean 'boolean round = rangeStep >= (1 << bits) * 16L;'?
Error-prone static analysis is disabled locally by default and only runs on CI because some people thought it was too slow.
It looks like this issue comes up because you are upgrading error-prone from 2.11 to 2.14
unfortunately we don't have a choice. We have to upgrade error-prone as well
> Could not resolve com.google.errorprone:error_prone_annotations:2.11.0.
Required by:
project :solr:modules:hdfs > project :solr:core
project :solr:modules:hdfs > project :solr:solrj
project :solr:modules:hdfs > project :solr:core > project :solr:solrj-zookeeper
> Cannot find a version of 'com.google.errorprone:error_prone_annotations' that satisfies the version constraints:
Dependency path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'com.github.ben-manes.caffeine:caffeine:3.1.1' (apiElements) --> 'com.google.errorprone:error_prone_annotations:2.14.0'
Constraint path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'org.apache:solr-root:10.0.0-SNAPSHOT' (gcvLocks) --> 'com.google.errorprone:error_prone_annotations:{strictly 2.11.0}' because of the following reason: Locked by versions.lock
Constraint path 'org.apache.solr:hdfs:10.0.0-SNAPSHOT' --> 'org.apache.solr:solrj:10.0.0-SNAPSHOT' (apiElements) --> 'com.google.errorprone:error_prone_annotations:2.11.0' because of the following reason: Computed