Git sparse repository checkout leads to JGitInternalException when creating history cache
Update: The issue below seems to happen only with projects that use a sparse checkout of a Git repo. Other projects seem to work fine.
I am running an OpenGrok Docker instance with many projects. After a recent OpenGrok version upgrade and a few unrelated changes (an OS upgrade on the host where the code resides), the indexer command started failing. I see an error only in the history command, but it doesn't create either data/xref/
I see the error below in the log:
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.history.HistoryGuru lambda$createHistoryCacheReal$36
WARNING: failed to create history cache for {dir='/opengrok/src/project_name.github.devops',type=git,history=on,historyCache=on,merge=true,annotationCache=off,tagsEnabled=off}
org.eclipse.jgit.api.errors.JGitInternalException: Error while parsing attributes
at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:635)
at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:589)
at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:169)
at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:110)
at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:87)
at org.eclipse.jgit.diff.DiffFormatter.scan(DiffFormatter.java:533)
at org.opengrok.indexer.history.GitRepository.getFilesBetweenCommits(GitRepository.java:649)
To Reproduce Place the folder in /opengrok/src/<project_name> and start the indexer command:
Example command that I run manually (it is run automatically by the container):
opengrok-reindex-project -J=-XX:-UseGCOverheadLimit -J=-Xmx36g -J=-server --printoutput --api_timeout 300 --jar /opengrok/lib/opengrok.jar -t /opengrok/etc/logging.properties.template -d /opengrok/log/project_name.github.devops -U http://localhost:8080/ -P project_name.github.devops -- --connectTimeout 300 -r dirbased -G -m 4096 --leadingWildCards on -c /usr/local/bin/ctags -o /opengrok/etc/ctags.config -U http://localhost:8080/ -H project_name.github.devops
Expected behavior Indexing should work normally and create /opengrok/data/xref/project_name.github.devops, /opengrok/data/index/project_name.github.devops, etc.
Full log
Sep 20, 2025 12:49:40 PM org.opengrok.indexer.configuration.Configuration read
INFO: Reading configuration from '/tmp/tmpb7qpb63v'
Sep 20, 2025 12:49:41 PM org.opengrok.indexer.index.Indexer parseOptions
INFO: Indexer options: [-R, /tmp/tmpb7qpb63v, --connectTimeout, 300, -r, dirbased, -G, -m, 4096, --leadingWildCards, on, -c, /usr/local/bin/ctags, -o, /opengrok/etc/ctags.config, -U, http://localhost:8080/, -H, project_name.github.devops]
INFO: file with extra options for ctags: /opengrok/etc/ctags.config
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done invalidating repositories (1 valid, 1 working) (took 529 ms)
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.index.Indexer runMain
INFO: Indexer version 1.14.2 (4c5dc2465cb0729dfe0fc765d8bd7cf99ee82e91) running on Java version: 21.0.8+9-LTS, name: OpenJDK 64-Bit Server VM, vendor: Eclipse Adoptium, arch: amd64 with properties: ncpu: 14, maxMemory: 36.0 GiB
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.configuration.RuntimeEnvironment validateUniversalCtags
INFO: Using ctags: Universal Ctags 6.2.0(df6a390df), Copyright (C) 2015-2025 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Aug 29 2025, 08:33:35
URL: https://ctags.io/
Output version: 1.1
Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +yaml, +packcc, +optscript
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Generating history cache for repositories: /project_name.github.devops
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.history.HistoryGuru createHistoryCacheReal
INFO: Creating history cache for 1 repositories
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.history.HistoryGuru createHistoryCache
INFO: Creating history cache for {dir='/opengrok/src/project_name.github.devops',type=git,history=on,historyCache=on,merge=true,annotationCache=off,tagsEnabled=off}
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.history.HistoryGuru lambda$createHistoryCacheReal$36
WARNING: failed to create history cache for {dir='/opengrok/src/project_name.github.devops',type=git,history=on,historyCache=on,merge=true,annotationCache=off,tagsEnabled=off}
org.eclipse.jgit.api.errors.JGitInternalException: Error while parsing attributes
at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:635)
at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:589)
at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:169)
at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:110)
at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:87)
at org.eclipse.jgit.diff.DiffFormatter.scan(DiffFormatter.java:533)
at org.opengrok.indexer.history.GitRepository.getFilesBetweenCommits(GitRepository.java:649)
at org.opengrok.indexer.history.GitRepository.getFilesForCommit(GitRepository.java:615)
at org.opengrok.indexer.history.GitRepository.traverseHistory(GitRepository.java:544)
at org.opengrok.indexer.history.RepositoryWithHistoryTraversal.doCreateCache(RepositoryWithHistoryTraversal.java:194)
at org.opengrok.indexer.history.Repository.createCache(Repository.java:403)
at org.opengrok.indexer.history.HistoryGuru.createHistoryCache(HistoryGuru.java:1078)
at org.opengrok.indexer.history.HistoryGuru.lambda$createHistoryCacheReal$36(HistoryGuru.java:1122)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing blob 6bd7389c8ac39c614e8ffdfd075f2ca8bbb83e6d
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:138)
at org.eclipse.jgit.treewalk.CanonicalTreeParser.loadAttributes(CanonicalTreeParser.java:393)
at org.eclipse.jgit.treewalk.CanonicalTreeParser.findAttributes(CanonicalTreeParser.java:385)
at org.eclipse.jgit.treewalk.CanonicalTreeParser.getEntryAttributesNode(CanonicalTreeParser.java:375)
at org.eclipse.jgit.attributes.AttributesHandler.attributesNode(AttributesHandler.java:402)
at org.eclipse.jgit.attributes.AttributesHandler.mergePerDirectoryEntryAttributes(AttributesHandler.java:232)
at org.eclipse.jgit.attributes.AttributesHandler.getAttributes(AttributesHandler.java:144)
at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:631)
... 16 more
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done history cache for all repositories (took 29.252 seconds)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Done generating history cache
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done invalidating repositories (1 valid, 1 working) (took 79 ms)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer doIndexerExecution
INFO: Starting indexing
Sep 20, 2025 12:50:11 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.IndexDatabase addIndexDatabaseForProject
SEVERE: Failed to create history cache for some repositories of project project_name.github.devops:indexed=false,history=true: {{dir='/opengrok/src/p2v.tera.vcf.main.github-...
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer doIndexerExecution
INFO: Waiting for the executors to finish
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done indexing data of all repositories (took 37 ms)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Indexer finished (took 31.30 seconds)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer runMain
INFO: Indexer finished with success
Additional context OpenGrok image: 1.14.1. Indexer running on CentOS (Rocky 9).
How do you perform the Git sparse checkout exactly?
There is no mention of sparse checkout in the JGit source or GitHub issues.
What I think needs to happen, at least as a first step in handling this problem, is for the indexer to check the core.sparseCheckout property of the Git repository and, if it is set to true, disable history for the associated project. Also, history-based reindex might not work.
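The check described above can be sketched from the command line. This is only an illustration with stock git, not OpenGrok's implementation; the function name `is_sparse_checkout` is hypothetical, and the repository path is the one from the log.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the repository at $1 has
# core.sparseCheckout set to true, and fails otherwise (including when
# the setting is absent or $1 is not a Git repository).
is_sparse_checkout() {
    # --type=bool normalizes values such as "yes", "on" or "1" to "true".
    value=$(git -C "$1" config --type=bool --get core.sparseCheckout 2>/dev/null || echo false)
    [ "$value" = "true" ]
}

# Example usage; the path is taken from the log above.
repo=/opengrok/src/project_name.github.devops
if is_sparse_checkout "$repo"; then
    echo "sparse checkout enabled: $repo"
fi
```

A wrapper script could use such a check to decide whether to index a project with history disabled.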
> How do you perform the Git sparse checkout exactly?
We perform the sparse checkout in two ways, because cone mode doesn't support negations (!):
Pattern mode (wildcards/negations)
git clone --filter=blob:none --no-checkout <repo_url> <project_name>
cd <project_root>
git sparse-checkout init --no-cone
(my code writes .git/info/sparse-checkout with gitignore-style pattern rules for files to include/exclude)
git checkout
Cone mode (directories only; no “!” negations)
git clone --filter=blob:none --no-checkout <repo_url> <project_name>
cd <project_root>
git sparse-checkout init --cone
git sparse-checkout set
Please note that project sync operations happen manually (through a cron job on the indexing server, which is mounted into the container).
Thanks for the commands. My hunch is that the Missing blob in the exception message and the --filter=blob:none are probably related.
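One way to test that hunch, sketched with stock git commands: a clone made with --filter=blob:none is a partial clone, so blobs are absent from the local object store until git fetches them on demand. The helper name `is_partial_clone` and the remote name `origin` are assumptions, and the configuration keys are the ones recent Git versions write for a filtered clone.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the remote named $2 (default: origin)
# of the repository at $1 is recorded as a promisor remote, which is how
# recent Git versions mark a partial clone.
is_partial_clone() {
    remote=${2:-origin}
    value=$(git -C "$1" config --type=bool --get "remote.$remote.promisor" 2>/dev/null || echo false)
    [ "$value" = "true" ]
}

# Example usage; the path is taken from the log above.
repo=/opengrok/src/project_name.github.devops
if is_partial_clone "$repo"; then
    echo "partial clone, filter: $(git -C "$repo" config --get remote.origin.partialclonefilter)"
    # Count objects that history references but that are absent locally;
    # rev-list prints those with a leading '?' and does not fetch them.
    git -C "$repo" rev-list --objects --all --missing=print | grep -c '^?'
fi
```

If the count is non-zero, re-cloning one affected project without --filter=blob:none (sparse checkout itself does not require the filter) would be one way to confirm whether the missing blobs are what trips the history cache.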
> Please note that project sync operations happen manually (through a cron job on the indexing server, which is mounted into the container).
This should not matter for this case as long as the sync does not collide with indexing.