opengrok
IgnoredNames does not work properly with project centric processing workflow
I reported this in the Slack channel: in 1.1-rc41 (I haven't tested other versions) the .git files were hidden in xref browsing. In 1.1 they are not. I don't know whether this is a problem in my setup or whether some logic in the indexer is broken. Can someone else verify?
1.1-rc41:
1.1:
That looks as if the ignored list for Git no longer works as a whole: next to .git there is also .gitignore, and GitRepository.java has this in its constructor:
ignoredDirs.add(".git");
ignoredFiles.add(".gitignore");
Does this happen for other SCMs?
Will track this as a bug for now.
fwiw, this bug is not happening to me using 1.1
So it is a bad setup I think. Let me experiment.
In my case it disappears when I try to upload groups with the following command. Before the upload, the configuration has the "ignoredNames" section; once this is uploaded, the section is gone.
#
# The custom settings are now generated in the READ_ONLY_XML - let us apply it
#
opengrok-projadm \
--base /var/opengrok \
--jar ${JAR} \
--roconfig ${READ_ONLY_XML} \
--configmerge `which opengrok-config-merge` \
--uri http://localhost:8080 \
--refresh \
--upload
Specifically, this command does not add the IgnoredNames property to the final configuration:
2019-01-06 13:32:41,240 DEBUG opengrok_tools | Command ['/Users/ktulinger/OpenGrok/opengrok-tools/env/bin/opengrok-config-merge', '-l', '10', '-a', 'distribution/target/opengrok/lib/opengrok.jar', '/var/opengrok/etc/groups.xml', '/var/folders/w4/fr4pd7zn0x1f9lwc8hsmhbwh0000gp/T/tmpocneq8p0'] took 1 seconds
This is the webapp configuration:
<?xml version="1.0" encoding="UTF-8"?>
<java version="1.8.0_144" class="java.beans.XMLDecoder">
<object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
<void property="cmds">
<object class="java.util.Collections" method="unmodifiableMap">
<object class="java.util.HashMap">
<void method="put">
<string>org.opengrok.indexer.history.SubversionRepository</string>
<string>/usr/bin/svn</string>
</void>
<void method="put">
<string>org.opengrok.indexer.history.GitRepository</string>
<string>/usr/bin/git</string>
</void>
</object>
</object>
</void>
<void property="ctags">
<string>/usr/local/bin/ctags</string>
</void>
<void property="dataRoot">
<string>/private/var/opengrok/data</string>
</void>
<void id="IgnoredNames0" property="ignoredNames">
<void id="IgnoredDirs0" property="ignoredDirs">
<void property="items">
<void method="add">
<string>.bk</string>
</void>
<void method="add">
<string>.hg</string>
</void>
<void method="add">
<string>.bzr</string>
</void>
<void method="add">
<string>.git</string>
</void>
<void method="add">
<string>.svn</string>
</void>
<void method="add">
<string>SCCS</string>
</void>
<void method="add">
<string>.razor</string>
</void>
<void method="add">
<string>RCS</string>
</void>
<void method="add">
<string>CVS</string>
</void>
<void method="add">
<string>CVSROOT</string>
</void>
<void method="add">
<string>.repo</string>
</void>
</void>
</void>
<void id="IgnoredFiles0" property="ignoredFiles">
<void property="items">
<void method="add">
<string>.hgtags</string>
</void>
<void method="add">
<string>.hgignore</string>
</void>
<void method="add">
<string>.gitignore</string>
</void>
<void method="add">
<string>.p4config</string>
</void>
<void method="add">
<string>.cvsignore</string>
</void>
</void>
</void>
</void>
<void property="projectsEnabled">
<boolean>true</boolean>
</void>
<void property="sourceRoot">
<string>/private/var/opengrok/src</string>
</void>
</object>
</java>
Groups configuration contains just the single group as you would guess from the next snippet.
Result:
<?xml version="1.0" encoding="UTF-8"?>
<java version="9" class="java.beans.XMLDecoder">
<object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
<void property="cmds">
<object class="java.util.Collections" method="unmodifiableMap">
<object class="java.util.HashMap">
<void method="put">
<string>org.opengrok.indexer.history.SubversionRepository</string>
<string>/usr/bin/svn</string>
</void>
<void method="put">
<string>org.opengrok.indexer.history.GitRepository</string>
<string>/usr/bin/git</string>
</void>
</object>
</object>
</void>
<void property="ctags">
<string>/usr/local/bin/ctags</string>
</void>
<void property="dataRoot">
<string>/private/var/opengrok/data</string>
</void>
<void property="groups">
<void method="add">
<object class="org.opengrok.indexer.configuration.Group">
<void property="name">
<string>group-1</string>
</void>
<void property="pattern">
<string>group-1.*</string>
</void>
</object>
</void>
</void>
<void property="projectsEnabled">
<boolean>true</boolean>
</void>
<void property="sourceRoot">
<string>/private/var/opengrok/src</string>
</void>
</object>
</java>
Isolated a test case:
@Test
public void test() throws Exception {
    // Base configuration: only a group is set, everything else is default.
    Configuration cfgBase = new Configuration();
    cfgBase.addGroup(new Group("group-1", "group-1-*"));

    // New configuration: ignored names are initialized via RepositoryFactory.
    Configuration cfgNew = new Configuration();
    final RuntimeEnvironment env = RuntimeEnvironment.getInstance();
    env.setConfiguration(cfgNew);
    RepositoryFactory.initializeIgnoredNames(env);

    System.out.println(cfgBase.getXMLRepresentationAsString());
    System.out.println(cfgNew.getXMLRepresentationAsString());

    merge(cfgBase, cfgNew);

    // After the merge, cfgNew should still carry the .git ignored directory.
    System.out.println(cfgNew.getXMLRepresentationAsString());
    Assert.assertTrue("Should contain .git ignored dir",
            cfgNew.getIgnoredNames().getIgnoredDirs().getItems().contains(".git"));
}
Looks like it is skipped because the groups.xml contains default ignored names.
I think this is a problem with the merge itself, perhaps the same as what is described in #2147.
The workaround is to change the flow:
#
# Download the current webapp configuration to BASE_XML
#
opengrok-projadm \
--base /var/opengrok \
--java ${JAVA_HOME}/bin/java \
--jar ./lib/opengrok.jar \
--uri http://localhost:8080/source \
--refresh
#
# The custom settings are now generated in the READ_ONLY_XML - let us apply it
#
TEMPFILE=`mktemp`
echo "Merging the configuration with read-only configuration"
run_configmerge ${BASE_XML} ${READ_ONLY_XML} > ${TEMPFILE}
mv -f ${TEMPFILE} ${BASE_XML}
echo "Applying the changes to webapp"
curl -X PUT --header "Content-Type: application/xml" --data "@${BASE_XML}" http://localhost:8080/source/api/v1/configuration
I'm not sure whether this is related, but I've had tons of head-scratching issues with the configuration file and finally settled on calling the indexer with the -W and -R parameters, so that it reads in the old configuration options and then writes the new ones when it's done. This has resolved all of my issues with the configuration file.
Here is my full index command, which I use on 200+ GB of files/code.
Note: this is a WIP version, as we're working on installing things in a more appropriate manner.
/opt/rh/rh-python35/root/usr/bin/opengrok-indexer -C
-J=-Djava.util.logging.config.file=/network/drive/opengrok/grok/repo1/unixlogging.properties
-a /network/drive/opengrok/opengrok-1.1/lib/opengrok.jar --
-s /network/drive/opengrok/grok/repo1/source/
-d /network/drive/opengrok/grok/repo1/data
-P
-p /default1
-p /default2
-c /network/drive/opengrok/ctags/ctags
-H
-S
-G
--leadingWildCards on
-W /network/drive/opengrok/grok/repo1/etc/configuration_unix.xml
-R /network/drive/opengrok/grok/repo1/etc/configuration_unix.xml
-U http://server/source
Thank you. I can confirm when using full indexer for all projects (like your example), the problem disappears.
However, I wanted to set up per-project indexing (using the opengrok-reindex-project Python script, which eventually runs the indexer only for the specified directory), and that's what led to these issues (because in this case the configuration is never written to a file at the end of indexing).
In this exact case the problem is that IgnoredNames does not override equals or hashCode.
The consequence is that, when using merge, every property of the configuration should implement equals; otherwise the results can indeed be surprising from our perspective.
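To illustrate why a missing equals matters here, below is a minimal, self-contained sketch. It is not the actual OpenGrok merge code. It assumes a simplified merge rule in which a base value that differs from a freshly constructed default overrides the new value. The names MergeEqualsSketch, NamesWithoutEquals, NamesWithEquals and mergeProperty are hypothetical and exist only for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MergeEqualsSketch {

    /** Hypothetical property type that inherits identity-based equals from Object. */
    static class NamesWithoutEquals {
        final Set<String> items = new HashSet<>();
    }

    /** Same data, but with value-based equals and hashCode. */
    static class NamesWithEquals {
        final Set<String> items = new HashSet<>();

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof NamesWithEquals)) {
                return false;
            }
            return items.equals(((NamesWithEquals) o).items);
        }

        @Override
        public int hashCode() {
            return Objects.hash(items);
        }
    }

    /**
     * Assumed merge rule (a simplification for this sketch): if the base value
     * equals the default, keep the new value; otherwise the base value wins.
     */
    static <T> T mergeProperty(T baseValue, T defaultValue, T newValue) {
        return baseValue.equals(defaultValue) ? newValue : baseValue;
    }

    public static void main(String[] args) {
        // Without equals: two untouched default instances are never "equal",
        // so the empty base value replaces the populated new value.
        NamesWithoutEquals newA = new NamesWithoutEquals();
        newA.items.add(".git");
        NamesWithoutEquals mergedA =
                mergeProperty(new NamesWithoutEquals(), new NamesWithoutEquals(), newA);
        System.out.println("without equals keeps .git: " + mergedA.items.contains(".git"));

        // With equals: the base value is recognized as default and the new value survives.
        NamesWithEquals newB = new NamesWithEquals();
        newB.items.add(".git");
        NamesWithEquals mergedB =
                mergeProperty(new NamesWithEquals(), new NamesWithEquals(), newB);
        System.out.println("with equals keeps .git: " + mergedB.items.contains(".git"));
    }
}
```

Under that assumed rule, the first case loses ".git" and the second keeps it, which matches the symptom seen when merging groups.xml into the webapp configuration.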
Is the issue still open? In my case, I removed a couple of directories from IgnoredDirs and ran a reindex with opengrok-resync, but the directories I removed from the ignore list are still missing from the index. Should I use another method of indexing? The volume of code is quite big, so it's a long process...
Unless these directories are changed, they will not appear in an already existing index. The reindex process is incremental.