IgnoredNames does not work properly with project centric processing workflow

Open tulinkry opened this issue 6 years ago • 14 comments

I reported this in the Slack channel: in 1.1-rc41 (I haven't tested other versions) the .git files were hidden in xref browsing; in 1.1 they are not. I don't know whether this is a problem in my setup or whether some logic in the indexer is broken. Can someone else verify?

1.1-rc41: [screenshot 2018-12-28 at 09 00 43]
1.1: [screenshot 2018-12-28 at 09 00 48]

tulinkry avatar Dec 28 '18 10:12 tulinkry

That looks as if the ignored list for Git no longer works as a whole: next to .git there is also .gitignore, and GitRepository.java has this in its constructor:

    ignoredDirs.add(".git");
    ignoredFiles.add(".gitignore");

Does this happen for other SCMs?

vladak avatar Dec 29 '18 13:12 vladak

Will track this as a bug for now.

vladak avatar Dec 29 '18 13:12 vladak

fwiw, this bug is not happening to me using 1.1

Shooter3k avatar Jan 04 '19 13:01 Shooter3k

So it is a bad setup I think. Let me experiment.

tulinkry avatar Jan 05 '19 10:01 tulinkry

In my case it disappears when I try to upload groups with the following command. Beforehand, the configuration has the "ignoredNames" section; it disappears once this is uploaded.

#
# The custom settings are now generated in the READ_ONLY_XML - let us apply it
#
opengrok-projadm \
        --base /var/opengrok \
        --jar ${JAR} \
        --roconfig ${READ_ONLY_XML} \
        --configmerge `which opengrok-config-merge` \
        --uri http://localhost:8080 \
        --refresh \
        --upload

Specifically, this command does not add the IgnoredNames property to the final configuration:

2019-01-06 13:32:41,240    DEBUG opengrok_tools | Command ['/Users/ktulinger/OpenGrok/opengrok-tools/env/bin/opengrok-config-merge', '-l', '10', '-a', 'distribution/target/opengrok/lib/opengrok.jar', '/var/opengrok/etc/groups.xml', '/var/folders/w4/fr4pd7zn0x1f9lwc8hsmhbwh0000gp/T/tmpocneq8p0'] took 1 seconds

This is the webapp configuration:

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.8.0_144" class="java.beans.XMLDecoder">
 <object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
  <void property="cmds">
   <object class="java.util.Collections" method="unmodifiableMap">
    <object class="java.util.HashMap">
     <void method="put">
      <string>org.opengrok.indexer.history.SubversionRepository</string>
      <string>/usr/bin/svn</string>
     </void>
     <void method="put">
      <string>org.opengrok.indexer.history.GitRepository</string>
      <string>/usr/bin/git</string>
     </void>
    </object>
   </object>
  </void>
  <void property="ctags">
   <string>/usr/local/bin/ctags</string>
  </void>
  <void property="dataRoot">
   <string>/private/var/opengrok/data</string>
  </void>
  <void id="IgnoredNames0" property="ignoredNames">
   <void id="IgnoredDirs0" property="ignoredDirs">
    <void property="items">
     <void method="add">
      <string>.bk</string>
     </void>
     <void method="add">
      <string>.hg</string>
     </void>
     <void method="add">
      <string>.bzr</string>
     </void>
     <void method="add">
      <string>.git</string>
     </void>
     <void method="add">
      <string>.svn</string>
     </void>
     <void method="add">
      <string>SCCS</string>
     </void>
     <void method="add">
      <string>.razor</string>
     </void>
     <void method="add">
      <string>RCS</string>
     </void>
     <void method="add">
      <string>CVS</string>
     </void>
     <void method="add">
      <string>CVSROOT</string>
     </void>
     <void method="add">
      <string>.repo</string>
     </void>
    </void>
   </void>
   <void id="IgnoredFiles0" property="ignoredFiles">
    <void property="items">
     <void method="add">
      <string>.hgtags</string>
     </void>
     <void method="add">
      <string>.hgignore</string>
     </void>
     <void method="add">
      <string>.gitignore</string>
     </void>
     <void method="add">
      <string>.p4config</string>
     </void>
     <void method="add">
      <string>.cvsignore</string>
     </void>
    </void>
   </void>
  </void>
  <void property="projectsEnabled">
   <boolean>true</boolean>
  </void>
  <void property="sourceRoot">
   <string>/private/var/opengrok/src</string>
  </void>
 </object>
</java>

Groups configuration contains just the single group as you would guess from the next snippet.

Result:

<?xml version="1.0" encoding="UTF-8"?>
<java version="9" class="java.beans.XMLDecoder">
 <object class="org.opengrok.indexer.configuration.Configuration" id="Configuration0">
  <void property="cmds">
   <object class="java.util.Collections" method="unmodifiableMap">
    <object class="java.util.HashMap">
     <void method="put">
      <string>org.opengrok.indexer.history.SubversionRepository</string>
      <string>/usr/bin/svn</string>
     </void>
     <void method="put">
      <string>org.opengrok.indexer.history.GitRepository</string>
      <string>/usr/bin/git</string>
     </void>
    </object>
   </object>
  </void>
  <void property="ctags">
   <string>/usr/local/bin/ctags</string>
  </void>
  <void property="dataRoot">
   <string>/private/var/opengrok/data</string>
  </void>
  <void property="groups">
   <void method="add">
    <object class="org.opengrok.indexer.configuration.Group">
     <void property="name">
      <string>group-1</string>
     </void>
     <void property="pattern">
      <string>group-1.*</string>
     </void>
    </object>
   </void>
  </void>
  <void property="projectsEnabled">
   <boolean>true</boolean>
  </void>
  <void property="sourceRoot">
   <string>/private/var/opengrok/src</string>
  </void>
 </object>
</java>

tulinkry avatar Jan 06 '19 12:01 tulinkry

Isolated a test case:

    @Test
    public void test() throws Exception {
        Configuration cfgBase = new Configuration();
        cfgBase.addGroup(new Group("group-1", "group-1-*"));

        Configuration cfgNew = new Configuration();
        final RuntimeEnvironment env = RuntimeEnvironment.getInstance();
        env.setConfiguration(cfgNew);
        RepositoryFactory.initializeIgnoredNames(env);

        System.out.println(cfgBase.getXMLRepresentationAsString());
        System.out.println(cfgNew.getXMLRepresentationAsString());

        merge(cfgBase, cfgNew);

        System.out.println(cfgNew.getXMLRepresentationAsString());
        Assert.assertTrue("Should contain .git ignored dir", cfgNew.getIgnoredNames().getIgnoredDirs().getItems().contains(".git"));
    }

Looks like it is skipped because groups.xml contains the default ignored names.

tulinkry avatar Jan 06 '19 12:01 tulinkry

I think this is a problem with the merge itself, perhaps the same as what is described in #2147.

vladak avatar Jan 07 '19 10:01 vladak

A workaround is to change the flow:

#
# Download the current webapp configuration to BASE_XML
#
opengrok-projadm \
	--base /var/opengrok \
	--java ${JAVA_HOME}/bin/java \
	--jar ./lib/opengrok.jar \
	--uri http://localhost:8080/source \
	--refresh

#
# The custom settings are now generated in the READ_ONLY_XML - let us apply it
#
TEMPFILE=`mktemp`
echo "Merging the configuration with read-only configuration"
run_configmerge ${BASE_XML} ${READ_ONLY_XML} > ${TEMPFILE}
mv -f ${TEMPFILE} ${BASE_XML}
echo "Applying the changes to webapp"
curl -X PUT --header "Content-Type: application/xml" --data "@${BASE_XML}" http://localhost:8080/source/api/v1/configuration

tulinkry avatar Jan 07 '19 13:01 tulinkry

I'm not sure if this is related or not, but I've had tons of head-scratching issues with the configuration file and finally landed on calling the indexer with the -W and -R parameters, so that it reads in the old configuration options and then writes the new ones when it's done. This has resolved all of my issues with the configuration file.

Here is my full index command, which I use on 200+ GB of files/code.

Note: this is a WIP version, as we're working on installing things in a more appropriate manner.

/opt/rh/rh-python35/root/usr/bin/opengrok-indexer -C \
-J=-Djava.util.logging.config.file=/network/drive/opengrok/grok/repo1/unixlogging.properties \
-a /network/drive/opengrok/opengrok-1.1/lib/opengrok.jar -- \
-s /network/drive/opengrok/grok/repo1/source/ \
-d /network/drive/opengrok/grok/repo1/data \
-P \
-p /default1 \
-p /default2 \
-c /network/drive/opengrok/ctags/ctags \
-H \
-S \
-G \
--leadingWildCards on \
-W /network/drive/opengrok/grok/repo1/etc/configuration_unix.xml \
-R /network/drive/opengrok/grok/repo1/etc/configuration_unix.xml \
-U http://server/source

Shooter3k avatar Jan 07 '19 16:01 Shooter3k

Thank you. I can confirm that when using the full indexer for all projects (as in your example), the problem disappears.

However, I wanted to set up per-project indexing (using the opengrok-reindex-project Python script, eventually running the indexer only on a specified directory), and that's what led to these issues (because in this case the configuration is never written to a file at the end of indexing).

tulinkry avatar Jan 07 '19 17:01 tulinkry

In this exact case the problem is that IgnoredNames does not override equals nor hashCode.

tulinkry avatar Jan 09 '19 09:01 tulinkry

Which boils down to this: when using merge, every property of the configuration should implement equals; otherwise the results can indeed be surprising from our perspective.
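
To illustrate the point about equals, here is a minimal self-contained Java sketch. It is not OpenGrok's merge code: `mergeProperty`, `IdentityNames`, and `ValueNames` are hypothetical names, and the rule "keep the other configuration's value when the base value equals the default" is only an assumption about how such a merge might decide what counts as changed.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MergeEqualsSketch {

    /** Hypothetical property type that inherits Object.equals (identity comparison). */
    static class IdentityNames {
        final Set<String> items = new HashSet<>();
    }

    /** Same data, but with value-based equals and hashCode. */
    static class ValueNames {
        final Set<String> items = new HashSet<>();

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ValueNames)) {
                return false;
            }
            return items.equals(((ValueNames) o).items);
        }

        @Override
        public int hashCode() {
            return Objects.hash(items);
        }
    }

    /**
     * Assumed merge rule (hypothetical): the base value wins unless it equals
     * the default, in which case the value from the other configuration is kept.
     */
    static <T> T mergeProperty(T baseValue, T defaultValue, T newValue) {
        return Objects.equals(baseValue, defaultValue) ? newValue : baseValue;
    }

    public static void main(String[] args) {
        // Base and default are both untouched; only the new configuration lists ".git".
        IdentityNames idNew = new IdentityNames();
        idNew.items.add(".git");
        IdentityNames idMerged = mergeProperty(new IdentityNames(), new IdentityNames(), idNew);
        // Identity comparison treats the untouched base as "changed", so ".git" is lost.
        System.out.println("without equals: " + idMerged.items.contains(".git"));

        ValueNames valNew = new ValueNames();
        valNew.items.add(".git");
        ValueNames valMerged = mergeProperty(new ValueNames(), new ValueNames(), valNew);
        // Value comparison recognizes the base as default, so ".git" survives.
        System.out.println("with equals: " + valMerged.items.contains(".git"));
    }
}
```

Under that assumed rule, two freshly constructed objects of a class without equals never compare as equal, so an untouched base value looks like a deliberate change and replaces the populated one.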

tulinkry avatar Jan 09 '19 09:01 tulinkry

Is the issue still open? In my case, I removed a couple of directories from IgnoredDirs and ran a reindex with opengrok-resync, but the directories I removed from the ignore list are still missing from the index. Should I use another method of indexing? The volume of code is quite big, so it's a long process...

shusterboris avatar Mar 14 '23 13:03 shusterboris

Unless these directories are changed, they will not appear in an already existing index. The reindex process is incremental.

vladak avatar Mar 16 '23 12:03 vladak