OpenGrok for a very large Google repo-based project

Open • shilpakangya opened this issue 4 years ago • 12 comments

Hi, I am having trouble creating an OpenGrok workspace for a very large project with multiple sub-repositories that uses the Google repo tool. I was unable to generate mirror.yml with the opengrok-mirror tool because I don't understand the -H HEADER option, so I created mirror.yml manually. I have also created a read-only configuration to customize the config. Can someone suggest an example mirror.yml and readonly.xml (the input configuration for the indexer)? I am attaching these files for review. I am unable to generate the index even for the first time, and I don't see the project dropdown in the webapp. Can you please help? I am using these files along with this command:

docker run -d \
    --name opengrok \
    -p 80:8080/tcp \
    -e SYNC_PERIOD_MINUTES="10" \
    -e NOMIRROR="1" \
    -e CHECK_INDEX="1" \
    -e INDEXER_OPT="-W /opengrok/etc/configuration.xml -m 256 -H -P -S -G --progress -v -O on -T 3 --depth 9 --assignTags --renamedHistory on -R /opengrok/etc/readonly.xml" \
    -v /home/uxxxxx/opengrok_basic/src/:/opengrok/src/ \
    -v /home/uxxxxx/opengrok_basic/etc/:/opengrok/etc/ \
    -v /home/uxxxxx/opengrok_basic/data/:/opengrok/data/ \
    ogk_basic

I have attached the logs and the configuration used: cfg.zip

shilpakangya commented on Jul 15 '21 14:07

Could someone please review my configuration files? Any help or input on this would be appreciated. Thanks

shilpakangya commented on Jul 15 '21 14:07

The per-project settings in the read-only configuration look sane; however, in order to use it with the official Docker image it is necessary to use the READONLY_CONFIG_FILE environment variable (see https://github.com/oracle/opengrok/tree/master/docker#environment-variables). Using -R directly will thwart the functionality of the main Docker script.
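
A minimal sketch of the original `docker run` command adjusted along those lines, assuming the official `opengrok/docker` image is used instead of the custom `ogk_basic` one: `-R /opengrok/etc/readonly.xml` moves out of INDEXER_OPT into READONLY_CONFIG_FILE, and the remaining indexer options are kept exactly as they were (they are not reviewed here). The `run_opengrok` function and the `DOCKER` override are hypothetical helpers for illustration, not part of OpenGrok.

```shell
#!/bin/sh
# Sketch: start the official OpenGrok image with the read-only configuration
# passed via READONLY_CONFIG_FILE rather than -R inside INDEXER_OPT.
# Host paths are copied from the original command; adjust as needed.
run_opengrok() {
    # DOCKER defaults to "docker"; set DOCKER=echo to only print the arguments.
    ${DOCKER:-docker} run -d \
        --name opengrok \
        -p 80:8080/tcp \
        -e SYNC_PERIOD_MINUTES="10" \
        -e NOMIRROR="1" \
        -e CHECK_INDEX="1" \
        -e READONLY_CONFIG_FILE="/opengrok/etc/readonly.xml" \
        -e INDEXER_OPT="-W /opengrok/etc/configuration.xml -m 256 -H -P -S -G --progress -v -O on -T 3 --depth 9 --assignTags --renamedHistory on" \
        -v /home/uxxxxx/opengrok_basic/src/:/opengrok/src/ \
        -v /home/uxxxxx/opengrok_basic/etc/:/opengrok/etc/ \
        -v /home/uxxxxx/opengrok_basic/data/:/opengrok/data/ \
        opengrok/docker:latest
}
```

Calling `run_opengrok` starts the container; `DOCKER=echo run_opengrok` prints the arguments instead, which is a quick way to confirm that `-R` no longer appears in INDEXER_OPT.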

vladak commented on Jul 20 '21 17:07

As for using opengrok-mirror to sync an Android repository tree, I'd like to try it one day, esp. for #3622.

vladak commented on Jul 20 '21 17:07

The mirror_v2.yml file has some weird artifact on the last line that causes YAML parsing to fail. Once it is removed, the file loads fine; however, it still contains some odd indentation that I am not sure will be processed correctly.

The initial section:

commands:
  repo:
    # override repository command
    command: /usr/local/bin/repo
    sync: ['repo', 'sync','-cf']
    # override incoming check with custom command (/my/custom/git is not called for incoming check)
    # incoming: ['/bin/echo', 'Syncing the Repo!']
    incoming: ['repo', 'sync', '-n']
    incoming_check: true

will not work as intended: the commands section is only meant to override the paths of the commands that get executed. If you want to specify the path to the repo command, it should look like this:

commands:
  repo: /usr/local/bin/repo

vladak commented on Jul 20 '21 17:07

If I specify only the repo command path, then where do I define the sync command, the incoming command, and incoming_check? Do you have a sample YAML file that covers all the commands? I am a bit confused about the format of the YAML file.

Is there a command to run the Python script that generates a correct YAML file? I am unable to figure out all the arguments required to generate mirror.yml. Can you suggest a complete example command for generating mirror.yml? I would also appreciate a complete example command for generating the read-only configuration file.

shilpakangya commented on Jul 21 '21 05:07

> ... in order to use with the official Docker image it is necessary to use the READONLY_CONFIG_FILE environment variable ... Using the -R directly will thwart the functionality of the main Docker script.

I am already passing the -R option via INDEXER_OPT="-R /opengrok/etc/readonly.xml".

shilpakangya commented on Jul 21 '21 05:07

Also, can you suggest a mirror YAML file that supports the following commands for the large Google repo tree?

repo init -u ssh://xxxxxxx -b xyz -m abc.xml -g all --depth=1

repo sync --current-branch --quiet --force-sync

shilpakangya commented on Jul 21 '21 05:07

> I am already passing the -R option via INDEXER_OPT="-R /opengrok/etc/readonly.xml".

That will not work as intended. The path to the read-only configuration has to be supplied via the READONLY_CONFIG_FILE env var.

vladak commented on Jul 21 '21 08:07

> Also, can you suggest a mirror YAML file that supports the following commands for the large Google repo tree?
>
> repo init -u ssh://xxxxxxx -b xyz -m abc.xml -g all --depth=1
>
> repo sync --current-branch --quiet --force-sync

The opengrok-mirror program will only call the repo sync part of the command: https://github.com/oracle/opengrok/blob/7b938e44688e6f3fd54b3d6c0d3345297ff5f0d5/tools/src/main/python/opengrok_tools/scm/repo.py#L39-L43

The repo init has to be run outside of opengrok-mirror.
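
A minimal sketch of that one-time step, assuming each repo checkout lives in its own project directory under the host source root that is mounted at /opengrok/src. The `repo init` flags are the ones quoted above, with `ssh://xxxxxxx`, `xyz` and `abc.xml` left as the original placeholders. The `init_repo_tree` function, the `REPO` override and the `xyz_project` directory name are hypothetical.

```shell
#!/bin/sh
# Sketch: one-time "repo init" of a project checkout, run outside of
# opengrok-mirror (which only calls "repo sync").
# ssh://xxxxxxx, xyz and abc.xml are placeholders from the original question.
init_repo_tree() {
    # $1: project directory under the host source root, e.g.
    #     /home/uxxxxx/opengrok_basic/src/xyz_project (hypothetical name)
    project_dir=$1
    mkdir -p "$project_dir" || return 1
    # Subshell keeps the caller's working directory unchanged.
    # REPO defaults to "repo"; set REPO=echo to only print the arguments.
    (cd "$project_dir" && ${REPO:-repo} init -u ssh://xxxxxxx -b xyz -m abc.xml -g all --depth=1)
}
```

After the init, an initial `repo sync` in the same directory is still needed before the first indexing run; only the subsequent periodic syncs would be left to opengrok-mirror.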

vladak commented on Jul 21 '21 08:07

> If I specify only the repo command path, then where do I define the sync command, the incoming command, and incoming_check?

That's not possible currently. The mirror configuration for the SCM commands only allows specifying the path to a binary. That's useful when the command lives in a non-standard location. If you need a higher level of customization, you have to perform the synchronization by other means.
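
One possible way to do that, sketched under assumptions: the container runs with NOMIRROR="1" (as in the original command) so that opengrok-mirror is skipped, a host-side job (for example from cron) runs `repo sync` with the custom flags from this thread, and the container's periodic reindex picks up the changes. It also assumes one repo checkout per project directory, each with its own `.repo` subdirectory. The `sync_repo_trees` function and the `REPO` override are hypothetical.

```shell
#!/bin/sh
# Sketch: keep repo checkouts up to date outside of opengrok-mirror using the
# custom sync flags from the thread. Meant to run on the host while the
# container runs with NOMIRROR=1.
sync_repo_trees() {
    # $1: source root containing one repo checkout per project directory.
    src_root=$1
    status=0
    for dir in "$src_root"/*/; do
        # Only directories with a .repo subdirectory are treated as repo checkouts.
        [ -d "${dir}.repo" ] || continue
        # REPO defaults to "repo"; set REPO=echo to only print the arguments.
        if ! (cd "$dir" && ${REPO:-repo} sync --current-branch --quiet --force-sync); then
            echo "repo sync failed in $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `sync_repo_trees /home/uxxxxx/opengrok_basic/src` syncs every checkout under the source root and returns non-zero if any sync failed, so a scheduler can flag the failure.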

> Can you suggest a complete example command for generating mirror.yml? I would also appreciate a complete example command for generating the read-only configuration file.

Such a capability does not exist, and that's a good thing, I'd say. The YAML syntax should be editable by hand, and the opengrok-mirror tool should provide sufficient checking to see what is wrong (which it does not in this case, so I just created PR #3673). Also, the documentation should provide enough detail to produce the configuration by hand.

vladak commented on Jul 21 '21 09:07

> That will not work as intended. The path to the read-only configuration has to be supplied via the READONLY_CONFIG_FILE env var.

We are extending the main Dockerfile and creating our own Dockerfile to support extra configuration for a large code base. In this case, I think we still need to pass -R readonly.xml in INDEXER_OPT.

One observation: when I use READONLY_CONFIG_FILE="/opengrok/etc/readonly.xml", it keeps waiting for Tomcat. The initial logs follow:

synchronization period = 6 minutes
Deploying web application
extra indexer options: -W /opengrok/etc/configuration.xml -m 256 -H -P -S -G --progress -v -O on -T 3 --depth 9 --assignTags --renamedHistory on
Checking if index matches current version
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.configuration.Configuration read
INFO: Reading configuration from /opengrok/etc/configuration.xml
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.index.Indexer parseOptions
INFO: Indexer options: [-R, /opengrok/etc/configuration.xml, --checkIndex]
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:20 PM org.opengrok.indexer.configuration.Project getProject

Following are the end logs:

Jul 21, 2021 1:54:23 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:23 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:23 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:23 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Jul 21, 2021 1:54:23 PM org.opengrok.indexer.configuration.Project getProject
WARNING: Path of project xyz_project is not set
Merging read-only configuration from '/opengrok/etc/readonly.xml' with current configuration in '/opengrok/etc/configuration.xml'
Number of sync workers: 40
Waiting for Tomcat to start
Starting REST app on port 5000
Sleeping for 360 seconds
Starting Tomcat
Sleeping for 360 seconds
Sleeping for 360 seconds

The web page at the mapped local port does not show the project or its index.


But if I pass the read-only configuration via -R in INDEXER_OPT, I can see the project with its full index generated.

shilpakangya commented on Jul 21 '21 14:07

Hi @shilpakangya, did you find a solution for that? I'm facing a similar issue: the Docker image starts, but opengrok-mirror cannot run with the default parameters since it's a Google repo repository.

Thanks in advance.

bruvbc commented on May 16 '22 17:05