SOLR-14410: switch from SysV to systemd service
Description
- Update the installation script to configure a systemd service instead of a SysV service.
- Update the documentation with the equivalent systemd (systemctl) commands, and add notes.
This is a port of a previous PR https://github.com/apache/lucene-solr/pull/1435, with the addition of a bug fix.
Tests
I manually tested the changes in a debian/buster + openjdk-11-jre VM, with the solr-8.11.0.tgz archive.
I manually repackaged the archive to include the solr/bin/systemd/solr.service file, then invoked the modified installer script with the updated archive. After fixing the substitution issue, I got a running setup.
id: ‘solr’: no such user
Creating new user: solr
Adding system user `solr' (UID 108) ...
Adding new group `solr' (GID 114) ...
Adding new user `solr' (UID 108) with group `solr' ...
Creating home directory `/var/solr' ...
Extracting solr-8.11.0.tgz to /opt
tar: solr-8.11.0/bin/install_solr_service.sh: time stamp 2021-11-21 01:29:47 is 421.338911707 s in the future
tar: solr-8.11.0/bin/systemd/solr.service: time stamp 2021-11-21 01:30:12 is 446.3380425 s in the future
Installing symlink /opt/solr -> /opt/solr-8.11.0 ...
Installing /etc/systemd/system/solr.service ...
Installing /etc/default/solr.in.sh ...
Created symlink /etc/systemd/system/multi-user.target.wants/solr.service → /etc/systemd/system/solr.service.
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
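For anyone who wants to reproduce the manual test, the repackaging step can be sketched as a small shell function. add_unit_to_tarball is a hypothetical helper, not part of this PR, and it assumes the tarball contains a single top-level directory (such as solr-8.11.0/) and that the unit file belongs at bin/systemd/solr.service inside it, matching the log above.

```shell
# Hypothetical helper (not part of the PR): add a systemd unit file to an
# existing Solr tarball. Assumes exactly one top-level directory in the
# archive, e.g. solr-8.11.0/.
add_unit_to_tarball() {
  archive=$1   # original tarball, e.g. solr-8.11.0.tgz
  unit=$2      # unit file to inject
  out=$3       # path of the repackaged tarball
  work=$(mktemp -d) || return 1
  if ! tar -xzf "$archive" -C "$work"; then
    rm -rf "$work"
    return 1
  fi
  top=$(ls "$work")   # the single top-level directory name
  mkdir -p "$work/$top/bin/systemd"
  cp "$unit" "$work/$top/bin/systemd/solr.service"
  # -C switches into the scratch dir so archive paths stay relative (solr-x.y.z/...)
  tar -czf "$out" -C "$work" "$top"
  status=$?
  rm -rf "$work"
  return "$status"
}
```

Usage would be something like: mkdir -p repacked && add_unit_to_tarball solr-8.11.0.tgz solr.service repacked/solr-8.11.0.tgz, keeping the original file name in case the installer derives the install directory from it.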
Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the main branch.
- [ ] I have run ./gradlew check.
- [ ] I have added tests for my changes.
- [x] I have added documentation for the Reference Guide
@janhoy I know that in the previous PR you pointed out the possibility of handling the upgrade path from SysV to systemd: https://github.com/apache/lucene-solr/pull/1435#discussion_r723118782
I'm not sure whether you'd be OK with merging this PR without that change. I might be able to add those changes at some point, but I can't say exactly when.
I see that I won't have bandwidth to test the service on multiple OSes now that I'm managing the 9.0 release. I thought of doing it with Docker (hackish, but I've done it before), but it could also be done with Vagrant or similar. Will you be able to run the tests, including an upgrade test from 8.11 (ideally the installer should uninstall the init.d service or exit with a message telling the user to remove it manually)? If so, we can perhaps have this committed for 9.0 at the end of the month.
I brought this PR up to date with main, so it can be easily tested. You can check out the PR branch locally, build it, and then try installing, or you can download a prebuilt tar from http://cominvent.com/pub/solr-10.0.0-SNAPSHOT.tgz
Once you have tested on a given OS, please report the results here so we can tick more of the test checkboxes.
I rebased on latest main branch and re-built the convenience distribution tar, available at http://www.cominvent.com/pub/solr-10.0.0-SNAPSHOT-slim.tgz (small download).
I had some boxes to set up, so I patched this onto branch_9_2 to give it a whirl. The first problem I hit was that, although the setup script's error message says to set JAVA_HOME, the script doesn't actually use JAVA_HOME when testing for Java, so it fails every time. I worked around that by setting PATH and passing it through sudo.
(following ref guide here: https://solr.apache.org/guide/solr/latest/deployment-guide/taking-solr-to-production.html)
Got the following:
[ec2-user@ip-172-31-14-227 ~]$ sudo env "PATH=$PATH" ./install_solr_service.sh solr-9.2.2-SNAPSHOT.tgz
id: ‘solr’: no such user
Creating new user: solr
Extracting solr-9.2.2-SNAPSHOT.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-9.2.2-SNAPSHOT ...
Installing /etc/systemd/system/solr.service ...
Input 'solr' is not an absolute file system path, escaping is likely not going to be reversible.
Installing /etc/default/solr.in.sh ...
Created symlink /etc/systemd/system/multi-user.target.wants/solr.service → /etc/systemd/system/solr.service.
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
Job for solr.service failed because the control process exited with error code.
See "systemctl status solr.service" and "journalctl -xeu solr.service" for details.
[ec2-user@ip-172-31-14-227 ~]$ sudo systemctl status solr.service
× solr.service - Apache Solr
Loaded: loaded (/etc/systemd/system/solr.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Fri 2023-06-09 20:53:06 UTC; 7min ago
Process: 4111 ExecStart=/opt/solr/bin/solr start (code=exited, status=1/FAILURE)
CPU: 11ms
Jun 09 20:53:06 ip-172-31-14-227.us-east-2.compute.internal systemd[1]: solr.service: Scheduled restart job, restart counter is at 5.
Jun 09 20:53:06 ip-172-31-14-227.us-east-2.compute.internal systemd[1]: Stopped solr.service - Apache Solr.
Jun 09 20:53:06 ip-172-31-14-227.us-east-2.compute.internal systemd[1]: solr.service: Start request repeated too quickly.
Jun 09 20:53:06 ip-172-31-14-227.us-east-2.compute.internal systemd[1]: solr.service: Failed with result 'exit-code'.
Jun 09 20:53:06 ip-172-31-14-227.us-east-2.compute.internal systemd[1]: Failed to start solr.service - Apache Solr.
Still looking into why it might be failing, but /var/solr/logs is empty...
OK, I needed to give it JAVA_HOME:
Environment=JAVA_HOME=/opt/java/zulu11.64.19-ca-jdk11.0.19-linux_x64/
So the install script in general needs to be more respectful of JAVA_HOME (and should add it to the service file if it's set).
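Along those lines, here is a minimal sketch of what a JAVA_HOME-aware installer could do, assuming POSIX sh. find_java and write_java_home_dropin are hypothetical names: the first prefers $JAVA_HOME/bin/java over PATH, and the second pins JAVA_HOME for the service through a systemd drop-in file (solr.service.d/java-home.conf is an assumed location) instead of editing the generated unit. This matters because systemd starts services with its own minimal environment rather than the installing user's shell environment, which is likely why the sudo PATH workaround got the installer past the Java check but not the service itself.

```shell
# Hypothetical helpers sketching a JAVA_HOME-aware installer; names are illustrative only.

# Print the java binary to use: $JAVA_HOME/bin/java if present, else java from PATH.
find_java() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    echo "$JAVA_HOME/bin/java"
  elif command -v java >/dev/null 2>&1; then
    command -v java
  else
    echo "Java not found: set JAVA_HOME or put java on PATH" >&2
    return 1
  fi
}

# If JAVA_HOME is set, write a systemd drop-in so the unit sees the same value.
# $1 is the drop-in directory, e.g. /etc/systemd/system/solr.service.d (assumed)
write_java_home_dropin() {
  dropin_dir=$1
  # Nothing to pin: the unit then relies on java being on systemd's default PATH.
  [ -n "${JAVA_HOME:-}" ] || return 0
  mkdir -p "$dropin_dir" || return 1
  # Quoting keeps the assignment intact if the path contains spaces.
  printf '[Service]\nEnvironment="JAVA_HOME=%s"\n' "$JAVA_HOME" \
    > "$dropin_dir/java-home.conf"
}
```

After writing the drop-in, systemctl daemon-reload picks it up, and systemctl cat solr shows the unit together with its drop-ins. Writing the Environment= line directly into the unit template would work just as well; the drop-in only keeps the user's JAVA_HOME separate from the generated file. Since sudo typically resets the environment, JAVA_HOME would have to be passed explicitly, e.g. sudo env "JAVA_HOME=$JAVA_HOME" ./install_solr_service.sh solr-9.2.2-SNAPSHOT.tgz, just as PATH was above.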
The next thing I notice is that this sets up a legacy/user-managed service. Should we be defaulting to a cloud service? (Certainly an existing ZooKeeper is a prerequisite, but the user should pass in the ZK connection string; if the zkroot doesn't exist, the script should create it, and if it does exist, the new node should join that cluster.)
Another thought: we should probably make it explicit that environment variables set in the .service file take precedence.
It would also be nice to set a system property, or some other reliable signal visible in the UI, that identifies the service file that started Solr. That would help when a consultant or new hire has to figure out how someone's install is (or isn't) working.
@elyograg and @gus-asf Please add your concrete code reviews to this PR so we can aim for completion.
The important part is to test on major Linux flavors and spot bugs. In the PR description I have some checkboxes that you can check once you have validated a distro.
@gus-asf: Your other comments about cloud vs. self-managed and UI changes are interesting, but they belong in new JIRA issues, so if you want to pursue those, please open new JIRAs to unblock this PR.
Ping. Any new energy on this? It would be nice to at least switch it over in the main branch for 10.0.
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!
Got a reminder from stale-bot on this.
Preparing this for merge to the main branch:
- Added a CHANGES entry
- Added refguide upgrade notes
- Added a warning printout if an init.d (SysV) script is found on the system
- Fixed a chmod line in the build script
- Brought up to date with the main branch
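For reference, a check of this kind could look roughly like the following. It is a hypothetical sketch, not the code in the PR: warn_if_sysv_script looks for /etc/init.d/<service name> (the directory is a parameter so it can be tested), prints manual cleanup commands for Debian/Ubuntu (update-rc.d) and RHEL/CentOS (chkconfig), and returns non-zero so the caller can choose between just warning and aborting, which would also cover the "exit with a message" option discussed for upgrades from 8.11.

```shell
# Hypothetical sketch of a "legacy SysV script found" check; the PR's actual
# wording and logic may differ.
# $1 = service name (the installer's -s value), $2 = init.d dir (default /etc/init.d)
warn_if_sysv_script() {
  service_name=$1
  initd_dir=${2:-/etc/init.d}
  if [ -e "$initd_dir/$service_name" ]; then
    {
      echo "WARNING: legacy SysV init script found at $initd_dir/$service_name."
      echo "Stop and remove it before using the systemd unit, for example:"
      echo "  service $service_name stop"
      echo "  update-rc.d -f $service_name remove   # Debian/Ubuntu"
      echo "  chkconfig --del $service_name          # RHEL/CentOS"
      echo "  rm $initd_dir/$service_name"
    } >&2
    return 1   # let the caller decide whether to abort or continue
  fi
  return 0
}
```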
@gus-asf Given this runs well on two Linux distros, I feel we can put this into main and get some early usage of it. If there are still bugs, we can tackle them in 10.0.1 etc. WDYT?
@elyograg you commented in the JIRA that "The current installer detected when CentOS 7 did not have a new enough Java and refused to even install. This installer seems to omit that check." I cannot see that the Java checks are changed at all in this PR.
I made one more change: use {{FOO}} as the template string for replacements in the unit file. It's more readable and less confusing.
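To illustrate the idea, here is a minimal sketch of {{FOO}}-style substitution with sed, assuming POSIX sh. render_unit and the placeholder names ({{SOLR_SERVICE}}, {{SOLR_USER}}, {{SOLR_INSTALL_DIR}}) are hypothetical, and the values are assumed not to contain |, & or \, which are special in the sed expression.

```shell
# Hypothetical sketch of {{FOO}}-style placeholder substitution for the unit file.
# Placeholder names are illustrative, not necessarily the ones in the PR.
# Assumes the values contain no '|', '&' or '\' (special in the sed expression).
render_unit() {
  template=$1 out=$2 service=$3 user=$4 install_dir=$5
  # '|' as the sed delimiter so paths with '/' need no escaping;
  # '{' is a literal character in basic regular expressions.
  sed -e "s|{{SOLR_SERVICE}}|$service|g" \
      -e "s|{{SOLR_USER}}|$user|g" \
      -e "s|{{SOLR_INSTALL_DIR}}|$install_dir|g" \
      "$template" > "$out" || return 1
  # Fail loudly if any placeholder was left unreplaced.
  if grep -qF '{{' "$out"; then
    echo "Unreplaced placeholder in $out" >&2
    return 1
  fi
}
```

The double braces are unlikely to collide with ordinary unit-file text, and the final grep turns a missed substitution into an install-time error instead of a unit that only fails when it starts.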
Also, I built a distro, copied it to an empty Ubuntu system and did an install with customized -s, -u and -p args, and it all worked nicely.
So I think this is ready for merge to main. Holding a few days for more feedback.