Rescan databases dir

Open Bjoernsen opened this issue 6 years ago • 7 comments

I have a central resource managing e.g. fasta files and building the corresponding blast+ indices automatically.

I have a fully automated deployment pipeline of blast+ indices to the sequenceserver database directory that runs often.

Everything works perfectly, except that I must manually restart the docker container(s) for the sequenceserver (without killing running jobs).

I know that you are not a fan of "Automatically detect new databases" (#325).

What do you think about adding a configuration option to sequenceserver.conf that enables the detection, with false as the default?

I already played around a bit and could offer a pull request.

Bjoernsen avatar Apr 02 '19 14:04 Bjoernsen

I am not up to date on docker. In the old scheme of things, Apache+Passenger should be able to handle restarts gracefully out of the box. That is, when you indicate to Passenger that the app needs to be restarted, it will start a new process, direct all subsequent requests to the new process, and terminate the old process once all previously queued requests have been handled. Alternatively, Puma and a couple of other (unicorn?) Ruby application servers can handle restarts gracefully. I would be surprised if you can't do something similar automatically using docker compose or similar.

yeban avatar Apr 03 '19 14:04 yeban

How the sequenceserver is hosted is not the issue. My problem is that I have a completely automated system/pipeline. A user uploads a fasta file and the system validates it, creates the blast+ index, and does many other jobs. One job is the 'deployment' to the sequenceserver data folder using symlinks.

Unfortunately, the sequenceserver does not 'recognize' the new blast db until it has been (manually) restarted.

Of course, I could use e.g. passenger to host the application and write a restart.txt or similar to force a restart. But I thought it would be easier to enable the rescan via a parameter on the sequenceserver side.
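For reference, the restart.txt route could be scripted as the last step of a deployment job. The following is only a minimal sketch: Passenger restarts an app when tmp/restart.txt under the application root is touched, but the helper name and the app_root argument are hypothetical and not part of sequenceserver.

```ruby
require 'fileutils'

# Hypothetical helper: asks Passenger to restart the app by touching
# tmp/restart.txt below the given application root.
# app_root is an assumption; pass the directory Passenger serves the app from.
def request_passenger_restart(app_root)
  tmp_dir = File.join(app_root, 'tmp')
  FileUtils.mkdir_p(tmp_dir)             # create tmp/ if it is missing
  restart_file = File.join(tmp_dir, 'restart.txt')
  FileUtils.touch(restart_file)          # updates mtime, creating the file if needed
  restart_file
end
```

A pipeline would call this once after the symlinks for the new blast db are in place.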

Bjoernsen avatar Apr 03 '19 14:04 Bjoernsen

I like the approach of making it configurable whether a database rescan happens on every request or only at server start.

magelan avatar Apr 04 '19 14:04 magelan

I am happy to have an option in SequenceServer to rescan databases, and it can be in the direction of #391 (which is a very clever hack, imo), provided my concern about race conditions is addressed.

yeban avatar Apr 16 '19 10:04 yeban

How can I modify my MR #391 to get it merged?

Bjoernsen avatar May 13 '19 10:05 Bjoernsen

I think the first step would be to have a separate class for the collection of databases, i.e. Database and Databases. The second step would be to change scan_databases_dir to always return a new instance of the Databases class (instead of adding Database objects to a static variable that is shared by all objects). The return value of scan_databases_dir can be made accessible to relevant code in a manner similar to the logger, config, and sys methods of the SequenceServer module. Let's say this function would be SequenceServer.databases. This function could take a boolean value, based on which it returns either a cached object (default) or a new object from running scan_databases_dir. Finally, the rescan should probably be done in routes.rb, in the searchdata.json route.
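To make the shape of this concrete, here is a rough sketch under stated assumptions, not the actual SequenceServer code. The module name SequenceServerSketch, the scanner callable that stands in for the real directory scan, the Struct fields, and the mutex are all hypothetical; only the names Database, Databases, scan_databases_dir, and databases come from the description above.

```ruby
# Illustrative only: a standalone module, not the real SequenceServer module.
module SequenceServerSketch
  # Minimal stand-in for a single database entry (fields are assumptions).
  Database = Struct.new(:name, :title)

  # Collection produced by one scan. It copies and freezes its input, so a
  # later scan can never mutate a collection that a request already holds.
  class Databases
    include Enumerable

    def initialize(list)
      @list = list.dup.freeze
    end

    def each(&block)
      @list.each(&block)
    end
  end

  # Guards the cached collection when requests run on several threads.
  LOCK = Mutex.new

  class << self
    # Hypothetical hook: a callable returning an Array of Database objects,
    # used here in place of actually reading the database directory.
    attr_accessor :scanner

    # Always returns a new Databases instance; no shared static state.
    def scan_databases_dir
      Databases.new(scanner.call)
    end

    # Returns the cached collection by default; rescans when asked to,
    # or when nothing has been cached yet.
    def databases(rescan = false)
      LOCK.synchronize do
        @databases = scan_databases_dir if rescan || @databases.nil?
        @databases
      end
    end
  end
end
```

With something along these lines, the searchdata.json route would call databases(true) when rescanning is enabled in the configuration and databases otherwise. Because each scan yields a fresh frozen collection that replaces the cached reference in one step, a concurrent request sees either the old collection or the new one, never a partially filled one.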

yeban avatar May 15 '19 13:05 yeban

I started the suggested refactoring of the Databases class after my last comment almost a month ago: https://github.com/yeban/sequenceserver/tree/refactor_database_class. The idea was to provide a base for your PR. I quickly realized that it was a fair bit of work, so I had to drop it for the time being, but I hope it gives you the idea.

yeban avatar May 15 '19 13:05 yeban