docker-images icon indicating copy to clipboard operation
docker-images copied to clipboard

Improving the startup time and performance of the Database image

Open Djelibeybi opened this issue 3 years ago • 11 comments

MODERATOR NOTE: Originally posted by @lukaseder in https://github.com/oracle/docker-images/issues/1678#issuecomment-779940945 which is closed. I'm opening this new issue to continue that discussion.

I'm coming here from a Twitter discussion: https://twitter.com/gmarziou/status/1361700331483234307

I understand that these things are difficult to move forward at Oracle. I've been in touch with numerous people at Oracle about this topic, including e.g. @gvenzl. I just want to stress that pretty much all of Oracle's competition, including IBM with both Db2 and the arcane Informix, or Microsoft with SQL Server (they even had to adapt everything to work on Linux, first!!), and obviously PostgreSQL, just to name a few (a few more: CockroachDB, Exasol, Firebird, HANA, Ignite, MariaDB, MemSQL (now SingleStore), MySQL) make working with Docker:

  • A lot easier
  • A lot faster
  • A lot less lawyer-y

To be fair, an example that is harder to work with: Sybase ASE. Yes, given Larry's only ever tweet, Oracle is still doing better than SAP. Oh wait. No. HANA is also by SAP. 😁

@gvenzl recently asked me to join a product meeting at Oracle where Oracle would reach out to the community to learn what Oracle can improve with respect to developer friendliness, especially in a context of losing to the competition.

This is one of the top issues that comes to my mind from a developer perspective. Yes, things were a lot worse in the past in pre-docker days, and we're also discussing this publicly on github instead of via a closed MOS SR. But things could be much better by now with regards to this docker integration. This issue here could be one of Oracle's top dev friendliness priorities. Alas, it was auto-closed twice.

Look at how I install SQL Server 2019:

docker pull mcr.microsoft.com/mssql/server:2019-latest
docker run -e ACCEPT_EULA=Y -e SA_PASSWORD=Test1234 -e MSSQL_PID=Express -p 1433:1433 --name SQLSERVER2019 -d mcr.microsoft.com/mssql/server:2019-latest

And it runs fast. Isn't that also what everyone working at Oracle internally on various products, like e.g. SQL Developer, APEX, SQL*Plus, Enterprise Manager, ojdbc, etc etc. would want for themselves as well?

Originally posted by @lukaseder in https://github.com/oracle/docker-images/issues/1678#issuecomment-779940945

Djelibeybi avatar Feb 16 '21 19:02 Djelibeybi

My own experience.

My pre-built (empty) Oracle 19 Standard image is 8 GB large, and 13 GB large once schema is instantiated.

To share it with my team, I needed to use an internal private docker registry, after several failed attempts to upload it by docker push over a bad network because of forced remote working in COVID-19 times, I ended up splitting it up in smaller parts using tar, uploading them using sftp and then pushing it locally. Fortunately, pulling worked better. Although, I had to request increasing disk space on our registry to support multiple versions. I can't imagine fully automating this process in Jenkins.

Startup time is 40 seconds while Postgresql starts up in few seconds. This impacts the way we write unit tests.

We use docker only for development and testing. If project would have been a green field one, this bad developer experience would have lead us to choose Postgresql but this is a 10 years-old project so....

gmarziou avatar Feb 17 '21 09:02 gmarziou

Honestly, the fact that we have to build the images ourselves is a huge set back over the competition. Any other database it's just docker pull and run. For Oracle we have to have some infrastructure to handle the build. Then if you have more than one developer using it, you'll also need some infrastructure in place to distribute the image.

Now you have all this infrastructure in place you want to automate everything.., but you have to deal with the installer files. You either need to fork/hack these provided docker files to grab that from an internal location, or you have to wrap the whole thing in a parent process that moves the installers around for you.

We run a patched version of Oracle, and need/want automation to run against the same version. Since we decided to copy these files (can't fork to an enterprise github), we just hacked the whole patching process into the dockerfile. (The whole opatch thing is outside the realm of discussion here, but why did my patch not include the version of opatch required to apply itself?) Now I have 3 separate installer objects that need to be available to the docker build process that aren't available until some sort of human intervention happens.

We take the database container one step further through automation though:

  • run db migration scripts (liquibase) into a small seed database instance each git push
  • take an rman backup of the seed database
  • create a new container image by applying the rman backup to the database image. This lets downstream testing and develop processes to pick up a database container matching our current code at any point (minus the time it takes to push 15G containers around).

I suspect that most of the startup time is being robbed during runOracle.sh and relinkOracleBinary.sh scripts which run at startup. When I go into my running instance and shutdown immediate followed by startup, it's only 6 or 7 seconds. There is a lot of stuff going on in runOracle.sh that has to happen when the container starts. Whenever I see a bunch of file system operations happening at startup, my general feeling is that these can/should be moved into the build. I don't know why we should be relinking binaries every single time the DB comes up.

ghaase avatar Feb 17 '21 17:02 ghaase

We have similar issues at Netflix. We're using testcontainers for validating changes to our views and procedures. When we upgraded from 11 XE to 18 XE our image build times went up to ~90 minutes and test startup time doubled (~12s to ~24s - this adds up when you have to spin up a fresh container for each test). Image size is about 2G which I believe is the driving factor.

Our container is the stock Oracle instance with our schema loaded into it (no data - that's done by each test individually).

In addition to the performance issues we encountered some other challenges:

  • It was a bit of a pain to switch to CDB/PDB setup imposed by 18 XE. In particular testcontainers didn't work nicely with this and we had to hack in an override for the connection string that apparently isn't compatible with the latest version of testcontainers. So we're pegged to an older version.
  • The lack of a 19 XE version drove our decision to stick with 18 (both for our AWS-provided SE license and our XE testcontainer setup). Would be nice to have a 19 XE version since 18 is being EOLed in the next year.
  • There were some recommendations on how to pare down the image size but they differed by version and seemed to produce mixed (sometimes counterproductive) results. Given the time spent addressing other issues and the long build cycle time we called it quits once we got the performance down to the 2x mark.

paul-k-young avatar Feb 17 '21 19:02 paul-k-young

We are currently using Oracle 19.8.0.0 as a main database at a clients' project.

For local development, the image needs to be built by every one of the 25 developers with scripts from this repository from scratch (there is no internal Docker registry at the client). Else there would be no way to run databases on the local development machines. This is part of the onboarding process, and takes about 45 minutes. If the DB version is updated (which happens twice a year), everything needs to be rebuilt by everybody. It also prevents on running integration tests with Testcontainers, prevents us from running integration tests in parallel.

The fact, that the latest Oracle database version in this repository is 19.3.0.0 is also not ideal, as local development and staging are currently two different versions (19.3 vs 19.8).

The loss in developer productivity, long runtimes on build pipelines and time it takes to maintain the scripts and environments grew to a significant amount of money, which is not going unnoticed by the stakeholders.

javahippie avatar Feb 18 '21 13:02 javahippie

The loss in developer productivity, long runtimes on build pipelines and time it takes to maintain the scripts and environments grew to a significant amount of money, which is not going unnoticed by the stakeholders.

I think this is one of the key factors here. Even if folks are quite pro Oracle in terms of what the RDBMS offers (and I count myself among them), this factor cannot be ignored. The alternatives make developers much more productive, and if a manager is already thinking about finally migrating off Oracle for whatever reason, then that might just be the last straw.

lukaseder avatar Feb 18 '21 13:02 lukaseder

Another aspect to consider is the legal side, or really the understandability of the legal side.

I'd like to see something that makes it really obvious that it is a) legal and free of charge to run on developer machines and CI-servers. b) easy and obviously legal to setup in the source code of tests which might get distributed under typical open source licenses.

I'm aware that you never know once you start talking to lawyers, but I think there is value in presenting what is allowed and what is not allowed in a way that is not intimidating to developers.

schauder avatar Feb 18 '21 16:02 schauder

I know about a project from ~10 years ago where a really smart DBA managed to strip down a Oracle DB so that it would fit on a 1GB RAM disc by removing various packages related to special features that weren't needed by that project and probably aren't used by 95% of all projects using Oracle.

Obviously that was really hacky and probably in violation of some license agreement, but it worked really well. Having something like this available in the form of docker images would be really nice and a good way to reduce the footprint of these images.

schauder avatar Feb 18 '21 16:02 schauder

by removing various packages related to special features that weren't needed by that project

Sounds like what the utplsql folks are doing as well: https://twitter.com/Pazus/status/1362030711637766145

lukaseder avatar Feb 18 '21 16:02 lukaseder

I think what has been done on utPLSQL has largely been included here now with the slim build.

Overall I think Oracle will slowly shift to what we are looking for I believe most want and agree that things could be improved and be more developer-friendly in this DevOps age but perhaps some of the higher-up's and lawyers haven't really gotten on board yet. Perhaps this will be the downfall of Oracle ( I don't really believe so), only time will tell.

Regardless while there can be improvements Oracle now offering Docker Images and Vagrant images is a big step forward from where they were a few years ago.

pazyp avatar Mar 03 '21 09:03 pazyp

toooooooo slow. when xe version docker build only using one core. It's unbelievable in modern development. if you set limit one core when i using it, I can understand. but not in build stage. but build is too slow. in my mini server(poor single-core performance). 10 percent built in 100 minutes.

paomian avatar Feb 20 '23 08:02 paomian

This impacts the way we write unit tests.

This impacts on software quality and costs. Due to the mentioned issues my personal approach (confirmed by experience various times):

if (isOracle) {
   estimation *= 2.5;
}

marcwittke avatar Oct 12 '23 11:10 marcwittke