airsonic
airsonic copied to clipboard
External (MySQL+Postgresql) database MUCH slower than hsql AND airsonic 10.4.1 slower than 10.4.
Problem description
External (MySQL+Postgresql) database much slower than hsql AND airsonic 10.4.1 slower than 10.4.
Steps to reproduce
120000 media files
airsonic Version 10.4.1 Scan with:
-
HSQL: 2 min 2019-10-14 13:11:49.449 INFO --- o.a.p.s.MediaScannerService : Starting to scan media library. 2019-10-14 13:13:44.995 INFO --- o.a.p.s.MediaScannerService : Scanned media library with 120000 entries.
-
MySQL+Postgresql: 90 min 2019-10-16 09:01:19.074 INFO --- o.a.p.s.MediaScannerService : Starting to scan media library. 2019-10-16 10:32:24.525 INFO --- o.a.p.s.MediaScannerService : Scanned media library with 120000 entries.
airsonic Version 10.4.0 Scan with:
- MySQL: 40 min
System information
- Airsonic version: 10.4.0 + 10.4.1
- Operating system: Ubuntu Server 16.04 LTS
- Java version: Java 8
Is this a scan from scratch on a new instance, or is this a rescan over an existing collection?
I've had the same experience re-scanning my library of 12,469 files (mostly FLAC, some mp3s). The daily scan ran for 96 minutes, according to the Docker logs.
I had the default HSQL database set up, then set up a fresh 10.4.1 Docker instance (via the linuxserver/mariadb :latest image) to test out MariaDB (via linuxserver/mariadb:latest image). Since an in-place database migration isn't supported, I simply re-added my library so this can be considered a fresh installation. The initial import took some time, but I don't have that log handy.
Is this a scan from scratch on a new instance, or is this a rescan over an existing collection?
There was no significant difference between a scan on a new instance or a rescan.
I am in an evaluation phase and test around, what software I want to use. (I think airsonic is great) The advantage of an external database is too big, I do not want to go back to HSQL.
Because of this thread I made the tests with PostgreSQL: https://github.com/airsonic/airsonic/issues/860 Result: No difference
I also tested with different jdbc-drivers and some jdbc-url-parameters: No difference.
It's strange: Because of support of MariaDB in the release-changes, I decided to use airsonic 10.4.1 with an external DB. I saw the time difference to HSQL and searched arround. I found this thread: https://github.com/airsonic/airsonic/issues/1299 and go back to 10.4.0 Result: Scan/rescan time in 10.4.1 is double as in 10.4.0
But the difference in scans between HSQL and MySQL/Postgres is more strange.
Not certain if the performance is threadpooling related, but we should switch out the DBCP2 thread pool in favor of Hikari which should certainly help with performance.
The PR here does that as part of the Spring Boot upgrade, but was closed: https://github.com/airsonic/airsonic/pull/1346
I was able to reproduce this to some extent on a minimal library with a fresh build of 70de4c8f25e9ce3eeaf6fb726859e737e46bd1e6 (under OpenJDK 11.0.5 64-bit).
- Create a music directory containing one album directory ("Album").
- Make some more directories:
for f in {1..2000}; do ln -v -s Album "Album $f"; done
- Start an empty Airsonic with the default HSQLDB 1.8 database and note down the scan time.
- Start an empty Airsonic with a local PostgreSQL 12.1 server (TCP) and note down the scan time.
Scan time is ~30x greater in PostgreSQL 12.1 than in HSQLDB 1.8.
I don't have a lot of ideas on what is happening however, especially since the difference between 10.4 and 10.4.1 is very small. Let me know if you guys manage to find out why this changed.
If you have the postgres slow query log enabled, could you check what particular queries are logged as being slow during the scan?
i'd like to see their query plans.
a hypothesis is the "IN" statements that are the issue, but it's completely speculative, and not validated by any data or even insight.
The difference between Hikari (the new default for Spring Boot 2) and Commons DBCP (the current one) seems minimal. ~30s for 1000 scanned items in both cases in my test, with a local PostgreSQL server in Docker.
The difference between Hikari (the new default for Spring Boot 2) and Commons DBCP (the current one) seems minimal.
I don't think connection pool choice has overtly anything to do with this issue or you would've seen slowdowns for all external DBs, not just Postgres.
Again, I suspect the IN statements.
Hikari which should certainly help with performance.
That's just one less hypothesis to think about :wink:
Could you by any chance post any slow query logs for your postgres instance?
I'd be extremely interested in the optimizing some queries
There's not a whole lot. A lot of time is spent in small queries (< 100ms), and most of the time is spent reading the response from a socket if you look at a profiler.
There's not a whole lot. A lot of time is spent in small queries (< 100ms), and most of the time is spent reading the response from a socket if you look at a profiler.
Which socket? DB -> Service? Or Service -> User? Service -> User overhead should be constant across DBs, and wouldn't explain slowdown specifically for postgres.
Which socket? DB -> Service? Or Service -> User?
The Airsonic
<-> PostgreSQL
socket (inside the JDBC driver).
If there is some socket overhead between DB->Service, especially if the queries themselves are taking no time, we should look into batching queries to cut down the overhead.
(plus caching)
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
I've stumbled upon this, and from my experiences with SQL, I'm guessing that the reason is that postgresql does a sync write for every commit, and airsonic is doing tons of small commits. I have gotten much better performance in scripts when a bunch of transactions are made between each commit. This is good for consistency but I think airsonic needs optimization to be fast. I'm nowhere near the available iops of my storage, but I think it's just sitting around waiting for sync writes.
Maybe if 100-1000 items were scanned between each commit, then each commit could also possibly store the scanning progress. This would mean a rollback of 100-1000 items if the server was interrupted