Performance slows after batch deleting imagery during inference
Output from my machine just now while batch deleting imagery during inference:
23.02 tiles/s | 18.29 avg tiles/s
18.17 tiles/s | 18.29 avg tiles/s
Starting extraneous imagery cleanup/deletion
Calculation for expanded positive coords for Cambridge, Massachusetts completed
Deleted 0 non-solar panel containing imagery tiles for Cambridge, Massachusetts
Calculation for expanded positive coords for San Antonio, Texas completed
Deleted 75783 non-solar panel containing imagery tiles for San Antonio, Texas
Deleted 0 non-solar panel containing imagery tiles for San Antonio, Texas
Deletion finished
1.08 tiles/s | 18.20 avg tiles/s
8.43 tiles/s | 18.15 avg tiles/s
11.74 tiles/s | 18.12 avg tiles/s
12.93 tiles/s | 18.10 avg tiles/s
23.13 tiles/s | 18.12 avg tiles/s
A couple of guesses as to why this is happening:
- maybe changing a bunch of data in SQLite means that indexes have to be rebuilt for certain queries
- maybe stopping the API calls to Mapbox for a period of time breaks some sort of ongoing connection to their servers (even though they're separate requests)
- maybe I'm deleting too much imagery before it's done being used, and the process has to re-query a bunch of tiles right off the bat (logging how many API queries happen in each batch might be useful for checking this; rough sketch after this list)
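Here's a minimal sketch of what I mean by that last point, assuming we wrap whatever function actually hits the Mapbox API. `count_api_calls` and `log_and_reset_batch_stats` are hypothetical names, not existing functions in the repo:

```python
# Hypothetical sketch: count how many Mapbox requests happen per inference batch.
import functools
from collections import Counter

batch_stats = Counter()

def count_api_calls(func):
    """Decorator that bumps a counter every time the wrapped function runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        batch_stats["mapbox_api_calls"] += 1
        return func(*args, **kwargs)
    return wrapper

def log_and_reset_batch_stats(tiles_in_batch):
    """Call once per batch, right after inference on that batch finishes."""
    calls = batch_stats["mapbox_api_calls"]
    print("%d of %d tiles in this batch needed a Mapbox request" % (calls, tiles_in_batch))
    batch_stats.clear()
```

If most tiles in the batches right after a deletion turn out to need an API request, that would point pretty strongly at guess 3.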
This also seems to happen right on startup:
(venv) tyler@tyler-MS-7821:~/PycharmProjects/SolarPanelDataWrangler$ python run_entire_process.py --city "San Antonio" --state "Texas"
Searching OSM for a polygon for: San Antonio, Texas
Checking if this search polygon is already tracked in the database.
Calculating the distance to the search polygon's centroid from each point if it hasn't been done before.
Running classification on every tile in the search polygon that hasn't had inference ran yet.
2019-04-10 12:16:20.082014: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-10 12:16:20.153237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-10 12:16:20.153599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 6.97GiB
2019-04-10 12:16:20.153613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-10 12:16:20.347262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-10 12:16:20.347291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-04-10 12:16:20.347297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-04-10 12:16:20.347460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6719 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
Starting extraneous imagery cleanup/deletion
Calculation for expanded positive coords for Cambridge, Massachusetts completed
Deleted 0 non-solar panel containing imagery tiles for Cambridge, Massachusetts
Calculation for expanded positive coords for San Antonio, Texas completed
Deleted 76865 non-solar panel containing imagery tiles for San Antonio, Texas
Deleted 0 non-solar panel containing imagery tiles for San Antonio, Texas
Deletion finished
/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/skimage/transform/_warps.py:105: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/skimage/transform/_warps.py:110: UserWarning: Anti-aliasing will be enabled by default in skimage 0.15 to avoid aliasing artifacts when down-sampling images.
warn("Anti-aliasing will be enabled by default in skimage 0.15 to "
1.03 tiles/s | 1.03 avg tiles/s
7.42 tiles/s | 4.22 avg tiles/s
13.24 tiles/s | 7.23 avg tiles/s
17.09 tiles/s | 9.69 avg tiles/s
This leads me to believe it's more related to guesses 1 or 3, since this didn't seem to be happening on startup before.
It's also getting even slower as more cities are in the DB. I wonder if there's a way to get a log of the longest queries from SQLite, just to see where adding an index would help.
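One option (assuming solardb talks to SQLite through SQLAlchemy, which I believe it does) would be to hook the engine's cursor-execute events and log anything slow. The `engine` creation and DB path below are placeholders; ideally the listeners would attach to whatever engine solardb already creates:

```python
# Rough sketch: log any SQL statement that takes longer than 500 ms.
import logging
import time

from sqlalchemy import create_engine, event

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("slow_queries")

engine = create_engine("sqlite:///solar.db")  # placeholder path

@event.listens_for(engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault("query_start_time", []).append(time.time())

@event.listens_for(engine, "after_cursor_execute")
def _log_if_slow(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.time() - conn.info["query_start_time"].pop()
    if elapsed > 0.5:
        logger.info("%.2fs: %s", elapsed, statement)
```

Running the suspect query manually with EXPLAIN QUERY PLAN in the sqlite3 shell would also show whether it's using an index or doing a full table scan.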
I think some of this is probably to be expected since SQLite is an embedded, in-process DB, right? Maybe this suggests that we should move to something like Postgres sooner rather than later.
In terms of quick wins, maybe some of the querying code could be adjusted to pull only what's needed? I just took a quick look at the solardb file and, for example, I think this query could be reworked so that only the name field is fetched. I think (although I'm not 100% sure) that right now it's querying for everything in the SearchPolygon table, when really you just want the name. I don't know how much that would speed things up (if at all), but it seems plausible that slimming down the queries would help.
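To make that concrete, here's roughly the difference I mean, assuming solardb uses SQLAlchemy. The model below is a stand-in, not the real SearchPolygon definition:

```python
# Hypothetical sketch comparing a full-row query to a single-column query.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class SearchPolygon(Base):  # stand-in for the real solardb model
    __tablename__ = "search_polygons"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # ... the real table has more metadata columns

engine = create_engine("sqlite://")  # throwaway in-memory DB for the example
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Loads entire ORM objects just to read one attribute:
names = [polygon.name for polygon in session.query(SearchPolygon).all()]

# Asks SQLite for only the name column:
names = [name for (name,) in session.query(SearchPolygon.name).all()]
```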
Some slowdown is expected in SQLite as the number of rows grows and whatnot, but the slowdown I'm experiencing here hits right after a bunch of rows have had values changed, so it seems like maybe it's invalidating some sort of cache, and then performance picks back up after a couple of batches. I think we might be able to add an index or something, depending on which query is actually slow.
I don't think the query you point out is the culprit, because that table is currently only 3-4 rows long for my instance. It basically just holds some metadata for each separate search area. The big table is slippy_tiles.
My hunch is that it's this query. I tried adding an index on the columns used for ordering, but it might be necessary to index the filtered columns as well. Not 100% sure at the moment; I'll have to look into it more in depth when I can.
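For reference, something like this is what I'd try next: one composite index covering both the filter columns and the ORDER BY column, so SQLite can satisfy the query without a separate sort. The column names here are guesses, not the actual slippy_tiles schema:

```python
# Hypothetical sketch: add a composite index on slippy_tiles covering the
# filtered columns and the ordered column. Column names are placeholders.
import sqlite3

conn = sqlite3.connect("solar.db")  # placeholder path to the existing DB
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_slippy_tiles_filter_order "
    "ON slippy_tiles (polygon_name, has_inference, centroid_distance)"
)
conn.commit()
conn.close()
```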
But the more I think about it, you're right: we probably need to switch to Postgres sooner rather than later. Maybe after we wrap up the contributor-friendliness work and I get to a stopping point on the alternate imagery source task, we should give that a look. There are probably lots of PostGIS features we could be taking advantage of to make everything easier.
> Some slowdown is expected in SQLite as the number of rows grows and whatnot, but the slowdown I'm experiencing here hits right after a bunch of rows have had values changed, so it seems like maybe it's invalidating some sort of cache, and then performance picks back up after a couple of batches. I think we might be able to add an index or something, depending on which query is actually slow.
Sorry, yeah, to clarify: the only piece of the slowdown I was suggesting was due to SQLite was the part about it "getting even slower as more cities are in the DB." I agree that the other slowdowns look like they're due to the invalidation of some sort of cache, especially given what you've posted here.
> I don't think the query you point out is the culprit, because that table is currently only 3-4 rows long for my instance. It basically just holds some metadata for each separate search area. The big table is slippy_tiles.
>
> My hunch is that it's this query. I tried adding an index on the columns used for ordering, but it might be necessary to index the filtered columns as well. Not 100% sure at the moment; I'll have to look into it more in depth when I can.
Agreed! I just picked one of the first queries I found. The query you pointed out looks to be a bit heavy, especially if it's from the biggest table. It doesn't look like there is a clear way to slim it down, so I think your suggestion of making sure we use the right indices is probably the way to go.
> But the more I think about it, you're right: we probably need to switch to Postgres sooner rather than later. Maybe after we wrap up the contributor-friendliness work and I get to a stopping point on the alternate imagery source task, we should give that a look. There are probably lots of PostGIS features we could be taking advantage of to make everything easier.
Yeah, agreed. I should have loads more time opening up in about two weeks, too, so should be able to contribute much more heavily!
Nice! Sounds good :)