
"make" always *analyzes* also old data

giterest opened this issue on May 25, 2021 · 12 comments

Describe the bug
I initially retrieved all data from my Garmin account (make create_dbs). Now I wanted to only download/analyze the latest data via make. "Downloading" works and "Importing" works, but when it comes to "Analyzing", it always starts with "Generating table entries for 2016" - and looping through all the years until 2021 takes a couple of hours.

To Reproduce
Steps to reproduce the behavior:

  1. run "make"
  2. wait for "Analyzing Data"
  3. it will parse all data, not only recent additions

Expected behavior
Only analyze recent/new data.

Logs

INFO:/usr/local/garmindb/GarminDB/analyze_garmin.py:Summary Tables Generation:
INFO:/usr/local/garmindb/GarminDB/analyze_garmin.py:Generating table entries for 2016



giterest · May 25 '21 05:05

How long has it been since you downloaded all the data? Import only imports files with dates no more than 24 hours old.
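Roughly, that cutoff is a file-modification-time filter. A minimal sketch of the idea (illustrative only, not GarminDB's actual import code; the function name is made up):

# Illustrative sketch only, not GarminDB's actual import code:
# keep only the files whose mtime falls within the last 24 hours.
import os
import time

def recent_files(directory, max_age_hours=24):
    cutoff = time.time() - max_age_hours * 3600
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) >= cutoff:
            yield path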

tcgoetz · May 25 '21 12:05

Hi there. For example, the oldest files at /HealthData/FitFiles/Monitoring/2016 have a last-change timestamp of May 23rd 2021, 11:43 (CEST). The last time I ran make was today around 07:00 (CEST), so there were ~43 h in between.

From my perspective, downloading and importing work fine, but at the analyze step it always analyzes the files from 2016, 2017, and so on as well.

Thanks for getting back to me, much appreciated!

giterest · May 25 '21 13:05

Hi. I started another try (the first since my last post, so more than 2 days later). As soon as make hits "Analyzing Data" it starts from the very beginning (2016...), and finishing through 2021 takes approx. 10-12 h.

giterest · May 28 '21 05:05

I have the same issue, but it's not a problem for me: analyzing a year takes about 30 seconds, not 10 hours.

make
...
___Analyzing Data___
Summary Tables Generation:
...
Generating table entries for 2019
100%|█████████████████████| 365/365 [00:24<00:00, 14.96days/s]
100%|█████████████████████| 52/52 [00:04<00:00, 10.71weeks/s]
100%|█████████████████████| 12/12 [00:02<00:00,  5.14months/s]
...

lpierron · May 28 '21 07:05

Weird... I run GarminDB on a Debian Buster VM with 4 CPUs and 8 GB of RAM. I use that VM solely for testing, so there's nothing else running on it.

giterest · May 28 '21 07:05

I think it's a problem with your installation of sqlite3; could you reinstall it on your VM and run some tests?

If you don't use the DB, I think you can just download the latest data:

garmin.py --all --download --import --latest

lpierron · May 28 '21 08:05

It seems that the call to Analyze.summary() defined in analyze_garmin.py loops over all the years in the DB and regenerates the summary tables even if the data haven't changed. I have been wondering about this behaviour too, but since it runs relatively quickly, I never thought of it as a significant problem.
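Schematically, the behaviour looks something like this (a simplified sketch of what's described above, not the real analyze_garmin.py code; the function names are made up):

# Simplified sketch, not the actual analyze_garmin.py code:
# the summary step walks every year found in the DB and rebuilds
# its tables unconditionally.
import datetime

def generate_summary_tables(first_year, rebuild_year):
    for year in range(first_year, datetime.date.today().year + 1):
        print(f"Generating table entries for {year}")
        rebuild_year(year)  # runs even if that year's data is unchanged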

j-paulus · May 28 '21 08:05

Thanks for the tip. I reinstalled the sqlite3 package on the test VM and also installed GarminDB on my PROD machine - same behaviour. I also checked the system load during the analysis; there's no significant change in utilization, so the root cause (in my case) is not system resources.

giterest · May 28 '21 08:05

I invite you to do some profiling on garmin.py:

% python3 -m cProfile  -o garmin.prof garmin.py --analyze
...
% python3 -m pstats garmin.prof
Welcome to the profile statistics browser.
garmin.prof% strip
garmin.prof% sort tottime
garmin.prof% stats 20
Fri May 28 13:53:09 2021    garmin.prof

         74287412 function calls (70170806 primitive calls) in 126.881 seconds

   Ordered by: internal time
   List reduced from 4865 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    62994   20.344    0.000   20.344    0.000 {method 'execute' of 'sqlite3.Cursor' objects}
    60779    3.218    0.000    3.218    0.000 {method 'fetchall' of 'sqlite3.Cursor' objects}
1694112/62840    2.848    0.000   29.276    0.000 visitors.py:87(_compiler_dispatch)
  7371916    2.540    0.000    2.867    0.000 {built-in method builtins.isinstance}
   247170    1.929    0.000    4.834    0.000 elements.py:965(__init__)
...
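The same report can also be produced non-interactively, which is handy for pasting into a comment:

# Non-interactive equivalent of the pstats session above.
import pstats

stats = pstats.Stats("garmin.prof")
stats.strip_dirs().sort_stats("tottime").print_stats(20)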

lpierron · May 28 '21 12:05

Profiling runs... I'll come back asap once I have some results to share.

Update - the analysis was actually quite fast, as you can see: ~11 min, compared to running make. I'll give make another try and see how long it takes to complete. I'll keep you posted.

Fri May 28 15:15:47 2021    garmin.prof

         73961169 function calls (70252069 primitive calls) in 658.288 seconds

   Ordered by: internal time
   List reduced from 5282 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2016617/126179   59.489    0.000  105.482    0.001 traversals.py:98(_gen_cache_key)
    126853   57.014    0.000   57.014    0.000 {method 'execute' of 'sqlite3.Cursor' objects}
1774263/1773855   33.233    0.000  103.321    0.000 coercions.py:112(expect)
    122433   16.597    0.000   16.597    0.000 {method 'fetchall' of 'sqlite3.Cursor' objects}
   6207393   12.282    0.000   15.338    0.000 {built-in method builtins.isinstance}
    471245   11.689    0.000   48.575    0.000 elements.py:1144(__init__)
       487   11.086    0.023   11.086    0.023 {method 'commit' of 'sqlite3.Connection' objects}
    490861   10.933    0.000   14.802    0.000 elements.py:1543(_gen_cache_key)
    126809   10.012    0.000   29.676    0.000 default.py:921(_init_compiled)
    480317    8.849    0.000   32.852    0.000 elements.py:5114(safe_construct)
    122394    8.819    0.000   23.389    0.000 loading.py:39(instances)
    340025    8.597    0.000   15.047    0.000 elements.py:3700(__init__)
    122410    8.432    0.000  315.097    0.003 session.py:1537(execute)
629525/263552    7.428    0.000   79.002    0.000 traversals.py:202()
   2793314    7.415    0.000    7.472    0.000 {method 'get' of 'dict' objects}
    126809    6.978    0.000  242.229    0.002 base.py:1403(_execute_clauseelement)
    105437    6.929    0.000  312.803    0.003 query.py:2778(one)
    126848    6.538    0.000  105.077    0.001 base.py:1645(_execute_context)
    478694    6.498    0.000    6.500    0.000 {method 'sub' of 're.Pattern' objects}
    237437    6.425    0.000   87.872    0.000 functions.py:785(__call__)

giterest · May 28 '21 13:05

I misunderstood your issue when I said it only does the latest. Download and import take a --latest option, but analyze always analyzes all of the DB data. I've never seen a need to optimize it since it is fast for me. Analyze reads and writes the database a lot, so I would guess that your database access is slow. The database itself is a file, so either your VM doesn't have enough RAM to cache the DB file or your VM's I/O is slow.
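If you want to confirm that, here's a rough way to time raw reads against one of the DB files. The path and table name below are assumptions based on GarminDB's default layout, so adjust them to yours:

# Rough check of raw SQLite read speed; the path and table name are
# assumptions based on GarminDB's default layout -- adjust as needed.
import sqlite3
import time

start = time.perf_counter()
con = sqlite3.connect("HealthData/DBs/garmin_summary.db")
rows = con.execute("SELECT * FROM days_summary").fetchall()
con.close()
print(f"read {len(rows)} rows in {time.perf_counter() - start:.2f} s")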

tcgoetz · May 28 '21 14:05

Hi all. It seems that on TEST the SQLite DB somehow got corrupted, causing GarminDB to have those immense I/O times. Anyhow, on PROD it runs fine (~12-15 min), and after copying the PROD DB to TEST, it also takes 12-15 min there per make.
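(For anyone hitting the same symptom: SQLite can report such corruption directly. The DB path below is an assumption; point it at whichever file misbehaves.)

# Quick corruption check; the DB path is an assumption -- point it at
# whichever file behaves slowly. Prints "ok" if the file is intact.
import sqlite3

con = sqlite3.connect("HealthData/DBs/garmin_summary.db")
print(con.execute("PRAGMA integrity_check").fetchone()[0])
con.close()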

The only thing "left" would be to discuss whether ALL DB data always has to be analyzed(?)

Thanks for your kind support and time!

giterest · May 31 '21 08:05