
populate_purldb_with_detected_purls step takes too long

Open pombredanne opened this issue 1 year ago • 0 comments

I have about 100,000 packages in a scan:

2023-11-26 14:51:19.56 Step [populate_purldb_with_detected_purls] starting
2023-11-26 14:51:27.33 Populating PurlDB with 93,086 detected PURLs
2023-11-26 15:03:36.83 Progress: 10% (94/931) ETA: 6565 seconds (1.8 hours)
2023-11-26 15:09:20.44 Progress: 20% (187/931) ETA: 4292 seconds (1.2 hours)
2023-11-26 15:17:16.19 Progress: 30% (280/931) ETA: 3614 seconds (1.0 hours)
2023-11-26 15:27:40.87 Progress: 40% (373/931) ETA: 3260 seconds (54.3 minutes)
2023-11-26 16:14:08.66 Progress: 50% (466/931) ETA: 4961 seconds (1.4 hours)
2023-11-26 16:58:34.71 Progress: 60% (559/931) ETA: 5085 seconds (1.4 hours)
2023-11-26 17:44:22.35 Progress: 70% (652/931) ETA: 4446 seconds (1.2 hours)
2023-11-26 18:29:32.21 Progress: 80% (745/931) ETA: 3271 seconds (54.5 minutes)
2023-11-26 19:06:03.83 Progress: 90% (838/931) ETA: 1697 seconds (28.3 minutes)

If each PURL is about 200 characters long, this is about 20 MB of data. I would expect a call to the PurlDB to be fast: it should return as soon as the 100K PURLs have been uploaded, and the processing should then run in the background.
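For reference, here is a rough back-of-the-envelope check of the numbers above, using only values from the log. The 200-byte average PURL length is my assumption, and the variable names are only for illustration.

```python
import math
from datetime import datetime

# Values taken from the log above.
purl_count = 93_086      # "Populating PurlDB with 93,086 detected PURLs"
batch_count = 931        # total from the progress lines, e.g. "(94/931)"
batches_done = 838       # batches completed at the 90% progress line

# Assumption: about 200 bytes per PURL (one byte per ASCII character).
avg_purl_bytes = 200

# Total payload if all PURLs were uploaded in one request.
payload_mb = purl_count * avg_purl_bytes / 1_000_000

# Implied number of PURLs per batch.
batch_size = math.ceil(purl_count / batch_count)

# Elapsed time between the "Populating" line and the 90% progress line.
start = datetime.fromisoformat("2023-11-26 14:51:27")
at_90_percent = datetime.fromisoformat("2023-11-26 19:06:03")
elapsed_seconds = (at_90_percent - start).total_seconds()

# Average time spent per batch so far.
seconds_per_batch = elapsed_seconds / batches_done

print(f"payload: {payload_mb:.1f} MB")
print(f"batch size: {batch_size} PURLs")
print(f"elapsed: {elapsed_seconds / 3600:.1f} hours")
print(f"per batch: {seconds_per_batch:.1f} seconds")
```

So the step spends roughly 18 seconds on each batch of about 100 PURLs, and over 4 hours in total, to move less than 20 MB of data.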

pombredanne, Nov 26 '23 19:11