scancode.io
populate_purldb_with_detected_purls step takes too long
I have about 100,000 packages in a scan:
2023-11-26 14:51:19.56 Step [populate_purldb_with_detected_purls] starting
2023-11-26 14:51:27.33 Populating PurlDB with 93,086 detected PURLs
2023-11-26 15:03:36.83 Progress: 10% (94/931) ETA: 6565 seconds (1.8 hours)
2023-11-26 15:09:20.44 Progress: 20% (187/931) ETA: 4292 seconds (1.2 hours)
2023-11-26 15:17:16.19 Progress: 30% (280/931) ETA: 3614 seconds (1.0 hours)
2023-11-26 15:27:40.87 Progress: 40% (373/931) ETA: 3260 seconds (54.3 minutes)
2023-11-26 16:14:08.66 Progress: 50% (466/931) ETA: 4961 seconds (1.4 hours)
2023-11-26 16:58:34.71 Progress: 60% (559/931) ETA: 5085 seconds (1.4 hours)
2023-11-26 17:44:22.35 Progress: 70% (652/931) ETA: 4446 seconds (1.2 hours)
2023-11-26 18:29:32.21 Progress: 80% (745/931) ETA: 3271 seconds (54.5 minutes)
2023-11-26 19:06:03.83 Progress: 90% (838/931) ETA: 1697 seconds (28.3 minutes)
If each PURL is about 200 characters long, this is about 20 MB of data. I would expect a single call to PurlDB that is fast, returns as soon as the 100K PURLs have been uploaded, and then lets the processing run in the background.
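As a rough sanity check of these numbers, the sketch below recomputes the payload size and the observed time per batch from the log lines above. It assumes that batching began at the "Populating PurlDB" log line and that the batch size is 100 PURLs (inferred from 93,086 PURLs over 931 batches); neither is confirmed by the log itself, and the function name is only illustrative.

```python
import math
from datetime import datetime

PURL_COUNT = 93_086          # from the "Populating PurlDB" log line
ASSUMED_PURL_LENGTH = 200    # characters per PURL, the rough estimate above
ASSUMED_BATCH_SIZE = 100     # assumption: inferred from 931 batches in the log

# Estimated payload if all PURLs were sent at once, in megabytes.
payload_mb = PURL_COUNT * ASSUMED_PURL_LENGTH / 1_000_000

# Number of batches implied by the assumed batch size.
batch_count = math.ceil(PURL_COUNT / ASSUMED_BATCH_SIZE)

# (timestamp, batches completed) copied from the log above.
# The first entry assumes batching started at the "Populating" line.
checkpoints = [
    ("2023-11-26 14:51:27.33", 0),
    ("2023-11-26 15:03:36.83", 94),
    ("2023-11-26 19:06:03.83", 838),
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_per_batch(start, end):
    """Average wall-clock seconds per batch between two checkpoints."""
    started = datetime.strptime(start[0], TIMESTAMP_FORMAT)
    ended = datetime.strptime(end[0], TIMESTAMP_FORMAT)
    return (ended - started).total_seconds() / (end[1] - start[1])


if __name__ == "__main__":
    print(f"Estimated payload: {payload_mb:.1f} MB in {batch_count} batches")
    print(f"First 10%: {seconds_per_batch(checkpoints[0], checkpoints[1]):.1f} s/batch")
    print(f"Through 90%: {seconds_per_batch(checkpoints[0], checkpoints[2]):.1f} s/batch")
```

Under these assumptions the payload is about 18.6 MB, and each batch of roughly 100 PURLs took about 7.8 seconds during the first 10% and about 18 seconds on average through the 90% mark, so the time is spent per request rather than on transferring the data.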