dsp-api
Error on bulk import of images
Hello Houston, we have a problem here in Lausanne with a bulk import containing about 100 images. We don't really know which images cause the failure, or whether the problem comes from Knora or Sipi.
The fact is that the bulk import fails at the very beginning, reporting this error: {"status":4,"error":"org.knora.webapi.TriplestoreConnectionException: Connection to Sipi timed out after 5 seconds"} while (if we trust the Sipi logs) Sipi continues to process image files ...
We use the docker-compose dev stack from the Knora project (with a minor change to use a Knora image, dhlabbasel/webapi:v2.1.0, just to speed up the restart process).
To reproduce the problem:
- Download all images here : https://drive.switch.ch/index.php/s/NTyvlQXg6qRwlzf
- Unzip somewhere : /somewhere/Reforme-geneve-archive
- Use this script to launch our test (https://github.com/LaDHUL/BulkImageImportTest)
./checkImageFolder.sh /somewhere/Reforme-geneve-archive /sipi-import/Reforme-geneve-archive
This assumes that the Sipi container can see this folder:
sipi:
  [...]
  volumes:
    - /somewhere:/sipi-import
  [...]
Can you try increasing app.sipi.timeout in Knora's application.conf?
Oh yeah! Thanks Houston, we have regained control of the satellite :)
I can pass the stress test by setting a timeout of 60 seconds (sometimes more, perhaps depending on CPU/memory load?), mainly because of the last step, a full bulk import of 103 images. The beginning of the test (103 bulk imports of 1 image each) works with a timeout of 25 seconds.
I would really like to know why Sipi would take 60 seconds to process an image, but it could take some time to investigate this. Is increasing the timeout OK as a workaround for now?
I think that the current design of the bulkimporter creates 100 new resources at the same time, which basically hammers Knora and Sipi. I think that the bulkimporter needs a redesign.
@benjamingeer Could we have a quick Skype call tomorrow regarding this?
> Is increasing the timeout OK as a workaround for now?
Yes for my personal test, but this is @mrivoal's project, and I am not sure it will work on her smaller laptop. We will try; I will be in Lausanne tomorrow to help her configure the timeout.
Note that at the beginning we thought it was an image format error, of the kind @loicjaouen often runs into. I can see many errors/warnings in the Sipi logs that could be reported to the end user. IMHO this is very important, especially since Sipi loses the original data; it is related to #1050.
> I think that the current design of the bulkimporter creates 100 new resources at the same time, which basically hammers Knora and Sipi.
Creating all the resources at the same time isn't a problem for Knora, because they're created in one transaction (one SPARQL INSERT). This is necessary to satisfy the triplestore's consistency checks.
Looking quickly at the code, I think that ResourcesResponderV1 asynchronously asks SipiResponderV1 to convert the image files, so Sipi would have to process several of these requests concurrently. On the other hand, SipiResponderV1 blocks while processing each request, so Sipi shouldn't have to handle more than akka.actor.deployment./responderManager/sipiRouterV1.nr-of-instances concurrent requests from Knora.
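To illustrate the bound described above, here is a minimal Python sketch. It is not Knora's Akka code: a fixed pool of blocking workers stands in for the sipiRouterV1 instances, and time.sleep stands in for the blocking call to Sipi. The function name, pool size and timings are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bulk_import(num_images, pool_size, convert_seconds=0.01):
    """Push num_images conversions through pool_size blocking workers.

    Returns the peak number of conversions that were in flight at once.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def convert(_image_index):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(convert_seconds)  # stand-in for the blocking request to Sipi
        with lock:
            in_flight -= 1

    # Each worker blocks until its conversion finishes, so no more than
    # pool_size conversions can be running at any moment.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(convert, range(num_images)))
    return peak
```

However many images are submitted, the returned peak never exceeds pool_size, which is the same kind of limit that nr-of-instances would place on concurrent requests to Sipi.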
I could talk tomorrow afternoon on Skype.
After talking with @subotic, we are going to redesign bulk import somewhat to reduce its resource consumption, but we won't be able to do it until next year.
> Is increasing the timeout OK as a workaround for now?
If it works on our production installation, it will be fine. However, if it doesn't, then we will need your help to find another way to make the bulk import work, because it involves a project whose data needs to be stored and published in Knora before the end of 2018. We have a commitment to do so.
#1062 should lessen the load on the machine a bit during bulk import. Should be ready in a few days for you to try out.
Great :)
Trying Sipi on the command line with just one of your images:
euler:Sipi benjamingeer$ time local/bin/sipi -f HISTO_BILLON_1726.jpg HISTO_BILLON_1726.jp2
real 0m6.809s
The input file is a 44 MB JPG file. So the problem is simple: Sipi does in fact take more than five seconds to convert it to JPEG 2000. Is that a bug? I don't know.
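The same measurement can be scripted to check several files against a given timeout. The following is a small sketch, assuming the conversion can be started as an external command; the helper name time_command is hypothetical, and the timeout is only compared against the elapsed time, not enforced.

```python
import subprocess
import time


def time_command(command, timeout_seconds):
    """Run command to completion and report how long it took.

    Returns (elapsed_seconds, exceeded), where exceeded is True if the
    command ran longer than timeout_seconds.
    """
    start = time.perf_counter()
    subprocess.run(command, check=True)  # raises if the command fails
    elapsed = time.perf_counter() - start
    return elapsed, elapsed > timeout_seconds


# Example call, reusing the command line shown above:
# time_command(
#     ["local/bin/sipi", "-f", "HISTO_BILLON_1726.jpg", "HISTO_BILLON_1726.jp2"],
#     timeout_seconds=5.0,
# )
```

For a run like the one above (about 6.8 seconds), the second returned value would be True with a 5-second timeout.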
#1062 has been merged.
With Knora's default app.sipi.timeout of 5 seconds, using the current develop branches of both Knora and Sipi (compiled from source, not running in Docker), when I run your checkImageFolder.sh script, I get timeout errors in these four files:
- HISTO_BILLON_1726.jpg
- arch_tronchin_34_f57r.tiff
- chaeg_rc_29_f_112.tiff
- mhr_m_102_z1.png
These are all among the largest files in your test.
With a timeout of 10 seconds, all images upload successfully except for chaeg_rc_29_f_112.tiff, which is the largest file in the test (268 MB).
With a timeout of 15 seconds, all images import successfully.
In addition to the time needed to convert the image to JPEG 2000, Sipi needs time to compute the image's SHA256 hash, which is included in the embedded metadata of the converted JPEG 2000 file.
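To get a feel for the hashing cost on its own, one can time a SHA-256 pass over a file. This sketch is not Sipi's implementation (Sipi is written in C++); it only shows that computing the hash means reading the whole file, so the time grows with file size. The function names are hypothetical.

```python
import hashlib
import time


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so that large images do not have to
        # fit in memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def timed_sha256(path):
    """Return (hex_digest, elapsed_seconds) for hashing the file at path."""
    start = time.perf_counter()
    hex_digest = sha256_of_file(path)
    return hex_digest, time.perf_counter() - start
```

Running timed_sha256 on the largest files in the test would show how much of the total is spent hashing rather than converting on a given machine.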
> Is that a bug? I don't know.
For me, this configuration is very hardware dependent, so I would say that this is not a bug. The conversion just takes its time to run. If the machine were super fast, then maybe the 5 seconds would have been sufficient. Maybe we should set the default to 15 seconds?
What we should fix, though, is the error message: {"status":4,"error":"org.knora.webapi.TriplestoreConnectionException: Connection to Sipi timed out after 5 seconds"}
A Sipi timeout should raise a SipiConnectionException, not a TriplestoreConnectionException.
> TriplestoreConnectionException
Yes, that looks like a copy-and-paste error. :)