unchunked archive/observations communication
Hi Qiita team, we just ran into an issue: our fresh Qiita instance processed a real-world-sized (~700 samples) 16S study. Since the DB is fresh, no SEPP placements were stored yet. Unfortunately, the API call at https://github.com/qiita-spots/qp-deblur/blob/efd59e3cd6ea176557633bbfd86eafd28072597a/qp_deblur/deblur.py#L506-L507 failed with the following error message:
Error executing Deblur 2021.09:

Traceback (most recent call last):
  File "/homes/sjanssen/bcf_qiita/envs/deblur/lib/python3.5/site-packages/qiita_client/plugin.py", line 266, in __call__
    qclient, job_id, job_info['parameters'], output_dir)
  File "/homes/sjanssen/bcf_qiita/envs/deblur/lib/python3.5/site-packages/qiita_client/plugin.py", line 105, in __call__
    return self.function(qclient, server_url, job_id, output_dir)
  File "/homes/sjanssen/bcf_qiita/envs/deblur/lib/python3.5/site-packages/qp_deblur/deblur.py", line 507, in deblur
    path=job_id, value=json.dumps(new_placements))
  File "/homes/sjanssen/bcf_qiita/envs/deblur/lib/python3.5/site-packages/qiita_client/qiita_client.py", line 470, in patch
    return self._request_retry(self._session.patch, url, **kwargs)
  File "/homes/sjanssen/bcf_qiita/envs/deblur/lib/python3.5/site-packages/qiita_client/qiita_client.py", line 375, in _request_retry
    % (req.__name__, url, r.status_code, r.text))
RuntimeError: Request 'patch https://qiita.jlab.bio/qiita_db/archive/observations/' did not succeed. Status code: 413. Message:
<html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx/1.25.3</center>
</body>
</html>
This is due to the relatively small client_max_body_size of 7M configured in nginx. After increasing it to 100M, the job succeeded; the request body was ~22M.
I wonder if it is worth implementing chunking, similar to the file upload mechanism, to prevent issues with really large data transfers?
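For illustration, here is a minimal client-side sketch of such chunking. It assumes the same qclient.patch call pattern used at deblur.py L506-L507 (a JSON-PATCH-style 'add' with path/value), and it assumes the archive/observations endpoint would merge successive PATCHes for the same job; neither assumption is verified here.

```python
import json

def patch_placements_in_chunks(qclient, job_id, new_placements,
                               chunk_size=500):
    """Hypothetical sketch: send SEPP placements to the archive endpoint
    in several small PATCH requests instead of one huge one, so each
    request body stays below nginx's client_max_body_size.

    Assumes the endpoint merges successive PATCHes for the same job_id,
    which would need server-side support.
    """
    items = list(new_placements.items())
    for start in range(0, len(items), chunk_size):
        chunk = dict(items[start:start + chunk_size])
        # mirrors the existing call in qp_deblur/deblur.py L506-L507,
        # but with only a subset of the placements per request
        qclient.patch('/qiita_db/archive/observations/', 'add',
                      path=job_id, value=json.dumps(chunk))
```

With a chunk size on the order of a few hundred placements, each request body would stay small regardless of study size.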
We haven't had the need to implement any chunking mechanism for those pages, but we are able to control our configuration (nginx), so we can raise those values "easily" and restrict entry to those endpoints. For example, we set client_max_body_size based on the request page and the specific request to nginx.
However, I think this is a combination of what you can do in your installation and personal preference.
Anyway, FWIW, the client_max_body_size values on the main Qiita site (depending on the entry point) are: 300M, 600M, and 1500M.
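For illustration only, such per-location limits might look like the following nginx sketch. The server name and location paths are placeholders, not Qiita's actual production config; the 7M/100M values come from this thread and the 1500M value from the entry-point limits mentioned above.

```nginx
server {
    server_name qiita.example.org;   # placeholder host; TLS setup omitted

    client_max_body_size 7m;         # conservative site-wide default

    # archive/observations PATCHes carrying SEPP placements (~22M above)
    location /qiita_db/archive/ {
        client_max_body_size 100m;
    }

    # file uploads get the largest limit
    location /upload/ {
        client_max_body_size 1500m;
    }
}
```

Since client_max_body_size can be set per location, only the endpoints that legitimately receive large payloads need the higher limits.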
Closing for now, please reopen if you have further questions.