datapusher-plus
Upload to Datastore errors when DOWNLOAD_PREVIEW_ONLY=True
Describe the bug
I am running datapusher-plus via datapusher-plus-docker with the following config parameters set:
PREVIEW_ROWS=1000
ADD_SUMMARY_STATS_RESOURCE=True
SUMMARY_STATS_WITH_PREVIEW=True
DOWNLOAD_PREVIEW_ONLY=True
The problem appears to be tied to DOWNLOAD_PREVIEW_ONLY=True.
With DOWNLOAD_PREVIEW_ONLY=True, pushing a resource to DP+ fails with the errors shown below.
Setting DOWNLOAD_PREVIEW_ONLY=False fixes the errors I'm seeing.
These are my test resource files that are failing:
When I push the attached XLSX file, I get this error:
datapusher-plus | --- Logging error ---
datapusher-plus | Traceback (most recent call last):
datapusher-plus | File "/srv/app/src/datapusher-plus/datapusher/jobs.py", line 630, in push_to_datastore
datapusher-plus | qsv_excel = subprocess.run(
datapusher-plus | File "/usr/lib/python3.10/subprocess.py", line 524, in run
datapusher-plus | raise CalledProcessError(retcode, process.args,
datapusher-plus | subprocess.CalledProcessError: Command '['/usr/local/bin/qsvdp', 'excel', '/tmp/tmp8s4qgo7c.XLSX', '--sheet', '0', '--trim', '--output', '/tmp/tmp7ns3tj6h.csv']' returned non-zero exit status 1.
datapusher-plus |
datapusher-plus | During handling of the above exception, another exception occurred:
datapusher-plus |
datapusher-plus | Traceback (most recent call last):
datapusher-plus | File "/usr/lib/python3.10/logging/handlers.py", line 1057, in emit
datapusher-plus | smtp = smtplib.SMTP(self.mailhost, port, timeout=self.timeout)
datapusher-plus | File "/usr/lib/python3.10/smtplib.py", line 255, in __init__
datapusher-plus | (code, msg) = self.connect(host, port)
datapusher-plus | File "/usr/lib/python3.10/smtplib.py", line 341, in connect
datapusher-plus | self.sock = self._get_socket(host, port, self.timeout)
datapusher-plus | File "/usr/lib/python3.10/smtplib.py", line 312, in _get_socket
datapusher-plus | return socket.create_connection((host, port), timeout,
datapusher-plus | File "/usr/lib/python3.10/socket.py", line 845, in create_connection
datapusher-plus | raise err
datapusher-plus | File "/usr/lib/python3.10/socket.py", line 833, in create_connection
datapusher-plus | sock.connect(sa)
datapusher-plus | ConnectionRefusedError: [Errno 111] Connection refused
datapusher-plus | Call stack:
datapusher-plus | File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
datapusher-plus | self._bootstrap_inner()
datapusher-plus | File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
datapusher-plus | self.run()
datapusher-plus | File "/usr/lib/python3.10/threading.py", line 953, in run
datapusher-plus | self._target(*self._args, **self._kwargs)
datapusher-plus | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83, in _worker
datapusher-plus | work_item.run()
datapusher-plus | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
datapusher-plus | result = self.fn(*self.args, **self.kwargs)
datapusher-plus | File "/usr/lib/ckan/dpplus_venv/lib/python3.10/site-packages/apscheduler/executors/base.py", line 125, in run_job
datapusher-plus | retval = job.func(*job.args, **job.kwargs)
datapusher-plus | File "/srv/app/src/datapusher-plus/datapusher/jobs.py", line 646, in push_to_datastore
datapusher-plus | logger.error(
datapusher-plus | Message: "Upload aborted. Cannot export spreadsheet(?) to CSV: Command '['/usr/local/bin/qsvdp', 'excel', '/tmp/tmp8s4qgo7c.XLSX', '--sheet', '0', '--trim', '--output', '/tmp/tmp7ns3tj6h.csv']' returned non-zero exit status 1."
datapusher-plus | Arguments: ()
datapusher-plus | 2023-06-26 16:53:33,176 WARNING Is the file encrypted or is not a spreadsheet?
datapusher-plus | FILE ATTRIBUTES: /tmp/tmp8s4qgo7c.XLSX: Microsoft Excel 2007+
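One possible explanation for the XLSX failure, not confirmed by the logs above: an XLSX file is a ZIP archive, and the ZIP central directory sits at the end of the file. If a preview-only download keeps just the first part of the file, the leading bytes would still identify it as "Microsoft Excel 2007+", but the archive could no longer be opened. The sketch below only illustrates that general property of ZIP files using Python's standard library. The helper names `build_zip_bytes` and `is_complete_zip` and the sample archive contents are made up for illustration and are not part of DP+ or qsv.

```python
import io
import zipfile


def build_zip_bytes() -> bytes:
    """Build a small in-memory ZIP archive standing in for an XLSX file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical member name and payload, for illustration only.
        zf.writestr("xl/worksheets/sheet1.xml", "<row/>" * 5000)
    return buf.getvalue()


def is_complete_zip(data: bytes) -> bool:
    """Return True if data opens as a ZIP archive and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


full = build_zip_bytes()
# Simulate a download that stops partway through the file.
truncated = full[: len(full) // 2]

print("full archive readable:     ", is_complete_zip(full))
print("truncated archive readable:", is_complete_zip(truncated))
# The truncated copy still starts with the ZIP local-file-header signature,
# so a check based on the leading bytes alone would still recognize the format.
print("truncated starts with PK signature:", truncated[:4] == b"PK\x03\x04")
```

Running a similar check against the temporary file that DP+ hands to qsvdp would be one way to tell whether the download was cut short before conversion.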
When I try with the same XLSX file converted to a CSV, I get the following error:
datapusher-plus | 2023-06-26 17:02:57,076 INFO Fetching from: http://192.168.7.200:5000/dataset/8cbdffdb-1cef-4c9d-84fd-005fde129962/resource/9af29c46-4f37-4c8f-9021-09bf7af88f9b/download/tceq-test.csv...
datapusher-plus | 127.0.0.1 - - [26/Jun/2023:17:02:57 +0000] "GET /job/3b1c2e8d-29de-4d65-87b8-e3d800129cfe HTTP/1.1" 200 1111 "-" "python-requests/2.25.1"
datapusher-plus | 2023-06-26 17:02:57,161 INFO Downloading only first 1,000 row preview from 5.31MB file...
datapusher-plus | 2023-06-26 17:02:57,170 INFO Fetched 0.09MB file in 0.09 seconds.
datapusher-plus | 2023-06-26 17:02:57,177 INFO ANALYZING WITH QSV..
datapusher-plus | 2023-06-26 17:02:57,184 INFO Normalizing/UTF-8 transcoding CSV...
datapusher-plus | Invalid CSV. Last valid row (4): CSV error: record 4 (line: 5, byte: 446): found record with 23 fields, but the previous record has 3 fields
datapusher-plus | 2023-06-26 17:02:57,237 ERROR Job aborted as the file cannot be normalized/transcoded: Command '['/usr/local/bin/qsvdp', 'input', '/tmp/tmpso60e8jy..csv', '--trim-headers', '--output', '/tmp/tmp3zaav2od.csv']' returned non-zero exit status 1..
datapusher-plus | --- Logging error ---
datapusher-plus | Traceback (most recent call last):
datapusher-plus | File "/srv/app/src/datapusher-plus/datapusher/jobs.py", line 692, in push_to_datastore
datapusher-plus | subprocess.run(
datapusher-plus | File "/usr/lib/python3.10/subprocess.py", line 524, in run
datapusher-plus | raise CalledProcessError(retcode, process.args,
datapusher-plus | subprocess.CalledProcessError: Command '['/usr/local/bin/qsvdp', 'input', '/tmp/tmpso60e8jy..csv', '--trim-headers', '--output', '/tmp/tmp3zaav2od.csv']' returned non-zero exit status 1.
datapusher-plus |
datapusher-plus | During handling of the above exception, another exception occurred:
datapusher-plus |
datapusher-plus | Traceback (most recent call last):
datapusher-plus | File "/usr/lib/python3.10/logging/handlers.py", line 1057, in emit
datapusher-plus | smtp = smtplib.SMTP(self.mailhost, port, timeout=self.timeout)
datapusher-plus | File "/usr/lib/python3.10/smtplib.py", line 255, in __init__
datapusher-plus | (code, msg) = self.connect(host, port)
datapusher-plus | File "/usr/lib/python3.10/smtplib.py", line 341, in connect
datapusher-plus | self.sock = self._get_socket(host, port, self.timeout)
datapusher-plus | File "/usr/lib/python3.10/smtplib.py", line 312, in _get_socket
datapusher-plus | return socket.create_connection((host, port), timeout,
datapusher-plus | File "/usr/lib/python3.10/socket.py", line 845, in create_connection
datapusher-plus | raise err
datapusher-plus | File "/usr/lib/python3.10/socket.py", line 833, in create_connection
datapusher-plus | sock.connect(sa)
datapusher-plus | ConnectionRefusedError: [Errno 111] Connection refused
datapusher-plus | Call stack:
datapusher-plus | File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
datapusher-plus | self._bootstrap_inner()
datapusher-plus | File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
datapusher-plus | self.run()
datapusher-plus | File "/usr/lib/python3.10/threading.py", line 953, in run
datapusher-plus | self._target(*self._args, **self._kwargs)
datapusher-plus | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 83, in _worker
datapusher-plus | work_item.run()
datapusher-plus | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
datapusher-plus | result = self.fn(*self.args, **self.kwargs)
datapusher-plus | File "/usr/lib/ckan/dpplus_venv/lib/python3.10/site-packages/apscheduler/executors/base.py", line 125, in run_job
datapusher-plus | retval = job.func(*job.args, **job.kwargs)
datapusher-plus | File "/srv/app/src/datapusher-plus/datapusher/jobs.py", line 706, in push_to_datastore
datapusher-plus | logger.error(
datapusher-plus | Message: "Job aborted as the file cannot be normalized/transcoded: Command '['/usr/local/bin/qsvdp', 'input', '/tmp/tmpso60e8jy..csv', '--trim-headers', '--output', '/tmp/tmp3zaav2od.csv']' returned non-zero exit status 1.."
datapusher-plus | Arguments: ()
datapusher-plus | 127.0.0.1 - - [26/Jun/2023:17:03:01 +0000] "GET /job/3b1c2e8d-29de-4d65-87b8-e3d800129cfe HTTP/1.1" 200 2217 "-" "python-requests/2.25.1"