datapusher
Prevent DataPusher from Uploading Resources with More than X Columns
Hi everyone. We run an open data portal that uses the DataPusher. People can create their own accounts, datasets, and resources. Most things work fine, except that some users have huge tabular files, some with thousands of columns. Knowing that PostgreSQL limits the number of columns a table can hold, we would like to prevent the DataPusher from trying to push a resource that has more than X columns.
Currently, the DataPusher just tries to send the files, no matter their size. The push is of course eventually rejected because of the PostgreSQL limits, but in the meantime it eats up server resources (RAM, CPU, swap) for nothing. Some of our users have hundreds of files like this and have automated the upload process through the API, so whenever they send new resources, our server is almost always dying.
Is there a way to tell the DataPusher not to push a resource if it has more than X columns?
CKAN version: 2.4.1
DataPusher version: 0.0.6
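To make the request concrete, here is a minimal sketch of the kind of pre-check being asked for. It is not an existing DataPusher setting or API: the names `MAX_COLUMNS`, `header_column_count`, and `should_push` are hypothetical, and the sketch assumes the resource is a delimited text file whose first row is the header. PostgreSQL documents a hard limit of 1600 columns per table, so a threshold would need to sit at or below that.

```python
import csv
import io

# Hypothetical threshold ("X" in the question). PostgreSQL's documented
# hard limit is 1600 columns per table, so X should not exceed that.
MAX_COLUMNS = 1000


def header_column_count(text_stream, delimiter=","):
    """Return the number of fields in the first row of a delimited file.

    Only the first row is read, so the cost does not depend on file size.
    An empty stream yields 0.
    """
    reader = csv.reader(text_stream, delimiter=delimiter)
    header = next(reader, [])
    return len(header)


def should_push(text_stream, max_columns=MAX_COLUMNS, delimiter=","):
    """Return True if the file has at most max_columns header fields."""
    return header_column_count(text_stream, delimiter) <= max_columns


if __name__ == "__main__":
    # Synthetic data for illustration only.
    wide = io.StringIO(",".join(f"c{i}" for i in range(2000)) + "\n1,2,3\n")
    narrow = io.StringIO("a,b,c\n1,2,3\n")
    print(should_push(wide))
    print(should_push(narrow))
```

Because only the header row is parsed, a check like this could reject a too-wide file before any rows are loaded into memory or sent to the DataStore. Where such a check would hook into the push job is left open here.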
Hi @kimepe, can you please provide me with a sample dataset? I am not able to find a dataset with 1000+ columns.
Hi,
Here's an example: https://www.donneesquebec.ca/recherche/dataset/rapport-financier-2021/resource/4072be56-1035-4b68-86d9-6589521d9e90
Most DataSources under the label "Rapport Financier" (https://www.donneesquebec.ca/recherche/organization/affaires-municipales-et-occupation-du-territoire) fall under this category.
Peace,
On 2022-07-27 at 23:47, Deep-NEC wrote:
Hi @kimepe, can you please provide me with sample data? I am not able to find a dataset with 1000+ columns on the internet.