influxdb-python
Influx getting gradually slower at every consecutive query
If I query the SAME DAY of data over and over again, my query times look something like this: 1m:18s, 1m:28s, 1m:50s, 2m:11s, and so on. This is an old date, so no new data gets added in that time period. What could this be? Should I stop and restart InfluxDB after every query to minimize this?
InfluxDB 1.7.9, influxdb-python client 5.2.3, Python 3.7, macOS 10.12.6
@rbdm-qnt thanks for this. There are many things that could impact this, but to answer your specific question: no, you should not stop and restart InfluxDB after each query.
If you provide some additional information about the query, the database, the environment, or anything you can think of that might be impacting performance, we can start narrowing it down.
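One way to gather that information is to time the same query several times in a row from a small script. The following is only a minimal sketch: `time_repeated` and `influx_day_query` are hypothetical helper names, and the host, port, database, and query string are placeholders. Opening a fresh client for each run and closing it afterwards is an assumption made here so that client-side state does not carry over between runs; it is not a confirmed fix for the slowdown.

```python
import time


def time_repeated(run_query, runs=4):
    """Call run_query() `runs` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        result = run_query()
        durations.append(time.perf_counter() - start)
        # Drop the result so it is not kept alive between runs.
        del result
    return durations


def influx_day_query(database, query, host="localhost", port=8086):
    """Build a zero-argument callable that runs `query` with a fresh client.

    host, port, database and query are placeholders (assumptions), not values
    taken from the thread.
    """
    # Imported lazily so the timing helper can be used without the client installed.
    from influxdb import InfluxDBClient

    def run():
        client = InfluxDBClient(host=host, port=port, database=database)
        try:
            return client.query(query)
        finally:
            client.close()

    return run
```

For example, `time_repeated(influx_day_query("mydb", "SELECT * FROM trades WHERE time >= '2019-01-02' AND time < '2019-01-03'"))` would return one duration per run (the database and measurement names are made up). If the durations still grow with a fresh client each time, the cause is more likely on the server side than in the Python process.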
Hi @russorat, sorry for torturing you with so many questions about Influx's speed recently, I appreciate your patience. The application is the same as in all my other issues: financial data, where I have to query entire rows, 7 fields per row. The environment is: InfluxDB 1.7.9, influxdb-python client 5.2.3, Python 3.7, macOS 10.12.6, 16 GB RAM.
In the last week I've switched to an Apache Parquet database and Dask. I have to say it's faster, but not by a huge margin; however, it does handle RAM far more efficiently. I've realized in these last few weeks that no file system is really designed to do what I do (which is query entire rows). They are all optimized to query columns and aggregate data inside the query itself, and none of them is optimized for my brute-force style of query. Unfortunately, I can't do otherwise. I'm also not on very powerful hardware to begin with. I'd probably need a cluster to do what I'm doing faster.
@rbdm-qnt not a problem! Sounds like processing this in the cloud might be your best option. Pretty easy to horizontally spin up a large number of executors to process your data if you need it in a specific timeframe.
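The idea behind scaling out horizontally is to split the work into independent chunks, for example one chunk per day, and hand each chunk to a separate worker. Below is a rough local sketch of that partitioning using only the standard library. `day_ranges`, `process_days`, and `process_one_day` are hypothetical names, and the thread pool is just a stand-in for remote executors; an actual cloud or cluster setup would use its own scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def day_ranges(start, end):
    """Yield (day_start, next_day) pairs covering the half-open range [start, end)."""
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield day, next_day
        day = next_day


def process_days(start, end, process_one_day, max_workers=4):
    """Run process_one_day(day_start, next_day) for every day, several at a time.

    Results are returned in day order. process_one_day is supplied by the
    caller, e.g. a function that queries and processes one day of rows.
    """
    ranges = list(day_ranges(start, end))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: process_one_day(*r), ranges))
```

Because each day is processed independently, the same split carries over to a cluster: only the part that dispatches the chunks would change.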
How can I do this? I figured I'd need some sort of cluster or at least a dedicated VPS, right? Any link would be really useful.