
checkpointing data extraction + more

Open chase1124 opened this issue 7 months ago • 3 comments

Hi,

As mentioned in the discussions, I am interested in a "batch" feature that would allow vuegraf to be run asynchronously without losing data or requiring user logic with --history to catch up. To accomplish this I have created a --batch command line option which checkpoints on a per-channel and per-time-precision basis (for each channel, e.g. GID 123456-1, there is a separate checkpoint for Day, Hour, and Minute). It knows the maximum data retention of the API as well as the maximum request windows, and will adjust any batch requests as needed.

I would also like to suggest some additional improvements (my opinion):

- A "detailedDatapoint" for InfluxDB that includes the channel type identifier (the label of the circuit type), to enable more automated graphing of circuits, channel multiplier usage, etc.
- Gzipping the InfluxDB writes, for better network performance and to avoid the write failures I encountered when using the --history CLI option.
- Using second precision on writes instead of the default nanosecond, since nanosecond resolution is not needed (this is recommended by Influx for better database performance and compression).
- Other small changes we could discuss, which I believe prevent any duplicate data between listUsage and chartUsage.

All of these are already done by the batch process I have created, so they would be easy to apply to listUsage where desired (see below). I think the --history function could effectively be dropped, as --batch is much more powerful while essentially doing the same thing (you can give it a CLI argument to override the checkpoints / initially seed the database by going back X days).
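To make the per-channel, per-precision checkpointing concrete, here is a minimal sketch. All names and limits here (`MAX_RETENTION`, `MAX_WINDOW`, the checkpoint dict) are illustrative assumptions, not vuegraf code or actual Emporia API values:

```python
from datetime import datetime, timedelta, timezone

# ASSUMED retention and request-window limits, for illustration only;
# the real Emporia API limits differ per precision.
MAX_RETENTION = {"Minute": timedelta(days=7),
                 "Hour": timedelta(days=365),
                 "Day": timedelta(days=365 * 2)}
MAX_WINDOW = {"Minute": timedelta(hours=12),
              "Hour": timedelta(days=30),
              "Day": timedelta(days=30)}

# One checkpoint per (channel GID, precision) pair, e.g. ("123456-1", "Minute").
checkpoints: dict[tuple[str, str], datetime] = {}

def clamp_window(gid: str, precision: str, now: datetime) -> tuple[datetime, datetime]:
    """Compute the next request window for a channel/precision pair,
    honoring both the API's data retention and its maximum request window."""
    oldest = now - MAX_RETENTION[precision]
    # Resume from the checkpoint, but never ask for data the API no longer keeps.
    start = max(checkpoints.get((gid, precision), oldest), oldest)
    stop = min(start + MAX_WINDOW[precision], now)
    return start, stop

def record_checkpoint(gid: str, precision: str, stop: datetime) -> None:
    # Advance the checkpoint only after a successful fetch + DB write.
    checkpoints[(gid, precision)] = stop
```

The key design point is that the checkpoint key includes both the channel GID and the precision, so a failed Minute fetch for one channel never stalls or corrupts the Day/Hour progress of that channel or any other.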

The normal listUsage operation could easily be combined with the checkpointing, so that if you stop running vuegraf for some time and then start it up with the --batch option, it will catch up and then drop back into listUsage collection. I think this could be universally useful for data-loss prevention, and it probably solves the problems reported in the other issue about data loss: I believe the main cause is that the pyemvue library / Emporia API sometimes returns only devices, rather than devices and their channels, which will cause small or large data loss depending on the precision / function being executed (which is why I found per-device and per-precision checkpointing to be necessary).
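The catch-up-then-resume behavior described above could look roughly like this. This is a hypothetical sketch, not vuegraf code; `fetch_batch` and `poll_live` stand in for the real chartUsage backfill and listUsage polling loops:

```python
from datetime import datetime, timedelta, timezone

def backfill_windows(last_checkpoint: datetime, now: datetime,
                     max_window: timedelta):
    """Yield (start, stop) request windows covering the downtime gap,
    each no larger than the API's maximum request window."""
    start = last_checkpoint
    while start < now:
        stop = min(start + max_window, now)
        yield start, stop
        start = stop

def run(last_checkpoint, now, max_window, fetch_batch, poll_live):
    # Catch up first (the --batch behavior), then resume normal polling.
    for start, stop in backfill_windows(last_checkpoint, now, max_window):
        fetch_batch(start, stop)   # e.g. a chartUsage request over [start, stop)
    poll_live()                    # normal listUsage collection
```

Because each window is written and checkpointed before the next is requested, a crash mid-backfill loses at most one window's worth of work rather than the whole gap.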

I'm still testing, but I wondered how we can best collaborate? As I mentioned, I'm not a developer, so I'm not familiar with the practical details of modern development workflows. I guess you need to give me access so I can open a pull request against a development branch/tag? I just have one giant commit on a new feature branch; some help on this part would be appreciated if you are interested in pulling the code.

chase1124 Feb 02 '24 14:02