BarcodeRattler
A simple fix for slow CSV searching
I noticed in Neil's demo on YouTube that the time between scans seemed pretty slow and highly variable. After he mentioned the barcodes were stored in a CSV, I (correctly, it turns out) assumed the CSV was being parsed and searched in full every time a code is scanned. This is super inefficient, and its only small advantage is that simply uploading a new CSV immediately makes new titles available to scan.
This PR switches things around to load the CSV only once when the script starts, then run a comprehension over the rows to build a dict index keyed by barcode. I haven't tested it (I don't have a barcode scanner), but it should allow for lightning-fast lookups. There are only three caveats to this approach:
- All of the barcodes and other data in the CSV will be stored in memory. As long as the number of columns in the CSV doesn't get ridiculous, I can't see this being an actual problem on an RPi.
- Each row in the CSV will need a unique barcode. Really, this was already the case, as the current script only ever returns the data for the first matching barcode it finds. After this change, if there are duplicates, I believe you'd get the data for the LAST duplicate in the file.
- The script will need to be restarted whenever the CSV is updated.
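A minimal sketch of the indexing approach described above. The column layout is an assumption (I don't have the real CSV schema in front of me); it just assumes the barcode is the first column:

```python
import csv
import io

def build_index(csv_file):
    """Read the CSV once and build a dict keyed by barcode.

    Assumes the barcode is the first column. If the same barcode
    appears twice, the later row overwrites the earlier one, which
    is the "last duplicate wins" caveat mentioned above.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    return {row[0]: row for row in reader}

# Example with an in-memory CSV (hypothetical data):
data = io.StringIO("barcode,title\n111,Sonic\n222,Streets of Rage\n111,Sonic 2\n")
index = build_index(data)
```

After this runs once at startup, each scan becomes a single dict lookup (`index.get(code)`) instead of a full re-parse of the file.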
EDIT: I didn't notice a script had also been added for NFC tags. The same change could easily be made to that script as well.
There are several ways to handle CSV updates, from using inotify to simply calling os.stat() on the file to check whether its mtime has changed since the last load, and reloading only in that case.
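The mtime-check variant might look something like this (a sketch; the loader callable and file path are placeholders, not names from the actual script):

```python
import os

class CSVWatcher:
    """Reload a CSV-backed index only when the file's mtime changes."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader   # callable: path -> index (e.g. a dict)
        self.mtime = None
        self.index = None

    def get_index(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self.mtime:     # first call, or the file was updated
            self.index = self.loader(self.path)
            self.mtime = mtime
        return self.index
```

Calling `get_index()` on every scan keeps Neil's "just upload a new CSV" workflow while still avoiding a re-parse on every lookup; inotify would avoid even the `os.stat()` call but is more code.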
Lookups, I'd guess, would take just a few ms at most; the network, however...
I would assume the culprit is the `mms(rs)` function establishing an SSH connection from scratch every time. The SSH handshake does take a few seconds on such low-power devices.
@mmuman True, didn't think about inotify!
@vladkorotnev That's a fair point; re-establishing the SSH connection each time is almost certainly making whatever slight delay comes from reading the (relatively small) CSV and doing a linear search over it look like statistical noise :D That should also be a fairly solvable problem, though. Similar to what I've done with the CSV, the SSH connection could be established and closed outside the main loop. It would probably be necessary to specify the `TCPKeepAlive` and `ServerAliveInterval` SSH options when calling `pxssh.pxssh()`, as well as handle reconnection in the `except` block in the `mms` function.
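A rough sketch of what that persistent connection could look like. The host, credentials, and the reshaped `mms` signature are all hypothetical (the real function is `mms(rs)`); `pexpect`'s `pxssh` accepts an `options` dict whose entries are passed through to `ssh` as `-o Key=Value` pairs:

```python
# Sketch only: assumes pexpect is installed and an SSH target exists.
try:
    from pexpect import pxssh
except ImportError:   # let the sketch be imported without pexpect
    pxssh = None

# Keepalive options passed through to ssh as -o Key=Value pairs.
SSH_OPTIONS = {"TCPKeepAlive": "yes", "ServerAliveInterval": "30"}

def connect(host, user, password):
    """Open one long-lived SSH session, outside the main loop."""
    session = pxssh.pxssh(options=SSH_OPTIONS)
    session.login(host, user, password)
    return session

def mms(session, command, host, user, password):
    """Send a command, reconnecting once if the session has died."""
    try:
        session.sendline(command)
        session.prompt()
        return session, session.before
    except (pxssh.ExceptionPxssh, OSError):
        session = connect(host, user, password)  # re-establish and retry
        session.sendline(command)
        session.prompt()
        return session, session.before
```

The main loop would call `connect()` once at startup and pass the session into `mms` on each scan, so the multi-second handshake only happens once (or on reconnect).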
Thanks. Yeah, it is all pretty hacky at the moment, just to get something working, and I am not a Python developer. Google and Stack Overflow were my friends 😂.
I defo know it can be slow, and as the CSV file gets larger it would get slower. Neil likes just updating the CSV file without having to restart anything, so I would need to look into inotify and the SSH connection, and split the code up into library files.