duplicate-images
Remove MongoDB dependency
Is MongoDB really necessary? I should look into removing it as a requirement.
MongoDB may not be. I have something similar to this project (very hackish and system-specific) with 342,063 hashes that I moved into a MySQL database for speed.
Why not use SQLite?
In my use case, the MySQL database system resided on another box with SSDs while the system with the images is on standard drives.
I like the idea of SQLite. That way there are no dependencies on an external database, which I think will make the script much easier to use. I don't know much about databases, but I would expect the performance to be worse than something like MySQL.
I always use SQLite in my projects, in various languages. It's extremely portable, requires just a library to work (in Python, the sqlite3 module is part of the standard library, so no pip install is needed), is easy to back up and restore and, most importantly, is fast: not faster than MySQL, but not slower either. I think you should give it a try!
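For anyone following along, here is a minimal sketch of how image hashes could be stored and queried with Python's standard-library sqlite3 module. The table name `images`, its columns, the sample hash values, and the `find_duplicates` helper are all made up for illustration; they are not taken from this project's code.

```python
import sqlite3

# Hypothetical schema: one row per image file, keyed by its path.
# ":memory:" keeps the example self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, hash TEXT NOT NULL)")
# An index on the hash column keeps duplicate lookups from scanning every row.
conn.execute("CREATE INDEX idx_images_hash ON images (hash)")

# Made-up sample data: a.jpg and b.jpg share a hash.
rows = [
    ("a.jpg", "ffd8e0c1"),
    ("b.jpg", "ffd8e0c1"),
    ("c.jpg", "0a1b2c3d"),
]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO images (path, hash) VALUES (?, ?)", rows)


def find_duplicates(conn):
    """Return {hash: [paths]} for every hash stored more than once."""
    query = """
        SELECT hash, path FROM images
        WHERE hash IN (
            SELECT hash FROM images GROUP BY hash HAVING COUNT(*) > 1
        )
        ORDER BY hash, path
    """
    dupes = {}
    for image_hash, path in conn.execute(query):
        dupes.setdefault(image_hash, []).append(path)
    return dupes


print(find_duplicates(conn))
```

Since the whole database lives in a single file, backing it up amounts to copying that file while nothing is writing to it.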
+1 for using sqlite.
I agree, SQLite seems to be the best approach. Hopefully I will get to this in the next couple of weeks, but if someone wants to get it done before that, feel free.
Any progress on moving to SQLite?
I'd love to contribute by changing from MongoDB to the built-in sqlite3. Want me to take a crack at it?
@do-hickey I would love some help. I'm close to having support, but I won't have much time to work on it until April.
I've decided to restructure the code quite a bit so that it can support many different databases. I've pushed my current work in progress to the v1.0 branch. It currently supports MongoDB, SQLite, and TinyDB. I have very little experience with SQLite, so it would be great to get some feedback on whether I structured the schema efficiently. Also, be warned: I haven't finished re-implementing the display of duplicates in the web browser (in particular, the display_duplicates function), so this version isn't fully functional yet.
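As a rough illustration of what supporting several databases behind one interface can look like, here is a small sketch. The names `HashStore`, `MemoryStore`, and `SQLiteStore`, and the table layout, are hypothetical and are not the classes or schema in the v1.0 branch; the in-memory store only stands in for a second backend.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class HashStore(ABC):
    """Hypothetical backend interface; the rest of the script would depend only on this."""

    @abstractmethod
    def add(self, path: str, image_hash: str) -> None:
        """Record the hash for one image file."""

    @abstractmethod
    def paths_for(self, image_hash: str) -> List[str]:
        """Return the sorted paths of all files stored with this hash."""


class MemoryStore(HashStore):
    """Dictionary-backed store, handy for tests."""

    def __init__(self):
        self._by_hash = {}

    def add(self, path, image_hash):
        self._by_hash.setdefault(image_hash, set()).add(path)

    def paths_for(self, image_hash):
        return sorted(self._by_hash.get(image_hash, ()))


class SQLiteStore(HashStore):
    """SQLite-backed store with an index on the hash column."""

    def __init__(self, db_path=":memory:"):
        self._conn = sqlite3.connect(db_path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS images "
                "(path TEXT PRIMARY KEY, hash TEXT NOT NULL)"
            )
            self._conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_images_hash ON images (hash)"
            )

    def add(self, path, image_hash):
        # INSERT OR REPLACE updates the stored hash if the path was seen before.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO images (path, hash) VALUES (?, ?)",
                (path, image_hash),
            )

    def paths_for(self, image_hash):
        cursor = self._conn.execute(
            "SELECT path FROM images WHERE hash = ? ORDER BY path",
            (image_hash,),
        )
        return [row[0] for row in cursor]


# Both backends are used the same way by calling code.
for store in (MemoryStore(), SQLiteStore()):
    store.add("a.jpg", "h1")
    store.add("b.jpg", "h1")
    print(type(store).__name__, store.paths_for("h1"))
```

With this shape, adding another database means writing one more subclass, and the duplicate-detection logic does not need to change.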