TagStudio icon indicating copy to clipboard operation
TagStudio copied to clipboard

Feature/boolean search syntax

Open samuellieberman opened this issue 1 year ago • 9 comments

Adds Boolean search syntax to the library search bar. For an explanation of how the search syntax works, check out the new "Search Cheat-Sheet" section in README.md.

Some existing features were changed. Tags with spaces now require underscores when being used in searches. Special consideration was made for the upcoming SQLite migration to hopefully reduce necessary refactoring.

I am happy to discuss these details or any other ones if there are any questions or concerns.

samuellieberman avatar Jul 05 '24 07:07 samuellieberman

The search feature is really cool but i tested it on a bigger library (about 2.2 million entries) and it makes searching basically unusable, because it took 21 min to search for an entry with a tag. This previously took 1.71s so if you could somehow improve the Performance it would be really nice.

Thesacraft avatar Jul 06 '24 00:07 Thesacraft

The search feature is really cool but i tested it on a bigger library (about 2.2 million entries) and it makes searching basically unusable, because it took 21 min to search for an entry with a tag. This previously took 1.71s so if you could somehow improve the Performance it would be really nice.

Okay, I definitely wasn't expecting that. At it's very worst the syntax parsing should only increase the runtime by a small percentage. I wouldn't have been surprised if it decreased the runtime even.

But I was able to reproduce a 6x slowdown even with only about 1.4k entries, so something has definitely gone wrong. I will try to do some profiling to figure out what is causing this. Thanks.

samuellieberman avatar Jul 06 '24 02:07 samuellieberman

The search feature is really cool but i tested it on a bigger library (about 2.2 million entries) and it makes searching basically unusable, because it took 21 min to search for an entry with a tag. This previously took 1.71s so if you could somehow improve the Performance it would be really nice.

Okay, I believe I figured out what was causing the slowdown. Do you mind if I ask what you were searching for in your library? I suggest going back to the original TagStudioDev:main branch and trying a search for "missing" or "no file". If my suspicions are correct, then I bet those searches would take about 20 minutes to complete, even in the official project.

The reason for this is that Library.search_library() contains a costly .resolve() operation in order to check an entry's filepath against Library.missing_files. The worst part is that Library tracking missing or displaced files seems to have been broken and unimplemented in for TagStudio a while. The Library.missing_files attribute stays empty, even when I delete files and refresh the library and directories.

My fix is to remove the unimplemented feature from searches for now. I left the rest of the Library.missing_files related code untouched. Please let me know if this resolves your slowdown @Thesacraft. Thanks.

samuellieberman avatar Jul 06 '24 17:07 samuellieberman

I searched for a Tag and had one file that was tagged with the tag. I tested it and it made it much faster (34s) but i think if it could be improved it would be really great. And the current search in the main is still much faster with 1.71s so it would be realy nice if you could improve it to like 2-10s. Here is the search and the found file: grafik

Thesacraft avatar Jul 06 '24 17:07 Thesacraft

Wasn't the boolean search started in this PR? https://github.com/TagStudioDev/TagStudio/pull/284

Also, as far as I know, CyanVoxel didn't decide yet how to handle tags with multiple words - https://github.com/TagStudioDev/TagStudio/issues/112

Qronikarz avatar Jul 06 '24 21:07 Qronikarz

I searched for a Tag and had one file that was tagged with the tag. I tested it and it made it much faster (34s) but i think if it could be improved it would be really great. And the current search in the main is still much faster with 1.71s so it would be realy nice if you could improve it to like 2-10s. Here is the search and the found file: grafik

Thank you for trying my project and sharing your searches with me @Thesacraft. I wanted to understand your situation better, so since my last post I have created my own library with 1 million items and performed lots of different searches. Now I have some search results I want to share with you! Have you tried searching for "missing" in the main project like I suggested earlier? I tried that search with the main project, and it took a very similar amount of time to regular searches in my PR before I pushed your performance update. I also wanted to replicate and experiment with your latest result, and I found that I could also get the main project to produce similar times to after my performance update. I suggest also searching "filename: ?" in the main project if you want to replicate this second search for yourself.

I have since pushed another smaller performance update for you to try. I think this will give you another boost to performance, but so far the performance improvements of my updates are all for search dependent reasons, just like in the main project. Right now, the only improvements left that I see involve making the code harder to understand and modify, or starting to sacrifice boolean search syntax entirely. I'd have looked into some of this earlier if I didn't expect non-syntax search performance to change entirely if the project switches to using an SQL database. Non-syntax search performance isn't something I've considered much, so I'm happy to listen to any other performance concerns with my PR.

Where does the syntax come from? If it were up to me, I wouldn't trust a PR without prior convincing me in the issues that the query language is good enough for the target audience to understand and use.

I sense inspiration from some boorus, but I can't say it's a good thing.

Provided readme section seems insufficient for me to grasp it even if project owners consider the syntax good enough to adopt. I can't get through the dense explanation with my ADHD. And slightly lesser issue is improper use of markdown.

Thank you for reading the documentation I wrote for my search syntax and thank you for posting your reply @KillyMXI. I don't want you to feel pressure to trust anything or to be convinced of anything. When I started this feature, I designed it for myself and based on my own needs and preferences. There wasn't a CONTRIBUTING.md and I hadn't even decided I wanted to contribute my code to the main project.

You are discerning to correctly realize that I took inspiration from online image boorus, but my personal search preferences are just a starting point. Now that the feature is working, I have no issue at all with changing the syntax based on feedback. I just see this as an opportunity to help a project that I have already benefited from.

I would also love to collaborate to improve my README cheat-sheet section. Thanks again for your consideration into my syntax and explanation. Does any of this sound like something you would want to help improve?

samuellieberman avatar Jul 07 '24 20:07 samuellieberman

Thank you for trying my project and sharing your searches with me @Thesacraft. I wanted to understand your situation better, so since my last post I have created my own library with 1 million items and performed lots of different searches. Now I have some search results I want to share with you! Have you tried searching for "missing" in the main project like I suggested earlier? I tried that search with the main project, and it took a very similar amount of time to regular searches in my PR before I pushed your performance update. I also wanted to replicate and experiment with your latest result, and I found that I could also get the main project to produce similar times to after my performance update. I suggest also searching "filename: ?" in the main project if you want to replicate this second search for yourself.

I have since pushed another smaller performance update for you to try. I think this will give you another boost to performance, but so far the performance improvements of my updates are all for search dependent reasons, just like in the main project. Right now, the only improvements left that I see involve making the code harder to understand and modify, or starting to sacrifice boolean search syntax entirely. I'd have looked into some of this earlier if I didn't expect non-syntax search performance to change entirely if the project switches to using an SQL database. Non-syntax search performance isn't something I've considered much, so I'm happy to listen to any other performance concerns with my PR.

I just tested the searches you suggested and it seems like you are completle right. But with the new peformance improvment a simple Tag search is down to about 5.4s in my library which is pretty good. So i think you shouldn't really sacrifice the boolean syntax to get it faster. I think a better solution is to maybe add a loading bar or some visual indicator (in bigger librarys) and maybe detach the search from the main thread to stop it from blocking the rendering of the application. But this is probably out of scope for this pr and could be added in the future. I really appreciate the work you put in to make it more peformant.

Thesacraft avatar Jul 07 '24 20:07 Thesacraft

Thank you for the work put into this PR, but unfortunately due to the significant library backend changes in https://github.com/TagStudioDev/TagStudio/pull/332 I'll have to close this as it's no longer compatible. I apologize for the long turnaround time and and lack of input on my side about this. This discussions in this PR have greatly helped with the future direction of this feature.

https://github.com/TagStudioDev/TagStudio/pull/606 is currently taking up the task of implementing boolean syntax search.

CyanVoxel avatar Dec 05 '24 07:12 CyanVoxel