crawler
Crawler builds queue on 10.4 project, but doesn't process queue
Hello, I've been trying for days to get crawler running on a 10.4 project, with no success. I always face the same problem: I can produce records in tx_crawler_queue (via the "build" task in the scheduler). When I want to process them (via the "run" scheduler task), something happens, for example result_data and process_id are filled, but process_id_completed stays empty and nothing is written to the index_x tables, for example index_rel. What I did:
- indexed_search and crawler are installed.
- In indexed_search "Debug mode" is enabled and "Disable Indexing in Frontend" is set to true. "Use "crawler" extension to index..." can be true or not, the result is the same.
- In crawler I did not change the default configuration except filling the "Name of php binary" field (which seems to be correct - I don't get an error message in Info-SiteCrawler-CrawlingProzesse) and activating "debug.processDebug".
- I created a Crawler Configuration Record in the root page which is used for the build task. There is nothing special - I use "Re-indexing tx_indexedsearch_reindex" and in "Pids only" I entered only the start page for testing.
I would be very happy if someone has an idea - I'm running out of ideas and feel I have tested everything that could be important. I can also rule out a special configuration in this project, as I tried everything on two different 10.4 projects hosted by different providers - the result is the same. Thanks, CB
Hi @Typine3
Sounds strange to me.
Which version of the Crawler are you using? Are you installing your TYPO3 with composer? Which PHP version? Which MySQL/MariaDB version?
crawler 9.2.5, no Composer, php7.3LATEST, MySQL 5.6.19. Thanks for answering so fast!
Thanks for the additional info. I'll see if I can reproduce this next week.
As the Crawler works without indexed_search I tend to say it's a misconfiguration of the indexed_search. Could you perhaps check if there are any pointers here to help you out? https://t3terminal.com/blog/typo3-indexed-search/
A good way to check whether indexed_search is working as intended is to enable frontend indexing. If that works, we are a little closer to locating the problem.
Well, in both projects indexed_search worked before I disabled frontend indexing! I had to TRUNCATE the index_x tables for testing, which were full before...I'll have another look at the t3terminal-source you mentioned which I already know...Thanks for looking at my problem! If I find something out meanwhile, I'll tell you. It would be so cool if we could solve this problem and I could use crawler for an indexation at night.
So, in one of the projects I just enabled frontend indexing. index_rel, which was empty before, contained sentences as soon as I had done some actions in Frontend!
The frontend index will fill the index as soon as someone visits the page. But as you can never be sure every page is visited, it's an advantage to do it with the crawler.
I'll be happy to help you further. I see the problem in my test setup too, as you described, but I have limited knowledge of indexed_search, so I don't know where to start.
If the frontend index is on and the crawler runs, it should fill the index too, as it's basically only a frontend request.
If you don't find a solution, I will happily try to help out more, and will look more into this myself too.
Hi again @Typine3,
I got a little further. I don't think it's an indexed_search configuration problem, but a Crawler bootstrapping/class-loading problem. When I use my crawler devbox, which is installed with composer, it works, but the one without composer doesn't work.
Where exactly the bootstrapping/class-loading goes wrong, I don't know yet. I'll need to get into debug mode to figure that out.
Thank you for working on this issue! Does that mean that crawler currently will not work in any non-composer installation? I would be very happy if this could be solved - in the project where I want to use it I cannot use frontend indexing and therefore have to do the indexing with crawler at night...
It doesn't work in combination with indexed_search, correct. The crawler itself, e.g. for cache warm-up, is working.
Yes...Is there a chance to get this solved?
Nothing is impossible. But I don't know the solution yet. @infabo has suggested adding indexed_search to the suggests array in ext_emconf.php, to ensure load order.
I tried it really quickly, but it didn't work at first. I need to start the debugger and check it out. I'm on holiday right now, so I'm not in front of my laptop that much :)
Just a shot from the hip: after adding it to the suggests array, you need to deactivate and activate the extension in the Extension Manager to get the load order applied.
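For readers following along, the change discussed above might look roughly like the sketch below. It is not the actual file from the crawler repository: the title, version and TYPO3 version range are assumptions for illustration, and only the `suggests` entry is the point. It only influences load order and is not a confirmed fix for this issue.

```php
<?php
// ext_emconf.php of the crawler extension (illustrative sketch, only relevant keys shown)
$EM_CONF[$_EXTKEY] = [
    'title' => 'Site Crawler',   // assumed title
    'version' => '9.2.5',        // version mentioned earlier in this thread
    'constraints' => [
        'depends' => [
            'typo3' => '10.4.0-10.4.99', // assumed range for a 10.4 project
        ],
        'conflicts' => [],
        'suggests' => [
            // Suggested extensions are loaded before this one if they are installed
            'indexed_search' => '',
        ],
    ],
];
```

After editing the file, the extension has to be deactivated and activated again (and the cache flushed) so that the package load order is recalculated.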
It's really nice that you guys care about this problem even in your holiday! So there's still some hope for me :)
I just tried it - changed crawler's ext_emconf.php (see attached file), deactivated and activated indexed_search and crawler, flushed cache - and the problem is still the same...

Thanks for letting me know.
I shyly wanted to ask if there are any new findings on this topic...
Sorry, not yet.
OK, thanks for answering...
Tomorrow afternoon is my Crawler day, so I'll look into it again then. Hopefully I'll come closer to an answer.
Oh, then have a nice crawler afternoon! :) And tell me if I can help with something, e.g. try something on my two different 10.4 installations...
I just released a new version of the Crawler (9.2.6). I would be surprised if it fixes this problem, but it fixes another problem with non-composer projects.
OK, I'll try tomorrow! Thanks and nice evening!
I have now tested it on one project with crawler 9.2.6 - and nothing changed. As you suspected, this didn't fix the problem...
Sorry to hear that, it would also have been a funny coincidence. Thanks for taking the time to test anyway.
I'll look more into this as soon as I find some time.
Thanks...I'm looking forward to your findings :)
Still no progress? I'll have to find another solution then.
Unfortunately not. I haven't found any solution for the problem.
I'll update here as soon as I find a solution. But as this is not my full time work: I cannot promise anything.
As both Crawler 9.x.y and 10.x.y are compatible with TYPO3 10, would you mind trying to update the crawler to see if the issue persists?
Can this be related to #851, where the issue is the "baseURL" not being taken from the Sites configuration?
@Typine3 Can you please check whether the latest version, 11.0.4, solves your problem?