Crawler builds queue on 10.4 project, but doesn't process queue

Open Typine3 opened this issue 4 years ago • 31 comments

Hello, I've been trying for days to get crawler running on a 10.4 project, with no success. I always face the same problem: I can produce records in tx_crawler_queue (via the "build" task in the scheduler). When I want to process them (via the "run" scheduler task), something happens, for example result_data and process_id are filled, but process_id_completed stays empty and nothing is written to the index_x tables, for example index_rel. What I did:

  • indexed_search and crawler are installed.
  • In indexed_search "Debug mode" is enabled and "Disable Indexing in Frontend" is set to true. "Use "crawler" extension to index..." can be true or not, the result is the same.
  • In crawler I did not change the default configuration except filling the "Name of php binary" field (which seems to be correct - I don't get an error message in Info-SiteCrawler-CrawlingProzesse) and activating "debug.processDebug".
  • I created a Crawler Configuration Record in the root page, which is used for the build task. There is nothing special - I use "Re-indexing tx_indexedsearch_reindex" and in "Pids only" I entered only the start page for testing.

I would be very happy if someone has an idea - I'm running out of ideas and feel I have tested everything that could be important. I can also rule out a special configuration in this project, as I tried everything on two different 10.4 projects hosted by different providers - the result is the same. Thanks, CB

Typine3 avatar Jun 28 '21 07:06 Typine3

Hi @Typine3

Sounds strange to me.

Which version of the Crawler are you using? Are you installing your TYPO3 with composer? Which PHP version? Which MySQL/MariaDB version?

tomasnorre avatar Jun 28 '21 09:06 tomasnorre

crawler 9.2.5, no Composer, php7.3LATEST, MySQL 5.6.19. Thanks for answering so fast!

Typine3 avatar Jun 28 '21 09:06 Typine3

Thanks for the additional info. Will see if I can get to reproduce this next week.

tomasnorre avatar Jun 28 '21 11:06 tomasnorre

As the Crawler works without indexed_search I tend to say it's a misconfiguration of the indexed_search. Could you perhaps check if there are any pointers here to help you out? https://t3terminal.com/blog/typo3-indexed-search/

A good way to prove if indexed_search is working as intended is to enable frontend indexing. If that works, we are a little closer to locating the problem.

tomasnorre avatar Jun 28 '21 12:06 tomasnorre

Well, in both projects indexed_search worked before I disabled frontend indexing! I had to TRUNCATE the index_x tables for testing, which were full before...I'll have another look at the t3terminal-source you mentioned which I already know...Thanks for looking at my problem! If I find something out meanwhile, I'll tell you. It would be so cool if we could solve this problem and I could use crawler for an indexation at night.

Typine3 avatar Jun 28 '21 13:06 Typine3

So, in one of the projects I just enabled frontend indexing. index_rel, which was empty before, contained sentences as soon as I had done some actions in Frontend!

Typine3 avatar Jun 28 '21 14:06 Typine3

The frontend index will fill the index as soon as someone visits the page. But as you can never be sure every page is visited, it's an advantage to do it with the crawler.

I'll be happy to help you further. I see the problem in my test setup too, as you described, but I have limited knowledge of indexed_search, so I don't know where to start.

If the frontend index is on and the crawler runs, it should fill the index too, as it's basically only a frontend request.

If you don't find a solution, I will happily try to help out more, and will look more into this myself too.

tomasnorre avatar Jun 28 '21 18:06 tomasnorre

Hi again @Typine3,

I got a little further. Don't think it's an indexed_search configuration problem, but a Crawler bootstrapping/class-loading problem. When I use my crawler devbox which is installed with composer, then it works, but the one without composer doesn't work.

This sounds like a bootstrapping / Class-loading problem, but where I don't know yet. Will need to get into debug mode to figure out more about that.

tomasnorre avatar Jun 28 '21 22:06 tomasnorre

Thank you for working on this issue! That means that really in every non composer installation crawler will not work for the moment? I would be very happy if this could be solved - in the project where I want to apply it I cannot use frontend indexing and therefore have to do indexing by crawler at night...

Typine3 avatar Jun 29 '21 06:06 Typine3

It doesn't work in combination with indexed_search, correct. The crawler itself, e.g. for cache warm-up, is working.

tomasnorre avatar Jun 29 '21 12:06 tomasnorre

Yes...Is there a chance to get this solved?

Typine3 avatar Jun 29 '21 12:06 Typine3

Nothing is impossible. But don't know the solution yet. @infabo has suggested adding the indexed_search to the ext_emconf.php suggest array, to ensure load order.

Have tried it really quick, but didn't work at first. Need to get the debugger started and checking it out. Having holiday right now, so not that much in front of my laptop :)
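
For readers following along, here is a minimal sketch of what adding indexed_search to the `suggests` array in the Crawler's `ext_emconf.php` could look like. Only the `suggests` entry reflects the suggestion above; the title and all version ranges are illustrative assumptions and are not copied from an actual Crawler release.

```php
<?php
// Sketch of the crawler extension's ext_emconf.php in a non-Composer setup.
// Everything except the 'suggests' key is an assumption for illustration.
$EM_CONF[$_EXTKEY] = [
    'title' => 'Site Crawler', // assumed title
    'constraints' => [
        'depends' => [
            'typo3' => '10.4.0-10.4.99', // assumed version range
        ],
        'conflicts' => [],
        'suggests' => [
            // If indexed_search is installed, TYPO3 orders it before
            // this extension when building the package load order.
            'indexed_search' => '10.4.0-10.4.99', // assumed version range
        ],
    ],
];
```

Note that, as reported further down in this thread, applying this change and re-activating both extensions did not resolve the problem.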

tomasnorre avatar Jun 29 '21 15:06 tomasnorre

Just a shot from the hip: after adding it to the suggest array, you need to deactivate and activate the extension in the Extension Manager to get the load order applied.

infabo avatar Jun 29 '21 21:06 infabo

It's really nice that you guys care about this problem even in your holiday! So there's still some hope for me :)

Typine3 avatar Jun 30 '21 05:06 Typine3

I just tried it - changed crawler's ext_emconf.php (see attached file), deactivated and activated indexed_search and crawler, flushed cache - and the problem is still the same...

(Attachment: Eintrag_emconf_suggests)

Typine3 avatar Jul 01 '21 15:07 Typine3

Thanks for letting me know.

tomasnorre avatar Jul 01 '21 15:07 tomasnorre

I shyly wanted to ask if there are any new findings on this topic...

Typine3 avatar Jul 08 '21 05:07 Typine3

Sorry, not yet.

tomasnorre avatar Jul 08 '21 09:07 tomasnorre

OK, thanks for answering...

Typine3 avatar Jul 08 '21 15:07 Typine3

Tomorrow afternoon is my Crawler day, so I will look into it again tomorrow. Hopefully I'll come closer to an answer.

tomasnorre avatar Jul 08 '21 18:07 tomasnorre

Oh, then have a nice crawler afternoon! :) And tell me if I can help with something, e.g. try something on my two different 10.4 installations...

Typine3 avatar Jul 09 '21 05:07 Typine3

I just released a new version of the Crawler (9.2.6). I would be surprised if it fixes this problem, but it fixes another problem with non-composer projects.

tomasnorre avatar Jul 14 '21 16:07 tomasnorre

OK, I'll try tomorrow! Thanks and nice evening!

Typine3 avatar Jul 14 '21 16:07 Typine3

I have now tested it on one project with crawler 9.2.6 - and nothing changed. As you supposed, this didn't fix the problem...

Typine3 avatar Jul 16 '21 09:07 Typine3

Sorry to hear that; it would also have been a funny coincidence. Thanks for taking the time to test anyway.

I'll look more into this as soon as I find some time.

tomasnorre avatar Jul 20 '21 06:07 tomasnorre

Thanks...I'm looking forward to your findings :)

Typine3 avatar Jul 30 '21 09:07 Typine3

Still no progress? I'll have to find another solution then.

Typine3 avatar Sep 15 '21 14:09 Typine3

Unfortunately not. I haven't found any solution for the problem.

I'll update here as soon as I find a solution. But as this is not my full-time work, I cannot promise anything.

tomasnorre avatar Sep 15 '21 16:09 tomasnorre

As both Crawler 9.x.y and 10.x.y are compatible with TYPO3 10, would you mind trying to update the crawler to see if the issue persists?

tomasnorre avatar Sep 28 '21 20:09 tomasnorre

Can this be related to #851, where the issue is the "baseURL" not being taken from the Sites configuration?

tomasnorre avatar Nov 19 '21 10:11 tomasnorre

@Typine3 Can you please check if the latest version 11.0.4 is solving your problem?

tomasnorre avatar Feb 11 '22 08:02 tomasnorre