ext-solr icon indicating copy to clipboard operation
ext-solr copied to clipboard

[BUG] Not deleting item from solr if record is deleted

Open PUMfriesens opened this issue 5 years ago • 8 comments

Describe the bug When a record gets deleted from the database, the garbage collector tries to removeGarbageOf that item. It all goes down to AbstractStrategy::deleteIndexDocuments where the strategy tries to get the $indexQueueItems. Trying to get those items you get to QueueItemRepository::getQueueItemObjectsByRecords where it realizes that item is not available anymore and just removes the item from the queue. The Solr Item itself stays intact and does not get deleted.

To Reproduce Steps to reproduce the behavior:

  1. Delete Record from Database
  2. Execute RecordMonitor::applyRecordChangesToQueue or run GarbageCollector::collectGarbage directly

Expected behavior QueueItem gets removed from Queue and IndexItem gets removed from Solr

Used versions (please complete the following information):

  • TYPO3 Version: 9.5.18
  • Browser: Chrome
  • EXT:solr Version: 10.0.1
  • Used Apache Solr Version: 8.4.1
  • PHP Version: 7.2.26
  • MySQL Version: 5.7.29

PUMfriesens avatar Jun 10 '20 04:06 PUMfriesens

If I understood correctly, you are deleting records from the database directly and not through the TYPO3 backend, correct?

jacobsenj avatar Jul 01 '20 14:07 jacobsenj

Correct, deleting them right from a repository without deleted flag.

PUMfriesens avatar Jul 17 '20 19:07 PUMfriesens

Thanks for reporting. I'll close this in favor of #126 and #2035 @PUMfriesens please follow, test and comment in #2035.

dkd-kaehm avatar Jul 20 '20 06:07 dkd-kaehm

The Problem is not within the RecordMonitor.

The GarbageCollector is cleaning up all remainings of deleted items in the typo3 tables. But it does not delete the item from solr itself. What is missing is the following:

$solr->getWriteService()->deleteByQuery('type:' . $table . ' AND uid:' . (int)$uid);
$solr->getWriteService()->commit(false, false);

PUMfriesens avatar Jul 20 '20 07:07 PUMfriesens

$solr->getWriteService()->deleteByQuery('type:' . $table . ' AND uid:' . (int)$uid);
$solr->getWriteService()->commit(false, false);

OK, this will resolve this issue partially. Can you provide a PullRequest?

dkd-kaehm avatar Jul 20 '20 10:07 dkd-kaehm

I'm on updating #2035 for solr 11.1.x and TYPO3 10 and had a look on this. I can confirm the issue code wise but IMO we could not simple delete the document by type and uid. We need the siteHash here, too.

The simple idea is to use the record miss detection in QueueItemRepository https://github.com/TYPO3-Solr/ext-solr/blob/main/Classes/Domain/Index/Queue/QueueItemRepository.php#L755

The other and somehow cleaner (to me) solution is to not rely on the related records, when fetching the QueueItems in AbstractStrategy::deleteIndexDocuments. There could be more cases when IndexQueueItems and related documents should be deleted when the record is already gone. And it would save some runtime. This could be archived by an early return here https://github.com/TYPO3-Solr/ext-solr/blob/main/Classes/Domain/Index/Queue/QueueItemRepository.php#L586 by returning the $indexQueueItemRecords directly and not calling getIndexQueueItemObjectsFromRecords. Means we need new a method parameter in Queue::getItems, QueueItemRepository::findItemsByItemTypeAndItemUid and QueueItemRepository::getItemsByCompositeExpression

What do you think @dkd-kaehm?

Mabahe avatar Jan 19 '22 10:01 Mabahe

I'm on updating #2035 for solr 11.1.x and TYPO3 10 and had a look on this. I can confirm the issue code wise but IMO we could not simple delete the document by type and uid. We need the siteHash here, too.

The simple idea is to use the record miss detection in QueueItemRepository https://github.com/TYPO3-Solr/ext-solr/blob/main/Classes/Domain/Index/Queue/QueueItemRepository.php#L755

The other and somehow cleaner (to me) solution is to not rely on the related records, when fetching the QueueItems in AbstractStrategy::deleteIndexDocuments. There could be more cases when IndexQueueItems and related documents should be deleted when the record is already gone. And it would save some runtime. This could be archived by an early return here https://github.com/TYPO3-Solr/ext-solr/blob/main/Classes/Domain/Index/Queue/QueueItemRepository.php#L586 by returning the $indexQueueItemRecords directly and not calling getIndexQueueItemObjectsFromRecords. Means we need new a method parameter in Queue::getItems, QueueItemRepository::findItemsByItemTypeAndItemUid and QueueItemRepository::getItemsByCompositeExpression

What do you think @dkd-kaehm?

@Mabahe Thanks for your advises. All my power is currently focused on TYPO3 11 LTS compatibility. But @dkd-friedrich works currently on special TYPO3 10 LTS release, see #3154, #3157, and https://github.com/TYPO3-Solr/ext-solr/tree/release-11.2.x. The #2035 can not be a part of release-11.1.x but may be a part of release-11.2.x or release-11.5.x(!!! TYPO3 11 LTS) As you can see the #3154 and the issue you are working on and the proposal(quoted above) are highly related to each other.

@dkd-friedrich and @Mabahe, could you please cooperate on #3154 and #2641 and #2035?

dkd-kaehm avatar Jan 19 '22 11:01 dkd-kaehm

I see, thx for the update, since #2641 is only relevant in combination with #2035 ATM and I don't think we get #2035 ready in time for a 11.2 or 11.5 release, we should postpone this.

But #3154 will also be ported to 11.5, right?

I think it depends on the user needs or your release plan, if #2035 should also be part of a T3v10 compatible version - perhaps 11.3.x then or a T3v11 version only. I can still work with a patch for 11.1.x or 11.2.x in my project. For the moment I have a 11.1 based WIP patch for #2035 only.

Mabahe avatar Jan 19 '22 11:01 Mabahe