incubator-stormcrawler

Delete redirected pages

Open jnioche opened this issue 1 year ago • 1 comments

From a user

Our issue is with URLs that were once pages and have since turned into redirects. Our content management system automatically creates clean URLs. If the title of a page is changed, the clean URL changes too and the old URL is redirected to the new one. The old URL stays in our index unless it is removed manually. When a URL changes from FETCHED to REDIRECT, it would be ideal if the old document were removed from the index.

jnioche, Aug 08 '22 07:08

On further thought, it seems to make sense to replicate the FETCH_ERROR process. The crawler would try x times to fetch the URL, and if it is still a redirect after that, its status would become ERROR, which would trigger the deletion bolt. The x variable would also control whether the feature is used at all: setting it to 0 or -1 would keep the URL in the status index as a redirect indefinitely, which is the current behaviour.

circuscowboy, Aug 08 '22 14:08
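
For illustration only, the proposal above could be sketched roughly as follows. This is not StormCrawler code: the class `RedirectRetryPolicy`, the `maxRedirects` setting, and the local `Status` enum are all hypothetical names made up for this sketch, and a real implementation would most likely keep the counter in the URL's metadata, as is done for fetch errors, rather than in an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "redirect N times, then ERROR" idea.
// None of these names come from StormCrawler itself.
class RedirectRetryPolicy {

    // Local stand-in for the crawl statuses mentioned in the thread.
    enum Status { FETCHED, REDIRECTION, ERROR }

    // Assumed setting: number of consecutive redirects tolerated.
    // A value of 0 or less disables the feature (current behaviour).
    private final int maxRedirects;

    // Consecutive redirect count per URL (in-memory for the sketch only).
    private final Map<String, Integer> redirectCounts = new HashMap<>();

    RedirectRetryPolicy(int maxRedirects) {
        this.maxRedirects = maxRedirects;
    }

    // Returns the status to record for a URL after a fetch attempt.
    Status onFetch(String url, boolean redirected) {
        if (!redirected) {
            // A normal fetch resets the counter.
            redirectCounts.remove(url);
            return Status.FETCHED;
        }
        if (maxRedirects <= 0) {
            // Feature disabled: the URL stays a redirect indefinitely.
            return Status.REDIRECTION;
        }
        int seen = redirectCounts.merge(url, 1, Integer::sum);
        if (seen >= maxRedirects) {
            // Still a redirect after the allowed attempts: escalate to
            // ERROR so that a deletion step can remove the old document.
            redirectCounts.remove(url);
            return Status.ERROR;
        }
        return Status.REDIRECTION;
    }

    public static void main(String[] args) {
        RedirectRetryPolicy policy = new RedirectRetryPolicy(3);
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println(attempt + ": " + policy.onFetch("https://example.com/old-title", true));
        }
    }
}
```

With `maxRedirects` set to 3, the first two redirected fetches of a URL stay at REDIRECTION and the third becomes ERROR; with 0 or -1, the status never escalates, matching the opt-out described in the comment.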