Pascal Essiembre
Hello Akshay, no update. Do you have a SiteMinder site with temporary access so we can give it a try?
@Krishna210414, the issue is the same: we need a sample SiteMinder site we can use as a test. Do you have one you can share?
I suspect it is related to having it defined as part of the pre-parse handlers. Fields extracted from a file's content are created when the document is parsed. So before...
Interesting. What about other types (e.g. ``application/rss+xml``)? And what if more than one such link is provided (with the same type or different ones)? Which one should prevail? Because use...
The only thing I can think of is you added/modified the filtering rules after you ran the Collector a few times and it got that URL from the "crawlstore" cache....
Given it works for you now, can we close? Feel free to submit a pull request if you feel your filters are ready for general use.
**1. cannot-extract-links** Works as expected. Your start URL is on ``programme.rthk.hk`` and you want to stay on domain + subdomains. A link to ``www.rthk.hk`` domain gets rejected because it is...
The ``referenceFilters`` will filter out unwanted URLs before they are downloaded. Make sure you make it restrictive enough there as well. As far as having ``subA.main.com`` also accepting ``subB.main.com`` when...
As you found out, in v3, the code base no longer controls log writing so people can implement logging however they want with their favourite logger implementation. For Log4J2, your...
Just to add, since the crawler name appears in the thread name, you can use the following variable in your routing:

```
${event:ThreadName}
```

See https://logging.apache.org/log4j/log4j-2.15.1/manual/lookups.html#EventLookup
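To illustrate how that variable could be used, here is a minimal sketch of a Log4j2 configuration with a ``Routing`` appender keyed on the thread name. It is only an assumption of what such a setup might look like, not a configuration shipped with the Collector: the appender names, the ``logs/`` directory, and the layout pattern are all hypothetical, and it requires a Log4j2 version that supports the event lookup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- Hypothetical appender name; routes each event by its thread name. -->
    <Routing name="ByThread">
      <!-- "$$" defers the lookup until an event is actually routed. -->
      <Routes pattern="$${event:ThreadName}">
        <!-- Default route: one file appender is created per distinct thread name. -->
        <Route>
          <File name="file-${event:ThreadName}"
                fileName="logs/${event:ThreadName}.log">
            <PatternLayout pattern="%d %p [%t] %c - %m%n"/>
          </File>
        </Route>
      </Routes>
    </Routing>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="ByThread"/>
    </Root>
  </Loggers>
</Configuration>
```

Because the routing key is the full thread name, this produces one log file per thread rather than one per crawler. If you want a single file per crawler, you would need a key that carries only the crawler name.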