Pascal Essiembre comments

Results: 74 comments of Pascal Essiembre

SiteMinder Authentication

Hello Akshay, No update. Do you have a SiteMinder site with temporary access so we can give it a try?

SiteMinder Authentication

@Krishna210414, the issue is the same: we need a sample SiteMinder site we can use as a test. Do you have one you can share?

Indexing value of a new metatag

I suspect it is related to having it defined as part of the pre-parse handlers. Fields extracted from a file's content are created when the document is parsed. So before...

Fetching document text from alternate text/plain rel link

Interesting. What about other types (e.g. ``application/rss+xml``)? And what if more than one such link is provided (with the same type or different ones)? Which one should prevail? Because use...

Question: crawling in similar domain

The only thing I can think of is that you added or modified the filtering rules after you ran the Collector a few times, and it got that URL from the "crawlstore" cache....

Question: crawling in similar domain

Given it works for you now, can we close? Feel free to submit a pull request if you feel your filters are ready for general use.

Question: crawling in similar domain

**1. cannot-extract-links** Works as expected. Your start URL is on ``programme.rthk.hk`` and you want to stay on domain + subdomains. A link to ``www.rthk.hk`` domain gets rejected because it is...
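
To illustrate the kind of check described above, here is a minimal Java sketch. It is not the Collector's actual implementation: the class ``DomainScope`` and its methods are hypothetical names, and it assumes that "domain + subdomains" means the link host either equals the start URL host or ends with a dot followed by it. The ``archive.programme.rthk.hk`` host is made up for the example.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the Norconex Collector API. */
final class DomainScope {

    private DomainScope() {}

    /** Returns the lower-cased host of a URL, or null when the URL has no host. */
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? null : host.toLowerCase(Locale.ROOT);
    }

    /**
     * Assumed rule: a link stays in scope when its host equals the start
     * URL host or is a subdomain of it (ends with "." + start host).
     */
    static boolean staysInScope(String startUrl, String linkUrl) {
        String startHost = hostOf(startUrl);
        String linkHost = hostOf(linkUrl);
        if (startHost == null || linkHost == null) {
            return false;
        }
        return linkHost.equals(startHost) || linkHost.endsWith("." + startHost);
    }

    public static void main(String[] args) {
        String start = "https://programme.rthk.hk/";
        // Same host as the start URL.
        System.out.println(staysInScope(start, "https://programme.rthk.hk/page"));
        // Made-up subdomain of the start host.
        System.out.println(staysInScope(start, "https://archive.programme.rthk.hk/x"));
        // Sibling host: neither equal to nor a subdomain of the start host.
        System.out.println(staysInScope(start, "https://www.rthk.hk/"));
    }
}
```

Under this assumed rule, ``www.rthk.hk`` is a sibling of ``programme.rthk.hk`` rather than a subdomain of it, so a link to it falls outside the scope.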

Question: crawling in similar domain

The ``referenceFilters`` will filter out unwanted URLs before they are downloaded. Make sure you make it restrictive enough there as well. As far as having ``subA.main.com`` also accepting ``subB.main.com`` when...

collector v.3 - log4j2 - log-file per crawler

As you found out, in v3, the code base no longer controls log writing so people can implement logging however they want with their favourite logger implementation. For Log4J2, your...

collector v.3 - log4j2 - log-file per crawler

Just to add, since the crawler name appears in the thread name, you can use the following variable in your routing:

```
${event:ThreadName}
```

See https://logging.apache.org/log4j/log4j-2.15.1/manual/lookups.html#EventLookup