
PDFs are being scanned when they shouldn't be.

Open: mgifford opened this issue 10 months ago • 1 comment

I am not setting the file types option:

  -i, --fileTypes

I am running the scan with:

node --max-old-space-size=6000 --no-deprecation purple-a11y/cli.js -u https://www.whitehouse.gov -c 2 -s same-domain -p 50  -a none --blacklistedPatternsFilename ./pa-gTracker-exclude-medicare.csv -k "Random Example:[email protected]"

But I am still finding PDFs in the list of URLs crawled. This shouldn't be the case. If the default is HTML only, then I shouldn't see any PDFs (or other documents) in my results.
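
One way to confirm the symptom is to filter the crawled URL list for links whose path ends in .pdf. The following is only a minimal sketch, assuming the crawled URLs are available as a plain array of strings; the helper name findPdfUrls and the sample URLs are hypothetical and are not part of Purple A11y.

```javascript
// Hypothetical helper, not part of Purple A11y: returns the URLs whose
// path ends in ".pdf" (case-insensitive), ignoring query strings and fragments.
function findPdfUrls(urls) {
  return urls.filter((u) => {
    try {
      return new URL(u).pathname.toLowerCase().endsWith('.pdf');
    } catch {
      return false; // skip strings that are not valid absolute URLs
    }
  });
}

// Made-up sample data standing in for the list of crawled URLs.
const crawled = [
  'https://example.com/',
  'https://example.com/files/report.PDF?download=1',
  'https://example.com/about',
];

console.log(findPdfUrls(crawled));
```

This check only looks at the URL path, so a PDF served from a URL without a .pdf extension would be missed; confirming those would require inspecting the response's Content-Type instead.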

mgifford, Apr 15 '24 16:04

Hi @mgifford, can I check which version of Purple A11y you are using to run the scan? E.g. 0.9.46, or newer (i.e. directly from GitHub master)?

If you are running a version from master, can you get the commit id so I can understand if this issue was already fixed? You can use the following command: git log -1 --format="%H"

I have not been able to replicate the issue of PDFs being scanned when the default strategy is HTML-only on the latest master commit.

younglim, Apr 19 '24 02:04