`html-validator` is outputting more and more errors

Open · jfoclpf opened this issue 5 years ago · 16 comments

html-validator is outputting more and more errors. I suspect this is caused by overload on the main W3C server (w3.org):

    at Request._callback (/home/jfolpf/autocosts/node_modules/html-validator/lib/validate.js:15:23)
    at Request.self.callback (/home/jfolpf/autocosts/node_modules/html-validator/node_modules/request/request.js:185:22)
    at Request.emit (events.js:223:5)
    at Request.<anonymous> (/home/jfolpf/autocosts/node_modules/html-validator/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:223:5)
    at IncomingMessage.<anonymous> (/home/jfolpf/autocosts/node_modules/html-validator/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:312:28)
    at IncomingMessage.emit (events.js:228:7)
    at endReadableNT (_stream_readable.js:1185:12)
    at processTicksAndRejections (internal/process/task_queues.js:81:21)

Is there any tip you can give to avoid overloading the server? I make around 50 requests in series, with an interval of 1.1 seconds between requests, but it still blows up sometimes.
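A minimal sketch of the throttled, serial loop described above, assuming `validate` is any promise-returning function (for example a thin wrapper around html-validator). `validateInSeries` and `sleep` are hypothetical helper names, and the 1100 ms default only mirrors the 1.1 s interval mentioned here.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: validate pages strictly one at a time, pausing between
// requests so the public W3C service is not hit in bursts.
async function validateInSeries (pages, validate, delayMs = 1100) {
  const results = []
  for (let i = 0; i < pages.length; i++) {
    if (i > 0) await sleep(delayMs) // pause only between requests, not before the first
    results.push(await validate(pages[i]))
  }
  return results
}
```

With 50 pages this adds roughly 49 × 1.1 s ≈ 54 s of pure waiting, and it still cannot prevent a failure if the server drops a connection.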

jfoclpf, Feb 11 '20

Are you able to reproduce this error in a way I can test it?

I thought overloading the w3c-validator would result in a status code that would be handled.

You could run the validator as a web service yourself; here are instructions for running it locally, either as native code or through Docker.

zrrrzzt, Feb 12 '20

It is very hard for me to provide a test script that reproduces the error right now, because the errors appear as part of a large test script for HTML files served by a local HTTP server (I use Handlebars to generate the HTML files dynamically). But the script that fails is this one: https://github.com/jfoclpf/autocosts/blob/master/test/validateHtml.js#L115

jfoclpf, Feb 12 '20

Thank you. I'll see if I'm able to test it later.

However, this looks like a module that might work for you: https://www.npmjs.com/package/html-validate. Judging from the description at https://html-validate.org/dev/using-api.html, you could probably rewrite your test file with minimal effort. Disclaimer: I haven't tested it myself, but according to the description it validates your HTML locally.

Some day I might switch to this one instead of w3c for html-validator :-)

zrrrzzt, Feb 12 '20

Good tip, thanks a lot. Nonetheless, I am not sure whether that validator applies all the W3C rules, which are the main standard on the web. That is not clear to me from its description, and there are a lot of W3C HTML validation subtleties that might slip past such a validator.

jfoclpf, Feb 12 '20

I think the validation follows all the rules. Anyway, I've done some testing and will probably release a new version around this weekend: https://github.com/zrrrzzt/html-validator/pull/184

It will be a breaking change, but for your use I think it will be just a small rewrite.

zrrrzzt, Feb 12 '20

Great, thanks for the update. Indeed, my test suite took ages because I validated more than 40 HTML files in series, with an interval of more than 1 second between them.

jfoclpf, Feb 12 '20

There was an issue with the server; it should be fixed now, thanks to @sideshowbarker.

I strongly advise against using anything other than the official validator. Over all these years, alternatives have repeatedly proven to give poor results.

BTW I don't like the fact that the official validator needs Java, but that's how it is.

Just my 2 cents.

XhmikosR, Feb 13 '20

@XhmikosR what I've realised is that html-validate is much stricter than the W3C validator. I'd say that if a document passes the former, it passes the latter.

jfoclpf, Feb 13 '20

You, and everyone else, are free to use whatever fits your needs. What I know from experience (without having tried html-validate, because I have no need to) is that every other package out there that claimed to validate HTML failed badly. Maybe html-validate is the exception; I might try it if I ever need to.

I don't really mind, since I will just use another package if this one switches to an unofficial validator, unless that unofficial validator proves to be on par, following the specs 100%.

Like I said, that's my personal opinion after years of working on this stuff, and I'm definitely not blaming html-validate, which, as I said, I haven't used.

XhmikosR, Feb 13 '20

@XhmikosR I'd love to have an official validator that works offline, because the W3C server is sometimes very unreliable, so my test suites on Travis may fail just because of that, making the commits fail CI. And I'm validating 40 files in series, with 1.1-second intervals in between just to avoid overloading the server, and it still fails sometimes.

jfoclpf, Feb 13 '20

You can just use vnu-jar. Like I said it does require the Java runtime to be present, though.

Otherwise, you are welcome to extend the npm package to download the prebuilt binary for the user's platform and submit a PR upstream :)
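A minimal sketch of the vnu-jar route, assuming the `vnu-jar` npm package (whose main export is the filesystem path to `vnu.jar`) and a Java runtime on `PATH`. `vnuArgs` and `runVnu` are hypothetical helper names; the flags used (`--format json`, `--stdout`, `--errors-only`) are options of the Nu Html Checker command line. Because the check runs locally, CI results no longer depend on the W3C server being reachable.

```javascript
const { execFile } = require('child_process')

// Hypothetical helper: build the `java` argument list for vnu.jar.
// --format json gives machine-readable output, --stdout sends the report to
// stdout instead of stderr, and --errors-only drops warnings/info messages.
function vnuArgs (jarPath, files, { errorsOnly = true } = {}) {
  const args = ['-jar', jarPath, '--format', 'json', '--stdout']
  if (errorsOnly) args.push('--errors-only')
  return args.concat(files)
}

// Hypothetical helper: run the checker on local files and parse its JSON report.
function runVnu (files, opts) {
  const jarPath = require('vnu-jar') // assumed: resolves to the path of vnu.jar
  return new Promise((resolve, reject) => {
    execFile('java', vnuArgs(jarPath, files, opts), { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      // vnu exits non-zero when it finds errors, so prefer the report if one was printed
      if (stdout) return resolve(JSON.parse(stdout))
      reject(err || new Error('vnu produced no output'))
    })
  })
}
```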

XhmikosR, Feb 13 '20

According to the documentation of html-validate:

When rendering a document it is useful to try to correct malformed markup but a validator should be strict. No corrections, assumptions or guessing is done. If the markup is invalid the parser will tell you so. By ensuring the markup is strictly valid it reduces the amount of bugs where different browsers autocorrect the markup differently (this is especially true for mobile browsers).

As I said, my files were 100% OK on W3C and now suddenly have more than 300 errors with html-validate. If I have one criticism of html-validate, it is that it is too strict and picky, much more than necessary.

jfoclpf, Feb 13 '20

Hi again :) I was really fed up and about to uninstall this dependency. It's not your fault; I can imagine that the W3C server is bombarded with requests and needs to drop some. This module kept giving errors even when the code was perfectly OK.

I eventually solved the issue with async.retry

const async = require('async')
const validator = require('html-validator')

// `body` is assumed to hold the HTML string to be validated.
// Try calling the validator up to 10 times with exponential backoff
// (i.e. intervals of 100, 200, 400, 800, 1600, ... milliseconds).
async.retry({
  times: 10,
  interval: function (retryCount) {
    return 50 * Math.pow(2, retryCount)
  }
}, (callback) => {
  const options = {
    format: 'text',
    data: body // html body to be validated
  }

  validator(options)
    .then((result) => {
      // note: HTML findings are reported as errors here too, so they are also retried
      if (result.toLowerCase().includes('error')) {
        callback(new Error('Found html error'))
      } else if (result.toLowerCase().includes('warning')) {
        callback(new Error('Found html warning'))
      } else {
        callback()
      }
    })
    .catch((err) => {
      callback(err) // pass the original error through so its message and stack survive
    })
}, function (err, result) {
  // err is set if every attempt failed; otherwise the html passed validation
})

I strongly recommend implementing this in html-validator.

jfoclpf, Jul 14 '21

The retry logic could be transparent to the user and enabled through the options, for example:

(async () => {
  const validator = require('html-validator')
  const options = {
    url: 'http://url-to-validate.com',
    format: 'text',
    retry: 10, // proposed option, not an existing html-validator setting
    retryInterval: (retryCount) => 50 * Math.pow(2, retryCount) // proposed option
  }

  try {
    const result = await validator(options)
    console.log(result)
  } catch (error) {
    console.error(error)
  }
})()

What do you think, @zrrrzzt? Or does this go beyond the scope of this module?
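A minimal sketch of how the proposed `retry` / `retryInterval` options could be layered on top of a promise-returning validator such as html-validator. `validateWithRetry` is a hypothetical wrapper, not part of the module. One design choice differs from the async.retry version: only rejected calls (dropped connections, network failures) are retried, while a resolved result, even one listing HTML errors, is returned immediately, so genuine markup errors are not re-sent to the server. Retries default to zero.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical wrapper implementing the proposed options. The retry keys are
// stripped before the remaining options are passed to `validator`.
async function validateWithRetry (validator, options) {
  const { retry = 0, retryInterval = () => 0, ...validatorOptions } = options
  for (let attempt = 1; ; attempt++) {
    try {
      return await validator(validatorOptions)
    } catch (err) {
      if (attempt > retry) throw err // retries exhausted (none by default)
      await sleep(retryInterval(attempt)) // attempt 1 -> delay before the first retry
    }
  }
}
```

Usage would look like `validateWithRetry(require('html-validator'), { data: body, format: 'text', retry: 10, retryInterval: (n) => 50 * Math.pow(2, n) })`, where `body` is again assumed to hold the HTML.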

jfoclpf, Jul 14 '21

If you want to do a PR for retry you are welcome :-)

I would prefer the default to be no retries and rather let the user opt in for retry in the options.

I don't know what environment you are running your tests in, but you could run your own instance of the validator server with Docker (or deploy the image on your own infrastructure) and use the validator option in this module:

    docker run -it --rm -p 8888:8888 ghcr.io/validator/validator:latest
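The validator option is the route suggested here. As an alternative that avoids string matching on the text output, a minimal sketch can talk to such an instance directly, assuming Node 18+ (global `fetch`) and the Nu checker's web-service JSON output (`?out=json`), in which errors have `type: "error"` and warnings have `type: "info"` with `subType: "warning"`. `checkWithLocalNu` and `summarize` are hypothetical names, and the default URL simply matches the port in the Docker command.

```javascript
// Hypothetical helper: POST a document to a Nu Html Checker instance and
// request JSON output. Relies on the global fetch available in Node 18+.
async function checkWithLocalNu (html, baseUrl = 'http://localhost:8888/') {
  const res = await fetch(`${baseUrl}?out=json`, {
    method: 'POST',
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
    body: html
  })
  if (!res.ok) throw new Error(`Validator returned HTTP ${res.status}`)
  return summarize(await res.json())
}

// Count errors and warnings in a Nu JSON report.
function summarize (report) {
  const messages = report.messages || []
  return {
    errors: messages.filter((m) => m.type === 'error').length,
    warnings: messages.filter((m) => m.type === 'info' && m.subType === 'warning').length
  }
}
```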

zrrrzzt, Jul 15 '21

Thanks, I will do a PR when I find a slice of time.

Yes, no retries by default; I fully agree with you. But the option should be available :) It makes all the difference when you're using the official W3C server.

My test suite runs on several continuous integration platforms, and I don't want to worry about setting up another service that needs maintenance and updates.

I found async.retry very effective. Normally the W3C server doesn't fail, and when it does drop the connection, a single retry is enough. But it makes all the difference: I no longer get constant false alarms, which are very frustrating because they make you think something is wrong with your code.

jfoclpf, Jul 15 '21