ethereumjs-monorepo icon indicating copy to clipboard operation
ethereumjs-monorepo copied to clipboard

VM: Test Runner Upgrade to Vitest

Open ScottyPoi opened this issue 1 year ago • 26 comments

Update VM-Test-Runner to use Vitest.

Blockchain

Standard Tests: npm run test:blockchain:transitionForks npm run test:blockchain:allForks

To run selected fork(s): npm run test:blockchain --fork=chainstart npm run test:blockchain --fork=chainstart,homestead

To run a SINGLE test and check the numbers npm run test:blockchain:single --fork=chainstart

State

Standard Tests:

Standard Tests: npm run test:state:selectedForks npm run test:state:allForks

To run selected fork(s): npm run test:state --fork=chainstart npm run test:state --fork=chainstart,homestead

To run a SINGLE test and check the numbers npm run test:state:single --fork=chainstart

CLI Args (WIP)

CLI --flag arguments should work as before WIP

Note

Currently -- even running in "Single" mode will not show the full number of Expected-Tests It will show the number of tests that were run, but will not include the total sum, which includes iterations of those tests.

For example -- npm run test:state:single --fork=chainstart

The test expects > 4496 tests to run. But the console will say 1440.

  • 1440 is the number of unique tests that will be run.
    • Many of these tests involve running multiple "test-cases"
  • 4496 is the TOTAL number of TEST-CASES that should run.

The final check is expect(testCount).toBeGreaterThanOrEqual(4496) Which you should be able to inspect after running a SINGLE test

ScottyPoi avatar Jun 22 '23 23:06 ScottyPoi

Codecov Report

Merging #2816 (bcb2706) into master (0ea9df3) will decrease coverage by 0.50%. The diff coverage is n/a.

:exclamation: Current head bcb2706 differs from pull request most recent head cc0daf2. Consider uploading reports for the commit cc0daf2 to get more accurate results

Additional details and impacted files

Impacted file tree graph

Flag Coverage Δ
vm 77.37% <ø> (-0.50%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

codecov[bot] avatar Jun 22 '23 23:06 codecov[bot]

There had been a lot of discussion around this task already, and both Jochem and myself gave this a first-round try. Please coordinate with Jochem and/or me first before you continue with this task.

holgerd77 avatar Jun 23 '23 07:06 holgerd77

Update: just reading up the thread in the chat, so taking this half-back, didn't realize that you got that far with that, that's great! 🙂 Was a bit irritating though nevertheless, since actually three - haha - so Andrew as well already looked into the task. 😂

holgerd77 avatar Jun 23 '23 08:06 holgerd77

Update: just reading up the thread in the chat, so taking this half-back, didn't realize that you got that far with that, that's great! slightly_smiling_face Was a bit irritating though nevertheless, since actually three - haha - so Andrew as well already looked into the task. joy

so...should i finish this or not? i don't understand why solving something would irritate you

ScottyPoi avatar Jun 23 '23 20:06 ScottyPoi

Update:

The CLI arguments seem to not get through. The current workaround in state.spec.ts and blockchain.spec.ts runs the allForks version of each test with the options from the npm scripts.

Have to step away for the night, but I will work out a clean solution to it all when I get back.

ScottyPoi avatar Jun 24 '23 00:06 ScottyPoi

so...should i finish this or not? i don't understand why solving something would irritate you

I think you have successfully taken over the task! 🙂

(I was not the only one irritate though, so this was more about the communication around, respectively the not-so-much-communcation 😋)

Anyhow: yeah, so maybe do not jump on it night-and-day, but it would be great if you can continue this a bit as a side task?

I just looked into the numbers: I see test run times around 2-3 minutes (old) vs 10-20 seconds (new). If this holds this would be truly remarkable! 🤩

Numbers of overall tests do not yet get together though completely, so e.g. same blockchain tests/same fork atm have different numbers of overall run tests, guess that's something which can be fixed. Generally this would be a dramatic step performance wise though (or is some part of a single test not yet run?).

holgerd77 avatar Jun 26 '23 12:06 holgerd77

Hi Scotty, if you are interested and it fits can you eventually:

a) rebase or update this PR so that #2808 is included b) grasp the MCOPY blockchain tests from here: https://github.com/ethereum/tests/pull/1234/files c) hack this into the blockchain vitest run to run the tests and then d) Use the vitest --coverage flag together with the vitest UI from https://vitest.dev/guide/coverage.html#vitest-ui and e) see how coverage is for this tests?

And then you could give an answer to Pawel here: https://github.com/ethereum/tests/pull/1234#issuecomment-1606144734, guess that would be pretty helpful! 🙂

holgerd77 avatar Jun 26 '23 12:06 holgerd77

(small side note: the coverage should cover all of ./EIPTests/BlockchainTests/stEIP5656-MCOPY` tests in that dir)

jochem-brouwer avatar Jun 26 '23 13:06 jochem-brouwer

Hi Scotty, if you are interested and it fits can you eventually:

Yes I will look into coverage tomorrow, and include that additional test.

ScottyPoi avatar Jun 27 '23 09:06 ScottyPoi

Numbers of overall tests do not yet get together though completely, so e.g. same blockchain tests/same fork atm have different numbers of overall run tests, guess that's something which can be fixed. Generally this would be a dramatic step performance wise though (or is some part of a single test not yet run?).

If you do run a test without the outer describe, you'll see the proper number of tests run. However, running this way does reduce the performance improvements, and will cause some of the longer tests to crash. The same tests run quickly when part of a suite, although it makes inspecting individual results difficult.

We do include a check for the expected number of test cases at the end of each test, though.

ScottyPoi avatar Jun 27 '23 09:06 ScottyPoi

Hi Scotty, if you are interested and it fits can you eventually:

Yes I will look into coverage tomorrow, and include that additional test.

Ok, cool! This is just meant to be a temporary test though, so not for permanent inclusion of this test file(s), just to avoid a potential misunderstanding here.

holgerd77 avatar Jun 27 '23 10:06 holgerd77

npx vitest test/tester/blockchain/stEIP5656-MCOPY

will run the MCOPY tests in the blockchain runner

ScottyPoi avatar Jun 28 '23 06:06 ScottyPoi

This is the coverage report for the MCOPY tests: image

Does he really mean that this should be 100% if the tests all run??

ScottyPoi avatar Jun 28 '23 06:06 ScottyPoi

The remaining failures in the CI are heap memory errors in the larger blockchain tests.

i imagine there's a way around that?

ScottyPoi avatar Jun 30 '23 00:06 ScottyPoi

Hi there, what’s the status of this? :-)

(I the hell can’t find the emojis on my iPad keyboard #insert laughing emoji here#)

holgerd77 avatar Jul 04 '23 13:07 holgerd77

Hi there, what’s the status of this? :-)

(I the hell can’t find the emojis on my iPad keyboard #insert laughing emoji here#)

@holgerd77

I feel confident that a passing test is a passing test at this point :smile:

I think there is still a lot of improvement that can be done that utilizes the built in tools in vitest better that this currently does.

The green XXXX passed number at the end will not show the full total of all checks in all test runs of all tests in all subdirectories....etc. But I have included a console log at the end of each test that does display those results.

If that feels good to you, we can call this done. Today is a holiday in the US, but I'll put some time into cleaning up the commits and writeup.

I can open another PR for any further development on it down the road.

ScottyPoi avatar Jul 04 '23 20:07 ScottyPoi

On another review, I am second guess myself regarding this "speedy" test run actually running all of the code :unamused: However a "slow" run is very very slow. (up to 5 minutes for some).

I'm going to try again to use the Vitest Runner API, because

ScottyPoi avatar Jul 08 '23 06:07 ScottyPoi

I just checked out this branch and ran it with npm run test:blockchain -- --fork=Paris. The test suite returned in 10 seconds. This is not possible, on my local machine there are several tests which run almost 30 seconds. So, for some reason the tests are loaded (I think you are correct that they load the right files), but, for some reason it does not actually run these tests.

If the "slow tests" run, then there is something in the parallelism which makes vitest think that it is done, while it is not :thinking:

jochem-brouwer avatar Jul 08 '23 16:07 jochem-brouwer

It almost feels like it runs through the whole thing twice. Once to count the number of tests and set up the reporting. And then again to run the tests? Including all of the file loading and blockchain building etc.

Almost like it has to run all of the code to know how many tests it will run before doing it all again... if that makes sense?

ScottyPoi avatar Jul 09 '23 02:07 ScottyPoi

I just tested by changing a blockchain test lastblockhash (for instance BlockchainTests/GeneralStateTests/stBadOpcode/badOpcodes.json and then badOpcodes_d0g0v0_Merge) to something different. This should fail, but it does not if I run npm run test:blockchain -- --fork=Paris. So it seems like, at least, part of the tests are not running.

jochem-brouwer avatar Jul 09 '23 09:07 jochem-brouwer

If I comment out test('${id}', async () => {, suite('testData', async () => {, it(should have the correct _headHeaderHash, async () => { and their corresponding closing brackets, and then re-run tests, then I see the tests spawning in the runner. I think the first it spawns a test, but this insta-passes, since it internally just spawns new methods and then returns. I think it should await the suite (have to look into vitest docs what this suite is) and the second it which is commented out.

If I just comment them out, it spawns too many threads an the "worker threads goes out of memory".

(In BlockchainTestsRunner)

jochem-brouwer avatar Jul 09 '23 22:07 jochem-brouwer

image

If I edit a bit I can get the vitest to output this, but this runs out of memory.

I noticed that both it and suite are used, they seem to be the same? https://stackoverflow.com/questions/45778192/what-is-the-difference-between-it-and-test-in-jest

Also, I think we need to study how to multithread vitest and how this works. There are a lot of suite (this seems to be the same as describe?) nested in suites and I don't know if these spawn new threads or not :thinking:

jochem-brouwer avatar Jul 09 '23 22:07 jochem-brouwer

This is what slow but honest test results look like:

image

~ 4 minutes

ScottyPoi avatar Jul 10 '23 06:07 ScottyPoi

Yes, so 4 minutes is definitely ok. That‘s not „slow“ for this amount of tests.

holgerd77 avatar Jul 10 '23 08:07 holgerd77

Also, I think we need to study how to multithread vitest and how this works. There are a lot of suite (this seems to be the same as describe?) nested in suites and I don't know if these spawn new threads or not thinking

@jochem-brouwer

  1. I think that regardless of multithread settings, the initial setup is always done on a single thread.
  2. Describe === Suite
  3. Test === It
  4. Suites can be nested, and the only affect is the way the results will be organized
  5. Tests (it) can NOT be nested.
  6. Suites can not be nested inside of tests
  7. I think that multithreading occurs per test, not per suite

so in this example: vitest would show 4 test results (1 for each test). each test will pass if all of the expect or assert checks inside of it pass.

suite('a', async () => {
  suite('1', async () => {
    test('1_A)', async () => {
      expect(thing)
      expect(thing)
      expect(thing)
    })
    test('1_B', async () => {
      expect(thing)
      expect(thing)
      expect(thing)
    })
  })
  suite('2', async () => {
    test('2_A)', async () => {
      expect(thing)
      expect(thing)
      expect(thing)
    })
    test('2_B', async () => {
      expect(thing)
      expect(thing)
      expect(thing)
    })
  })
})

and in this example, vitest would show 2 test results, all of which would pass, even though they are hardcoded with failures, because you cannot nest tests

suite('a', async () => {
  test('1', async () => {
    test('1_A)', async () => {
      assert.fail()
    })
    test('1_B', async () => {
      assert.fail()
    })
  })
  test('2', async () => {
    test('2_A)', async () => {
      assert.fail() 
    })
    test('2_B', async () => {
       assert.fail()
    })
  })
})

ScottyPoi avatar Jul 10 '23 11:07 ScottyPoi

4 minutes is definitely super OK. However, on CI, I see that it takes over a hour? E.g; https://github.com/ethereumjs/ethereumjs-monorepo/actions/runs/5528727301/jobs/10085874812?pr=2816

jochem-brouwer avatar Jul 12 '23 09:07 jochem-brouwer