lecture-jax icon indicating copy to clipboard operation
lecture-jax copied to clipboard

[multiple_lectures] use correct timing

Open shlff opened this issue 10 months ago • 11 comments

This PR fix #163.

shlff avatar Apr 06 '24 04:04 shlff

Deploy Preview for incomparable-parfait-2417f8 ready!

Name Link
Latest commit c74f335cf181493f6f331028c646457fb9a88a70
Latest deploy log https://app.netlify.com/sites/incomparable-parfait-2417f8/deploys/6661385a0e7ca300087fa79b
Deploy Preview https://deploy-preview-175--incomparable-parfait-2417f8.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Apr 06 '24 04:04 netlify[bot]

🚀 Deployed on https://66614b337812b555721a976e--incomparable-parfait-2417f8.netlify.app

github-actions[bot] avatar Apr 06 '24 05:04 github-actions[bot]

@shlff is this still in-work?

mmcky avatar May 07 '24 06:05 mmcky

Thanks for the reminder @mmcky .

  • [x] There is a merge conflict, so I will resolve it first.
  • [x] Then I will recheck this PR and let you know if it is ready for review.

shlff avatar May 07 '24 06:05 shlff

@shlff , did you finish the remaining tasks? Could you please review this and then pass it on to @mmcky to merge?

@mmcky Please merge when ready. Once it's done I'll start submitting some PRs based on improvements to the JAX lectures made while I was in Chile.

jstac avatar May 18 '24 17:05 jstac

Sure @jstac .

Hi @mmcky, I have reviewed this PR, and it is ready for you to review and decide whether to merge it once the Building Project on Google Collab check is finalized.

It would be great if you can have a look at this check.

(I don't know why this step is still in the queue. It failed once before, and I don't think my changes will cause it to fail.)

shlff avatar May 19 '24 11:05 shlff

Hi @mmcky this PR involves several changes across many lectures.

To make it easier for you to review, I will summarize what this PR does:

  • [x] Use cell/line time magic %%time or %time for all Jax lectures whenever time comparison is not necessary
  • [x] Use time() from package time for all Jax lectures whenever time comparison is necessary
  • [x] Run the timed code cell/line twice to strip out the compilation time whenever necessary
  • [x] Add a sentence to clarify the repetition between all the above-mentioned code cells/lines
  • [x] Add .block_until_ready() to the end of all the timed Jax functions whenever possible

Furthermore,

  • [x] Eliminate unnecessary empty code lines
  • [x] Use numba.jit or numba.njit for Numba code, to be better aligned with the convention of jax.jit
  • [x] Correct some typos across different lectures

shlff avatar May 19 '24 11:05 shlff

There is a weird observation in the lecture cake_eating_numerical that the execution time of Jax code is slower than that of the Numba code after both eliminating the compilation time:

https://6649c5ca0d17a892895c2dc8--incomparable-parfait-2417f8.netlify.app/cake_eating_numerical

It might be the case that they are solving different problems.

I suggest we create another issue to mention this one, and we should add sanity checks for all such lectures to make the horse racing fair.

shlff avatar May 19 '24 11:05 shlff

Thanks @shlff for catching this. @mmcky , while adding an issue could you please also pull down the cake eating lecture? We don't want it live with incorrect results.

jstac avatar May 19 '24 17:05 jstac

Thanks @mmcky .

Here is the error log jax_intro.err.log from Build Project on Google Collab check:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/jupyter_cache/executors/utils.py", line 58, in single_nb_execution
    executenb(
  File "/usr/local/lib/python3.10/dist-packages/nbclient/client.py", line 1305, in execute
    return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
  File "/usr/local/lib/python3.10/dist-packages/jupyter_core/utils/__init__.py", line 165, in wrapped
    return loop.run_until_complete(inner)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/nbclient/client.py", line 705, in async_execute
    await self.async_execute_cell(
  File "/usr/local/lib/python3.10/dist-packages/nbclient/client.py", line 1058, in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)
  File "/usr/local/lib/python3.10/dist-packages/nbclient/client.py", line 914, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
f_jit(x)
------------------


[0;31m---------------------------------------------------------------------------[0m
[0;31mXlaRuntimeError[0m                           Traceback (most recent call last)
[0;32m<ipython-input-48-81fa6809cbbb>[0m in [0;36m<cell line: 1>[0;34m()[0m
[0;32m----> 1[0;31m [0mf_jit[0m[0;34m([0m[0mx[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m
    [0;31m[... skipping hidden 15 frame][0m

[0;32m/usr/local/lib/python3.10/dist-packages/jax/_src/compiler.py[0m in [0;36mbackend_compile[0;34m(backend, module, options, host_callbacks)[0m
[1;32m    236[0m   [0;31m# TODO(sharadmv): remove this fallback when all backends allow `compile`[0m[0;34m[0m[0;34m[0m[0m
[1;32m    237[0m   [0;31m# to take in `host_callbacks`[0m[0;34m[0m[0;34m[0m[0m
[0;32m--> 238[0;31m   [0;32mreturn[0m [0mbackend[0m[0;34m.[0m[0mcompile[0m[0;34m([0m[0mbuilt_c[0m[0;34m,[0m [0mcompile_options[0m[0;34m=[0m[0moptions[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m    239[0m [0;34m[0m[0m
[1;32m    240[0m def compile_or_get_cached(

[0;31mXlaRuntimeError[0m: INTERNAL: ptxas exited with non-zero error code 139, output: : If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

However, I can execute the corresponding code cell on my local machine.

shlff avatar May 20 '24 04:05 shlff

thanks @shlff it sounds like our cloud machine needs more hdd so I will look into this.

mmcky avatar May 20 '24 10:05 mmcky

@mmcky , I resolved the merging conflicts, and you can continue to review it.

shlff avatar May 24 '24 21:05 shlff

Thanks for your valuable comments @jstac and @mmcky .

I've made your suggested changes, and this PR is ready for review.

Here is a preview: https://66556bce6951f1a342bd1bbd--incomparable-parfait-2417f8.netlify.app/

Hi @jstac , according to @mmcky , the Build Project on Google Collab (Execution) / execution-checks (pull_request) check fails, probably because of insufficient hhd on the cloud machines.

So I reckon my code does not break check, which is confirmed by the successful deployment.

But it would be great if @mmcky could confirm this whenever you have time.

shlff avatar May 28 '24 05:05 shlff

@shlff so it seems the filesystem size is to correct. I did a bunch of testing and there is plenty of file system memory. It looks like this is closer to the main issue

https://stackoverflow.com/questions/69712084/jax-woes-on-an-nvdia-dgx-box-no-less

@jstac I am going to disable google collab testing on this repo and re-open a new PR for some additional testing.

mmcky avatar Jun 06 '24 02:06 mmcky

  • [x] remove google collab testing and open a new PR to reinstate (with new testing). It looks like the docker container for google collab isn't properly configured for nvidia drivers etc.

mmcky avatar Jun 06 '24 02:06 mmcky

  • [x] remove cake_eating_numercial if timing doesn't work in the preview (as per comment). This is captured in https://github.com/QuantEcon/lecture-jax/issues/185

mmcky avatar Jun 06 '24 02:06 mmcky

@jstac this is looking good for your review and merge.

mmcky avatar Jun 06 '24 05:06 mmcky

Thanks @shlff @mmcky , much appreciated.

jstac avatar Jun 06 '24 05:06 jstac

@shlff so it seems the filesystem size is to correct. I did a bunch of testing and there is plenty of file system memory. It looks like this is closer to the main issue

https://stackoverflow.com/questions/69712084/jax-woes-on-an-nvdia-dgx-box-no-less

@jstac I am going to disable google collab testing on this repo and re-open a new PR for some additional testing.

Thanks @mmcky and much appreciated. You did an excellent job to fix it!

shlff avatar Jun 06 '24 05:06 shlff

@jstac I am going to disable google collab testing on this repo and re-open a new PR for some additional testing.

I have an open PR #184. Let's see if it works fine to use the Github's resources instead of AWS.

kp992 avatar Jun 08 '24 12:06 kp992