Rework docker development environment

Open lunkwill42 opened this issue 11 months ago • 16 comments

This PR aims to do several things to the docker-compose based development environment:

  • Removes redundant build commands that made builds take a long time.
  • Switches to installing everything as an unprivileged user into a virtualenv inside the container. Running fewer things as root is better, and we can take advantage of a single pip cache for reduced install times. Also, at some point we will need to move to Debian Bookworm, which prevents pip from installing system-level packages, so a virtualenv will be necessary anyway.
  • Maps the unprivileged nav user to a UID/GID pair at build time, so we don't need strange entrypoint magic to dynamically switch the nav user's UID/GID every time the container starts.
  • Adds cache mounts to Dockerfile in an attempt to speed up image rebuild times.

The docs are updated. The essential difference for developers is that a UID/GID value now needs to be passed when the container images are built, but this can be automated and reduced to a call to make docker.
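
As a rough sketch of what passing the UID/GID at build time might look like, the Python helper below assembles a docker compose build command from the invoking user's IDs. The build-arg names UID and GID are assumptions for illustration and are not taken from this PR's Dockerfile; the helper only prints the command rather than running it.

```python
import os
import shlex


def compose_build_command(uid=None, gid=None):
    """Return an argv list that builds the images with the given UID/GID.

    The build-arg names UID and GID are hypothetical; check the Dockerfile
    for the names it actually declares.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "compose", "build",
        "--build-arg", f"UID={uid}",
        "--build-arg", f"GID={gid}",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch has no side effects.
    print(shlex.join(compose_build_command()))
```

Baking the IDs in at build time means files created by the container's nav user in bind-mounted directories are owned by the developer on the host, which is the motivation given in the list above.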

lunkwill42 avatar Mar 04 '24 14:03 lunkwill42

Codecov Report

Attention: Patch coverage is 0%, with 1 line in your changes missing coverage. Please review.

Project coverage is 56.69%. Comparing base (9cfd877) to head (59185f3). Report is 7 commits behind head on master.

Files                      Patch %   Lines
python/nav/startstop.py    0.00%     1 Missing :warning:

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2859   +/-   ##
=======================================
  Coverage   56.69%   56.69%           
=======================================
  Files         602      602           
  Lines       43971    43971           
=======================================
  Hits        24931    24931           
  Misses      19040    19040           

codecov[bot] avatar Mar 04 '24 14:03 codecov[bot]

Test results

   12 files      12 suites   11m 53s :stopwatch:
3 320 tests   3 320 :heavy_check_mark:   0 :zzz:   0 :x:
9 435 runs    9 435 :heavy_check_mark:   0 :zzz:   0 :x:

Results for commit 59185f35.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Mar 04 '24 14:03 github-actions[bot]

Let me know if you can reproduce it. If not, I will provide a more thorough bug report.

Not reproducible for me.

lunkwill42 avatar Mar 06 '24 14:03 lunkwill42

Let me know if you can reproduce it. If not, I will provide a more thorough bug report.

Not reproducible for me.

Describe the bug

With changes in this PR, accessing the Netmap tool in the web leads to an alert pop-up and an error message in the browser console. No graph is loaded.

To Reproduce

Steps to reproduce the behavior:

  1. Check out https://github.com/Uninett/nav/pull/2859/commits/c4a0510dd6960f212c03d2294d0f13b957f8119b. Run make docker and then docker compose up.
  2. Go to /netmap
  3. See an alert pop-up with message "Error loading graph, please try to reload the page".
  4. Open Console in browser dev tools and see an error message "GET http://localhost/netmap/graph/layer2/3/?_=1709804282265 500 (Internal Server Error)"
  5. See that no graph has been loaded

Expected behavior

Netmap tool in web works without prompting alerts or error messages, and the graph is loaded properly with all the nodes.

Screenshots

Screenshot 2024-03-06 at 09 49 32

Tracebacks

Logs from the web container:

2024-03-07 10:43:21 Internal Server Error: /netmap/graph/layer2/3/
2024-03-07 10:43:21 Traceback (most recent call last):
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
2024-03-07 10:43:21     response = get_response(request)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
2024-03-07 10:43:21     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
2024-03-07 10:43:21     return view_func(*args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
2024-03-07 10:43:21     return self.dispatch(request, *args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
2024-03-07 10:43:21     response = self.handle_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
2024-03-07 10:43:21     self.raise_uncaught_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
2024-03-07 10:43:21     raise exc
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
2024-03-07 10:43:21     response = handler(request, *args, **kwargs)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/api.py", line 195, in get
2024-03-07 10:43:21     return Response(get_topology_graph(layer, load_traffic, view))
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/graph.py", line 51, in get_topology_graph
2024-03-07 10:43:21     return _json_layer2(load_traffic, view=view)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/cache.py", line 70, in get_traffic
2024-03-07 10:43:21     cached = cache.get(cache_key)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/cache/backends/filebased.py", line 34, in get
2024-03-07 10:43:21     with open(fname, 'rb') as f:
2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'
2024-03-07 10:43:21 [Thu Mar 07 10:43:21 2024] [ERROR] [pid=63 django.request] Internal Server Error: /netmap/graph/layer2/3/
2024-03-07 10:43:21 Traceback (most recent call last):
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
2024-03-07 10:43:21     response = get_response(request)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
2024-03-07 10:43:21     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
2024-03-07 10:43:21     return view_func(*args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
2024-03-07 10:43:21     return self.dispatch(request, *args, **kwargs)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
2024-03-07 10:43:21     response = self.handle_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
2024-03-07 10:43:21     self.raise_uncaught_exception(exc)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
2024-03-07 10:43:21     raise exc
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
2024-03-07 10:43:21     response = handler(request, *args, **kwargs)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/api.py", line 195, in get
2024-03-07 10:43:21     return Response(get_topology_graph(layer, load_traffic, view))
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/graph.py", line 51, in get_topology_graph
2024-03-07 10:43:21     return _json_layer2(load_traffic, view=view)
2024-03-07 10:43:21   File "/source/python/nav/web/netmap/cache.py", line 70, in get_traffic
2024-03-07 10:43:21     cached = cache.get(cache_key)
2024-03-07 10:43:21   File "/opt/venvs/nav/lib/python3.9/site-packages/django/core/cache/backends/filebased.py", line 34, in get
2024-03-07 10:43:21     with open(fname, 'rb') as f:
2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'
2024-03-07 10:43:21 [07/Mar/2024 10:43:21] "GET /netmap/graph/layer2/3/?_=1709804601074 HTTP/1.1" 500 105474

Environment (please complete the following information):

  • OS: macOS Sonoma 14.3.1
  • Browser: Firefox, Chrome, Safari, Vivaldi

Additional context

MacBook with Apple chip M2.

podliashanyk avatar Mar 07 '24 09:03 podliashanyk

2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'

Still not reproducible for me. But could you be re-using an outdated cache mount, @podliashanyk? Have you completely destroyed your docker compose environment and removed the associated volumes? Specifically, nav_nav_cache needs to go:

docker volume rm nav_nav_cache
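
To check whether a stale cache volume is really the cause before removing it, a small diagnostic along these lines could be run inside the web container. This is only a sketch: the function name is made up for illustration, and the /tmp/nav_cache path in the usage note is the one from the traceback above.

```python
import os
import stat
from pathlib import Path


def unreadable_cache_files(cache_dir):
    """Yield (path, owner_uid, mode) for files the current user cannot read."""
    for path in Path(cache_dir).rglob("*"):
        if not path.is_file():
            continue
        # os.access checks against the real UID/GID of this process.
        if not os.access(path, os.R_OK):
            info = path.stat()
            yield path, info.st_uid, stat.filemode(info.st_mode)


if __name__ == "__main__":
    print("running as uid", os.getuid())
    for path, owner, mode in unreadable_cache_files("/tmp/nav_cache"):
        print(f"{path}: owner uid {owner}, mode {mode}")
```

If the reported owner UID differs from the UID the process runs as, the files were most likely written by a container built with a different user mapping, which matches the outdated-volume theory.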

lunkwill42 avatar Mar 07 '24 10:03 lunkwill42

Specifically, these volumes are defined for re-use in docker-compose.yml. It sometimes behooves one to remove them entirely when rebuilding from scratch:

https://github.com/Uninett/nav/blob/c4a0510dd6960f212c03d2294d0f13b957f8119b/docker-compose.yml#L107-L111

lunkwill42 avatar Mar 07 '24 10:03 lunkwill42

docker volume rm nav_nav_cache

Sounds like a good candidate for a "make nuke"-rule.

hmpf avatar Mar 07 '24 11:03 hmpf

I also cannot reproduce the Netmap bug, and the /watchdog/ page loads normally for me.

johannaengland avatar Mar 07 '24 11:03 johannaengland

2024-03-07 10:43:21 PermissionError: [Errno 13] Permission denied: '/tmp/nav_cache/2c0563c9a0882809f1fd36a93fb58c3b.djcache'

Still not reproducible for me. But could you be re-using an outdated cache mount, @podliashanyk? Have you completely destroyed your docker compose environment and removed the associated volumes? Specifically, nav_nav_cache needs to go:

docker volume rm nav_nav_cache

Just in case, I have also pruned everything related to the NAV containers with docker and rebuilt everything from scratch. The bug still appears. I noticed that the same happens with the Watch Dog tool, except that the /watchdog/ page doesn't load at all.

podliashanyk avatar Mar 07 '24 11:03 podliashanyk

Sounds like a good candidate for a "make nuke"-rule.

Sure, if you can find a volume nuke command that works in all cases. docker volume rm nav_nav_cache only works if you have exactly one NAV docker environment and your checked-out copy of NAV resides in a directory called nav. I have previously looked for docker compose commands that would do the same but are guaranteed to only touch objects related to the current context. I have been unsuccessful (blind, maybe?) thus far. Most importantly, we don't want make commands that potentially nuke unrelated things.
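
To illustrate why the hard-coded name is fragile: Compose prefixes named volumes with the project name, which by default is derived from the checkout directory. The sketch below approximates that derivation in Python. The normalization rule is a simplified rendering of Compose's behavior as I understand it, the volume key nav_cache is inferred from the nav_nav_cache name above, and an explicit project name (for example via COMPOSE_PROJECT_NAME) is not taken into account.

```python
import os
import re


def default_project_name(checkout_dir):
    """Approximate Compose's default project name for a checkout directory."""
    name = os.path.basename(os.path.abspath(checkout_dir)).lower()
    # Keep only lowercase letters, digits, dashes and underscores.
    name = re.sub(r"[^a-z0-9_-]", "", name)
    # The name must start with a letter or digit.
    return name.lstrip("_-")


def volume_name(checkout_dir, volume="nav_cache"):
    """Return the volume name Compose would likely create for this checkout."""
    return f"{default_project_name(checkout_dir)}_{volume}"


if __name__ == "__main__":
    for directory in ("/home/me/nav", "/home/me/NAV-dev"):
        print(directory, "->", volume_name(directory))
```

A checkout in a directory named anything other than nav ends up with a differently named volume, so a fixed docker volume rm nav_nav_cache would either miss it or hit another environment's volume.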

lunkwill42 avatar Mar 07 '24 12:03 lunkwill42

My make nuke involves docker system prune =) Nuke it from orbit, it's the only way to be sure!

hmpf avatar Mar 07 '24 12:03 hmpf

My make nuke involves docker system prune =) Nuke it from orbit, it's the only way to be sure!

Yeah, I don't think we should provide that kind of shoot-yourself-in-the-foot service to just anyone. If you know how to use it, you can do it yourself :laughing:

lunkwill42 avatar Mar 07 '24 13:03 lunkwill42

I just pushed updated docs and a slightly altered method for getting the UID/GID into the images - which hopefully reduces the hassle.

Also, I realized that the nuke command you may be looking for is docker compose down --volumes

lunkwill42 avatar Mar 07 '24 13:03 lunkwill42

Bug in Netmap. When I go to the Netmap tool on the web, an alert pops up. There is a 500 Internal Server Error on GET .../netmap/graph/layer2/....

Fresh install of NAV, nothing added to seeddb.

If I visit /netmap/ I get an empty graph, since seeddb is empty.

If I visit /netmap/graph/layer2/ I get the friendly DRF view of the API and no errors.

hmpf avatar Mar 08 '24 09:03 hmpf

Let me know if you can reproduce it. If not, I will provide a more thorough bug report.

Not reproducible for me.

Describe the bug

With changes in this PR, accessing the Netmap tool in the web leads to an alert pop-up and an error message in the browser console. No graph is loaded.

...

Environment (please complete the following information):

  • OS: macOS Sonoma 14.3.1
  • Browser: Firefox, Chrome, Safari, Vivaldi

Additional context

MacBook with Apple chip M2.

Resolved in https://github.com/Uninett/nav/pull/2859/commits/ffa18e26c69d75288c7c65003c6511d894f30f55 (co-authored with @lunkwill42)

podliashanyk avatar Mar 08 '24 10:03 podliashanyk