Add docker container(s) to help run examples
Is your feature request related to a problem? Please describe.
The main friction in getting the examples up and running is installing the dependencies. A docker container with them already installed would make it easier for people to get started with Hamilton.
Describe the solution you'd like
- A docker container with separate Python virtual environments that contain the dependencies needed to run the examples.
- The container has the hamilton repository checked out -- so it has the examples folder.
- Then using it would be:
  - docker pull image
  - docker start image
  - activate python virtual environment
  - run example
Describe alternatives you've considered
Not doing this.
Additional context
This was a request from a Hamilton talk.
Hi @skrawcz, I am interested in working on this issue.
Hi @bovem. That's great. Do you have an idea of what to do? Or do you need some more guidance and specifications?
Thanks @skrawcz. I will create a PR and ask you questions as they come up.
Hey @skrawcz, I do have some questions:
- Why is it required to have different virtual environments? Could I just create one consolidated requirements.txt instead?
- Are there any known dependency conflicts?
- Does it have to be any specific base container image?
- Should the environment be for Python 2 or Python 3?
Hey @skrawcz, I do have some questions:
- Why is it required to have different virtual environments? Could I just create one consolidated requirements.txt instead?
Yes, in theory, but it's not guaranteed to always work. I would prefer separate ones, since that is also closer to how people would use Hamilton; they wouldn't have all of the Spark, Ray, and Dask dependencies installed if they're not using them.
- Are there any known dependency conflicts?
Not that I am aware of.
- Does it have to be any specific base container image?
Python 3. I think it's fine to target 3.8 or 3.9. Note that for Spark, the container will also need Java.
- Should the environment be for Python 2 or Python 3?
Python 3 -- 3.8+
Thanks!
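As a rough illustration of the "one virtual environment per example" idea discussed above, here is a minimal sketch using Python's standard venv module. The function name create_example_envs, the environment name hamilton-env, and the assumption that every example folder ships its own requirements.txt are illustrative choices, not something the repository is confirmed to provide.

```python
import tempfile
import venv
from pathlib import Path


def create_example_envs(examples_root, env_name="hamilton-env", with_pip=True):
    """Create one virtual environment in every example folder that has a requirements.txt.

    Returns the list of environment directories that were created.
    """
    created = []
    for requirements in sorted(Path(examples_root).glob("*/requirements.txt")):
        env_dir = requirements.parent / env_name
        # with_pip=True bootstraps pip so the requirements can be installed afterwards.
        venv.EnvBuilder(with_pip=with_pip).create(env_dir)
        created.append(env_dir)
    return created


# Offline demonstration on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as root:
    for name in ("async", "dask", "spark"):
        example_dir = Path(root) / name
        example_dir.mkdir()
        (example_dir / "requirements.txt").write_text("sf-hamilton\n")
    # A folder without requirements.txt is skipped.
    (Path(root) / "no_requirements").mkdir()

    envs = create_example_envs(root, with_pip=False)  # skip pip to keep the demo fast
    env_names = [env.parent.name for env in envs]
    all_have_config = all((env / "pyvenv.cfg").is_file() for env in envs)

print(env_names)
```

In a real image build, each environment's own pip would then be used to install that folder's requirements.txt, so Spark, Ray, and Dask dependencies stay isolated from one another.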
We should bump this up in priority -- since people without a python environment can't easily get started -- and docker might be a simpler solution for them to try Hamilton.
Hi @skrawcz, I was able to create separate virtual environments for the examples, but I ran into issues while running the following examples. I also tried installing Hamilton with pip install sf-hamilton inside the virtual environments, but that didn't resolve the issue.
- async
root@4182a9717aaf:/hamilton/examples/async# source hamilton/bin/activate
(hamilton) root@4182a9717aaf:/hamilton/examples/async# uvicorn fastapi_example:app
Traceback (most recent call last):
File "/hamilton/examples/async/hamilton/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/main.py", line 408, in main
run(
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/main.py", line 576, in run
server.run()
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/server.py", line 60, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/server.py", line 67, in serve
config.load()
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/config.py", line 479, in load
self.loaded_app = import_from_string(self.app)
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/importer.py", line 24, in import_from_string
raise exc from None
File "/hamilton/examples/async/hamilton/lib/python3.10/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/hamilton/examples/async/./fastapi_example.py", line 3, in <module>
from hamilton.experimental import h_async
ModuleNotFoundError: No module named 'hamilton.experimental'
- dask
root@4182a9717aaf:/hamilton/examples/dask# source hamilton/bin/activate
(hamilton) root@4182a9717aaf:/hamilton/examples/dask# cd hello_world/
(hamilton) root@4182a9717aaf:/hamilton/examples/dask/hello_world# python3 run.py
[INFO] 2022-10-06 04:42:43,407 __main__(24): LocalCluster(2fe1e048, 'tcp://127.0.0.1:46211', workers=4, threads=16, memory=14.97 GiB)
[INFO] 2022-10-06 04:42:44,077 __main__(50): spend signups avg_3wk_spend spend_per_signup spend_zero_mean_unit_variance
0 10 1 NaN 10.000 -1.064405
1 10 10 NaN 1.000 -1.064405
2 20 50 13.333333 0.400 -0.483821
3 40 100 23.333333 0.400 0.677349
4 40 200 33.333333 0.200 0.677349
5 50 400 43.333333 0.125 1.257934
2022-10-06 04:42:44,391 - distributed.client - ERROR -
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/core.py", line 291, in connect
comm = await asyncio.wait_for(
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/tcp.py", line 503, in connect
convert_stream_closed_error(self, e)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x7ff29e4195d0>: ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/utils.py", line 742, in wrapper
return await func(*args, **kwargs)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/client.py", line 1246, in _reconnect
await self._ensure_connected(timeout=timeout)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/client.py", line 1276, in _ensure_connected
comm = await connect(
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/core.py", line 315, in connect
await asyncio.sleep(backoff)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 605, in sleep
return await future
asyncio.exceptions.CancelledError
[ERROR] 2022-10-06 04:42:44,395 asyncio.events(768):
Traceback (most recent call last):
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/tcp.py", line 225, in read
frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/client.py", line 1443, in _handle_report
msgs = await self.scheduler_comm.comm.read()
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/tcp.py", line 241, in read
convert_stream_closed_error(self, e)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/tcp.py", line 144, in convert_stream_closed_error
raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) Client->Scheduler local=tcp://127.0.0.1:38494 remote=tcp://127.0.0.1:46211>: Stream is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/utils.py", line 742, in wrapper
return await func(*args, **kwargs)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/client.py", line 1451, in _handle_report
await self._reconnect()
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/utils.py", line 742, in wrapper
return await func(*args, **kwargs)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/client.py", line 1246, in _reconnect
await self._ensure_connected(timeout=timeout)
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/client.py", line 1276, in _ensure_connected
comm = await connect(
File "/hamilton/examples/dask/hamilton/lib/python3.10/site-packages/distributed/comm/core.py", line 315, in connect
await asyncio.sleep(backoff)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 605, in sleep
return await future
asyncio.exceptions.CancelledError
I have also opened this PR: https://github.com/stitchfix/hamilton/pull/203 for the changes I've made.
OK, so I'm pretty sure I managed to debug the first one at least: there's a directory called hamilton inside the example folder. This confuses Python, which thinks it's a module, so it's not finding experimental.
For the second, the pipeline runs successfully, but it fails anyway. This is not docker-image-specific and occurs on the main branch as well :/ I think it's a failure in closing, but I need to dig in further. It shouldn't be a blocker for you though.
Hope this helps!
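The directory-name clash described for the async example can be reproduced in isolation. The sketch below uses a made-up package name (demo_shadowed_pkg, assumed not to be installed anywhere) rather than Hamilton itself, and it only covers the case where no real package of that name is importable from elsewhere on sys.path. It shows how a plain directory with the package's name changes the error from "the package is missing" to "the submodule is missing".

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "demo_shadowed_pkg"  # hypothetical name, assumed not to be installed


def try_import(module_name, search_dir):
    """Import module_name with search_dir first on sys.path; return the error text, or None on success."""
    sys.path.insert(0, search_dir)
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return None
    except ModuleNotFoundError as exc:
        return str(exc)
    finally:
        sys.path.remove(search_dir)
        # Forget anything imported during the attempt so the next call starts clean.
        for name in [n for n in sys.modules if n.split(".")[0] == PACKAGE]:
            del sys.modules[name]


with tempfile.TemporaryDirectory() as workdir:
    # Case 1: nothing named PACKAGE exists, so the top-level name itself is missing.
    error_without_dir = try_import(f"{PACKAGE}.experimental", workdir)

    # Case 2: a plain directory with that name (such as a virtual environment folder)
    # is treated as a namespace package, so only the submodule is reported missing.
    (Path(workdir) / PACKAGE / "bin").mkdir(parents=True)
    error_with_dir = try_import(f"{PACKAGE}.experimental", workdir)

print(error_without_dir)
print(error_with_dir)
```

The second message has the same shape as the one in the traceback, which is consistent with the diagnosis. Giving the virtual environment a name that is not also a package name avoids the ambiguity.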
Thanks @elijahbenizzy. I renamed the Python virtual environment to hamilton-env and added sf-hamilton to requirements.txt, and now I get this error with the async example:
(hamilton-env) root@99745e7d8c97:/hamilton/examples/async# uvicorn fastapi_example:app
Traceback (most recent call last):
File "/hamilton/examples/async/hamilton-env/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/main.py", line 408, in main
run(
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/main.py", line 576, in run
server.run()
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
config.load()
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/config.py", line 479, in load
self.loaded_app = import_from_string(self.app)
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/importer.py", line 24, in import_from_string
raise exc from None
File "/hamilton/examples/async/hamilton-env/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/hamilton/examples/async/./fastapi_example.py", line 5, in <module>
from . import async_module
ImportError: attempted relative import with no known parent package
OK, so it works one level up:
[hamilton] bovem/examples (adding-dockerfile): uvicorn async.fastapi_example:app
INFO: Started server process [18090]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
This is probably due to the relative import -- sloppy on my part. Fixed in this PR: https://github.com/stitchfix/hamilton/pull/204. Mind rebasing?
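To illustrate why the same file fails from inside its folder but loads from one level up, here is a small self-contained sketch. The folder and module names (demo_example, demo_app, demo_helper) are invented for the demonstration and are not the files from the async example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

DEMO_NAMES = {"demo_example", "demo_app", "demo_helper"}  # hypothetical names


def try_import(module_name, search_dir):
    """Import module_name with search_dir first on sys.path; return 'ok' or the ImportError text."""
    sys.path.insert(0, search_dir)
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as exc:
        return str(exc)
    finally:
        sys.path.remove(search_dir)
        # Drop the demo modules so each attempt starts from a clean state.
        for name in [n for n in sys.modules if n.split(".")[0] in DEMO_NAMES]:
            del sys.modules[name]


with tempfile.TemporaryDirectory() as workdir:
    folder = Path(workdir) / "demo_example"
    folder.mkdir()
    (folder / "demo_helper.py").write_text("VALUE = 42\n")
    # The application module uses a relative import, like the failing line in the traceback.
    (folder / "demo_app.py").write_text("from . import demo_helper\n")

    # Loaded as a top-level module from inside the folder: there is no parent package.
    from_inside = try_import("demo_app", str(folder))
    # Loaded from one level up: the folder acts as the parent package.
    from_parent = try_import("demo_example.demo_app", workdir)

print(from_inside)
print(from_parent)
```

A relative import needs a parent package, which only exists when the module is addressed through its folder name. An absolute import of the sibling module is one way to make the file loadable from inside the folder as well.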
Thanks, I rebased on your branch and was able to run the async example, but with this command: uvicorn fastapi_example:app
Logs:
(hamilton-env) root@7736d3906b0b:/hamilton/examples/async# uvicorn fastapi_example:app
INFO: Started server process [1366]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1366]
Now, all the examples are running except dask.
I think this PR: https://github.com/stitchfix/hamilton/pull/203 is ready for review. Should I change its state from draft?
Thanks @bovem. I believe the dask one is related to how we shut dask down. The dataframe is printed and it is correct, so I don't think there's an error per se. We should note this in the example and make an issue to track it. Otherwise, I will take a look at your PR this week.
This was added in #209. Closing this issue. Thanks @bovem!