envd
envd copied to clipboard
feat: expose interface for getting info at building
Description
We introduce a global module env
with a getter function os()
in starlark API
This can exposed some variables for subsequent building procedure, especially for some library function developing, they can automatically detect os version
or language
, and be compatible with them without user input these information again.
There could be more getters implementation in this module in the future.
Examples
def build():
base(os="ubuntu18.04", language="python3.8")
print(env.os())
# output "ubuntu18.04"
def build():
base(os="ubuntu20.04", language="python3.8")
print(env.os())
# output "ubuntu20.04"
PS. It would be better to be a lazy-evaluate variable env.os
for design, but starlark doesn't support lazy-evaluate!
ref: https://github.com/tensorchord/envdlib/issues/14
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.
/assign
We had some discussions here https://github.com/tensorchord/envd/issues/1132#issuecomment-1308433521 about the interface. I think it's better to make it as a key-value interface instead of methods.
We had some discussions here #1132 (comment) about the interface. I think it's better to make it as a key-value interface instead of methods.
What's the advantage of key-value interface? I think this info is not static, thus a method would be suitable.
@kemingy So that every function can provide such information. For example envdlib can provide torch
as function. Contributor can also provide torch version in the function. Other packages relying on torch can get such information.
@kemingy So that every function can provide such information. For example envdlib can provide
torch
as function. Contributor can also provide torch version in the function. Other packages relying on torch can get such information.
Do you mean that it's also mutable and can be modified with assignment statements?
Yes. Something like env.get("os/version", "20.04")
, env.get("os/name", "ubuntu")
or env["package/torch"]="1.12.0"
env.get(key, default_value)
. The same syntax as python dict
I like this way, it could act as a hack of lazy-evaluate, if the os
is a value, it would be more pythonic.
def get_os():
info_dict = get_build_info()
return info_dict["os"]
As it actually convey a copy/value of graph, not reference of graph, people may mistakenly think they can write the graph.
Apart from that, we couldn't provide the exact graph, but only parts of it and flatten one, as starlark don't support class/struct.
Rather than named get_build_graph
, get_build_status/get_build_data/get_build_info
might be better.
As it is a dict of fixed keys, We can use TypedDict for Python type hint:
from typing import TypedDict
class Movie(TypedDict):
name: str
year: int
movie: Movie = {'name': 'Blade Runner',
'year': 1982}
ref: https://peps.python.org/pep-0589/
@cutecutecat Actually I think it's fine for the users to write into the key-value graph. For example, some library will depends on tensorrt version, and tensorrt is only provided in envdlib. User may want to check the tensorrt version when installing other libraries
@cutecutecat Actually I think it's fine for the users to write into the key-value graph. For example, some library will depends on tensorrt version, and tensorrt is only provided in envdlib. User may want to check the tensorrt version when installing other libraries
@VoVAllen @kemingy
If we support write
to build graph and let it build, than user or envdlib developer are able to overwrite the key set in build procedure, which destroys envd encapsulation, like:
def build():
base(os="ubuntu18.04", language="python3.8")
info_dict = get_build_graph()
info_dict["os"] = "centos6.5"
info_dict["language"] = "R"
If these setters occurs in different extensions, it is difficult to debug where the change happens.
I think we should only support read
to build graph, as it's primary element, could be revised by envd api only.
Obviously, for envdlib
developer to set some persistent variables, we could introduce another API, like environment variable
, it supports both read
and write
, and may work like this
env.set('tensorchord.envdlib.tensorrt.version', "1.0")
version = env.get('tensorchord.envdlib.tensorrt.version')
and the build graph API likes this:
os = env.graph.os()
There might need more discussion before proceeding this.
/unassign
Some thoughts:
- Override can be a feature instead of a bug. Just like dynamic language as Python, you can monkey patch any methods on your own. For now, I don't know whether we need to let users do this.
- To solve the overwrite problem, we can simply raise errors when the program writes to an existing key. Thus
env
can be a special dictionary, with some internal constraint. We can also add protection by prefixes such asenvd/os/version
, key starts withenvd/
is protected. The only positive point I can see aboutenv.os()
overenv['os']
is that method seems guaranteed to be accessible to the users, but the key might be missing in the dictionary. However, I still think the flexibility provided key-value like behavior is needed in this proposal.