nerfstudio icon indicating copy to clipboard operation
nerfstudio copied to clipboard

Breakup base.py config file

Open tancik opened this issue 1 year ago • 7 comments

The base.py config file has become to goto place to dump all config files. This should be refactored in a way to make it easier to work with.

Two options:

  • Place the configs in the same files as the classes they instantiate.
  • Create a set config files for each type class, ie dataset_configs.py, model_configs.py, ect.

tancik avatar Sep 01 '22 01:09 tancik

I'm a fan of the first option; to bring up an old issue though, one annoying thing is that the way _target is set up right now means putting configs in the same files as the classes they instantiate would require either putting the config after the class or using default_factory=:

@dataclass
class BlenderDatasetConfig(DatasetConfig):
    _target: Type = field(default_factory=lambda: Blender)  # We can't write `Blender` directly, because `Blender` doesn't exist yet
    data_directory: Path = Path("data/blender/lego")
    ...

class Blender:
    ...

Some potential fixes/alternatives, partially pasted from old conversations with Evonne+Ethan:

(1) @Config.register_target decorator

T = TypeVar("T", bound=Type)

class InstantiateConfig:
    def setup(...) -> ...: ...

    @classmethod
    def register_target(cls, model_cls: T) -> T:
        cls._target = model_cls
        return model_cls
@dataclasses.dataclass
class PipelineConfig(InstantiateConfig):
    ...

@PipelineConfig.register_target
class Pipeline:
    ...

(2) A typed alternative with typing.Generic

config.setup() currently always statically evaluates to Any. For typing config.setup() more strongly, we also have a mockup of an API that looks like:

@dataclasses.dataclass
class ModelConfig(
    InstantiateConfig["Model"]  # Quotes here indicate a forward reference.
):
    layers: int


class Model:
   ...

Using this we can get the .setup() method to statically evaluate to the expected type: image

(3) Getting rid of config objects entirely?

Slightly more radical, but by pushing some typing.ParamSpec objects around we could make an Args[SomeObject] type that inherits fields from the (type-annotated) constructor of SomeObject and type checks + autocompletes reasonably well.

A mockup:

image

brentyi avatar Sep 02 '22 08:09 brentyi

All of these options would be easy enough for us to add. The primary question is what would be easiest to anyone using our repo for the first time...

(3) is nice since the params are defined at the location of the class (and is what people normally do...) Do you think it would be possible to get something like option 3, but where we have some function can take in the class and return a Config like the code in lines 7-14?

tancik avatar Sep 02 '22 16:09 tancik

One thing with (3) that we'd lose if we got rid of the separate dataclasses is some inheritance succinctness: https://github.com/plenoptix/nerfactory/blob/6f3518d54f28d477e90c7d0b480891b675b97893/nerfactory/models/instant_ngp.py#L53-L58

Being able to just write self.config = config (as opposed to repeating self.config_name = config_name for every single configurable value) and pass the config directly to super().__init__() without worrying about the contents as currently done is pretty nice.

Do you think it would be possible to get something like option 3, but where we have some function can take in the class and return a Config like the code in lines 7-14?

Do you have an example of what this would look like/how it would be used? Not totally following this.

brentyi avatar Sep 02 '22 22:09 brentyi

Being able to just write self.config = config (as opposed to repeating self.config_name = config_name for every single configurable value) and pass the config directly to super().init() without worrying about the contents as currently done is pretty nice.

Hmm, yea that is a nice property. But aren't people pretty used to passing each configurable value? Also when it is just a config value being passed around, the docstrings and configurable values get dislocated from the class which actually uses them.

Do you have an example of what this would look like/how it would be used? Not totally following this.

I though more about this, it doesn't make sense, ignore ;)

tancik avatar Sep 03 '22 00:09 tancik

For co-locating docstrings and configurable values with the class that uses them, a pattern like this seems OK:

# model.py

@dataclass
class ModelConfig:
    num_layers: int
    “””Some docstring”””
    units: int
    “””Some other docstring”””

class Model:
    def __init__(self, config: ModelConfig):
        # use config.num_layers, config.units here

The docstring experience also might be a better experience:

  • When there are a lot of fields, dataclass documentation is less tedious than method documentation with the Args: thing where you have to re-list out the whole argument list and keep it synchronized.
  • If ModelConfig is a subclass, it’ll inherit docstrings for fields in the base class without extra effort (eg in IDE+autocomplete popups). If we get rid of the dataclass, documentation for arguments in superclasses, which is still relevant for subclasses, needs to be either omitted or repeated.

On Python inheritance in general, one thing I’m wondering about subclass constructors without dataclasses is if we’re forced to choose between:

  1. Listing out every single argument used by the base class constructor in the subclass constructor, then passing them in one-by-one via super().__init__(arg1=arg1, arg2=arg2, arg3=arg3, …).
  2. Or, alternatively, attaching a trailing **kwargs field to the subclass constructor, and calling super().__init__(**kwargs).’

Both seem difficult to scale; the first is terribly repetitive and the second can’t be type checked. Is there a better way to do this?

I generally like the (3) Args[Model] approach, just wouldn’t want to lose the succinctness + type friendly benefits of the current pattern.

brentyi avatar Sep 03 '22 06:09 brentyi

(4) We could also emulate Implicitron’s config pattern

They make every module a dataclass where the dataclass fields are the configurable values and the usual __init__ logic is moved into __post_init__; combined with Args[], it seems like this could get us type checking + autocomplete and nice inheritance syntax without auxiliary config classes.

Downsides would be:

  1. Slightly unconventional.
  2. Args[] (which we could also call Lazy[]? it’s just an object that’s defined but not actually instantiated) is much more magical + potentially difficult to understand than an explicit config dataclass.
  3. Maybe there are situations where having a config dataclass is actually really nice?

brentyi avatar Sep 03 '22 06:09 brentyi

Ok, so maybe for now we should just keep what we have but colocate the configs with the classes.

tancik avatar Sep 03 '22 18:09 tancik