Jake Schmidt
Bugfixes
Fixes:
- buffers being on the wrong device
- `MaskingGenerator.__repr__`'s `TypeError: %d format: a real number is required, not NoneType`
- adds a missing argument (`in_chans`)
- ignores `.DS_Store` files...
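The `TypeError` quoted above is what Python raises when `%d` is applied to `None`. A minimal sketch of one way to avoid it follows; `MaskingGeneratorSketch` and its attribute names are hypothetical stand-ins, not the actual dinov2 class or the fix applied in the PR.

```python
class MaskingGeneratorSketch:
    """Hypothetical stand-in for illustration; not the dinov2 MaskingGenerator."""

    def __init__(self, min_num_patches=4, max_num_patches=None):
        self.min_num_patches = min_num_patches
        # An optional limit that may legitimately be None.
        self.max_num_patches = max_num_patches

    def __repr__(self):
        # "%d" % None raises TypeError, so use %s, which accepts None as well as ints.
        return "Generator(min=%s, max=%s)" % (
            self.min_num_patches,
            self.max_num_patches,
        )


def format_with_percent_d(value):
    # Reproduces the failure mode: returns the exception type name if formatting fails.
    try:
        return "%d" % value
    except TypeError as exc:
        return type(exc).__name__


print(repr(MaskingGeneratorSketch()))
print(format_with_percent_d(None))
```

Switching to `%s` is only one option; substituting a default integer before formatting would work equally well if the field is meant to be numeric.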
This PR aims to improve how parameter groups are formed and fused. Specifically, the changes are:
- fuses param groups across all submodules of `self.student`
- adds names to each...
Fixes breaking changes introduced by PyTorch 2.1.
- https://github.com/pytorch/pytorch/pull/103902
- Closes https://github.com/facebookresearch/dinov2/issues/265
This PR attempts to decouple model training from FSDP, allowing more flexible distributed training, e.g. with pytorch-lightning. IIUC we should be wrapping `Block`, not `BlockChunk` (which is the same as...
Looks like there are some breaking changes to the FSDP API in PyTorch 2.1. For example, `dinov2.fsdp.__init__.py::free_if_fsdp` is broken when using torch==2.1: `AttributeError: 'DinoVisionTransformer' object has no attribute '_handles'`
When training without providing the `mixed_precision` argument to FSDP, there is an error related to dtype mismatch in `dinov2/layers/block.py`. Is this expected? Full stacktrace: ```txt File "/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in...
Fixes #17912.
### Bug description `FSDPStrategy.load_checkpoint` casts `checkpoint_path` to a `pathlib.Path` [here](https://github.com/Lightning-AI/lightning/blob/master/src/lightning/pytorch/strategies/fsdp.py#L539). This will bork URIs, such as cloud checkpoint paths, e.g. `s3://...`. Example: ```python from pathlib import Path checkpoint_path = "s3://asd/asd"...
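
To illustrate the path problem described above, here is a minimal standard-library sketch. It uses `PurePosixPath` so the result is the same on every platform (the source describes a cast to `pathlib.Path`), and `is_remote_uri` and `normalize_checkpoint_path` are hypothetical helpers, not part of Lightning.

```python
from pathlib import PurePosixPath


def is_remote_uri(path: str) -> bool:
    # Hypothetical helper: treat anything containing "scheme://" as a remote URI.
    return "://" in path


def normalize_checkpoint_path(path: str) -> str:
    # Only local paths go through pathlib; URIs are returned untouched.
    if is_remote_uri(path):
        return path
    return str(PurePosixPath(path))


checkpoint_path = "s3://asd/asd"

# pathlib collapses the repeated slash, so the URI scheme separator is lost.
mangled = str(PurePosixPath(checkpoint_path))
print(mangled)

# Guarding the cast keeps the URI intact.
print(normalize_checkpoint_path(checkpoint_path))
```

The `"://"` check is deliberately crude; a real fix would more likely rely on whatever filesystem abstraction the library already uses for remote paths.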
### Bug description
When both of the following happen together:
1. a logger is used with a cloud (e.g. `s3://` or `gcs://` protocol) save dir
2. a `ModelCheckpoint` is used...