Keyan Chen

10 issues by Keyan Chen

I found that when I set CUDA_VISIBLE_DEVICES="1", the code hangs at self.model = CustomCLIP(cfg, classnames, clip_model) followed by self.model.to(self.device). The to(self.device) call blocks for a long time and never returns.
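One thing worth ruling out (an assumption, not a diagnosis of the hang): with CUDA_VISIBLE_DEVICES="1", physical GPU 1 is the only device the process can see, and it shows up inside the process as cuda:0. The variable also only takes effect if it is set before CUDA is initialized. The hypothetical helper `logical_cuda_index` below sketches that index remapping. It handles plain integer ids only, not GPU UUIDs.

```python
from typing import Optional


def logical_cuda_index(physical_id: int, visible: Optional[str]) -> Optional[int]:
    """Map a physical GPU id to the index the process sees under CUDA_VISIBLE_DEVICES.

    Hypothetical helper: assumes the variable is a comma-separated list of
    integer ids (GPU UUIDs and MIG identifiers are not handled here).
    """
    if visible is None:
        # Variable unset: every GPU keeps its physical id.
        return physical_id
    ids = [int(tok) for tok in visible.split(",") if tok.strip()]
    if physical_id not in ids:
        return None  # this GPU is hidden from the process
    # Visible GPUs are renumbered 0, 1, 2, ... in the order they are listed.
    return ids.index(physical_id)
```

Under this mapping, a process started with CUDA_VISIBLE_DEVICES="1" should target "cuda:0"; requesting "cuda:1" would point at a device that does not exist in that process.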

Refer to https://github.com/KyanChen/MakeMultiHeadNaive/tree/master for help.

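### What is the feature? The runner does not define a default sampler. When the dataloader in the config file has no sampler, the sampler is set to None and the default sampler of the torch DataLoader is used. This is fine on a single GPU, but on multiple GPUs every GPU receives the same training data, and the displayed epoch count is wrong as a result. I suggest adding a default sampler setting. See https://github.com/open-mmlab/mmengine/blob/6c5eebb823e3c9381d63fd0cd1873ed1bd9ee9de/mmengine/runner/runner.py#L1396C34-L1396C34 ### Any other context? _No response_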
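Until a default is added, the usual workaround in an mmengine config is to set the sampler explicitly, e.g. `sampler=dict(type='DefaultSampler', shuffle=True)` inside `train_dataloader`. To show why the missing sampler duplicates data, here is a plain-Python sketch of how a distributed sampler partitions indices, modelled on the behaviour of PyTorch's `DistributedSampler` (pad to a multiple of the world size, then take an interleaved slice per rank). `shard_indices` is a hypothetical name. Without such a sampler, every rank iterates over the full `range(len(dataset))`.

```python
import math
import random
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int,
                  shuffle: bool = True, seed: int = 0, epoch: int = 0) -> List[int]:
    """Return the sample indices that one rank should see in one epoch (sketch)."""
    indices = list(range(dataset_len))
    if shuffle:
        # Same seed + epoch on every rank gives the same permutation everywhere,
        # so the per-rank slices taken below do not overlap.
        random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating indices so every rank receives the same number of samples.
    total_size = math.ceil(dataset_len / num_replicas) * num_replicas
    while len(indices) < total_size:
        indices += indices[: total_size - len(indices)]
    # Interleaved split: rank r takes positions r, r + n, r + 2n, ...
    return indices[rank:total_size:num_replicas]
```

With 10 samples on 4 ranks, each rank gets 3 indices per epoch instead of all 10, which is also why the per-epoch iteration count shrinks once a proper sampler is in place.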

### Prerequisite - [X] I have searched [Issues](https://github.com/open-mmlab/mmengine/issues) and [Discussions](https://github.com/open-mmlab/mmengine/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmengine). ### Environment...

bug

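### What is the feature? When a module or model is reported as not registered, the cause is usually a missing registry or a module that has not been added to the __init__ file. Occasionally the importlib_metadata package is missing. I suggest adding more detailed guidance. See the try block in https://github.com/open-mmlab/mmengine/blob/6c5eebb823e3c9381d63fd0cd1873ed1bd9ee9de/mmengine/registry/registry.py#L286C1-L304C51: import_module may fail because a required Python package is not fully installed. ### Any other context? _No response_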
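The kind of guidance requested could look like the sketch below. `ToyRegistry` and `import_with_hint` are hypothetical stand-ins, not mmengine's registry: the lookup error lists the usual causes, and the import wrapper re-raises with the name of the module that is actually missing instead of leaving the user with only "not registered".

```python
import importlib
from typing import Callable, Dict, Type


class ToyRegistry:
    """Minimal illustrative registry (hypothetical; not mmengine's implementation)."""

    def __init__(self, name: str):
        self.name = name
        self._modules: Dict[str, Type] = {}

    def register_module(self) -> Callable[[Type], Type]:
        def _register(cls: Type) -> Type:
            self._modules[cls.__name__] = cls
            return cls
        return _register

    def get(self, key: str) -> Type:
        if key not in self._modules:
            # Spell out the common causes instead of a bare "not registered".
            raise KeyError(
                f"{key!r} is not registered in the {self.name!r} registry. "
                "Check that (1) the class is decorated with register_module(), "
                "(2) its module is imported, e.g. from the package __init__.py, "
                "and (3) importing that package did not fail.")
        return self._modules[key]


def import_with_hint(module_name: str):
    """Import a module; on failure, name the missing dependency explicitly."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"Failed to import {module_name!r}; missing module: {err.name!r}. "
            "Install the missing dependency (e.g. importlib_metadata) and retry."
        ) from err
```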

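### What is the feature? I would like IterableDataset or webdataset-style data loading to be supported. As training datasets keep growing, webdataset-based data reading is becoming increasingly important. The existing map-style datasets easily become I/O-bound, so the GPUs cannot be fully utilized: most of the time is spent indexing and reading training data. ### Any other context? _No response_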
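The core of the webdataset approach is to pack samples into tar shards, give each (rank, dataloader worker) pair a disjoint set of shards, and read every shard sequentially instead of seeking per sample. The stdlib-only sketch below shows that idea under those assumptions; the function names are hypothetical. Inside a PyTorch `IterableDataset.__iter__`, `worker_id` and `num_workers` would come from `torch.utils.data.get_worker_info()`.

```python
import io
import tarfile
from typing import Dict, Iterator, List, Tuple


def write_shard(path: str, samples: Dict[str, bytes]) -> None:
    """Write samples into one uncompressed tar shard (webdataset-style layout)."""
    with tarfile.open(path, "w") as tar:
        for name, payload in samples.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def assign_shards(shards: List[str], rank: int, world_size: int,
                  worker_id: int, num_workers: int) -> List[str]:
    """Give each (rank, worker) pair a disjoint subset of the shard list."""
    global_worker = rank * num_workers + worker_id
    total_workers = world_size * num_workers
    return shards[global_worker::total_workers]


def stream_samples(shard_paths: List[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, bytes) by reading each shard front to back."""
    for path in shard_paths:
        with tarfile.open(path, "r") as tar:
            for member in tar:  # sequential scan: no per-sample random access
                if member.isfile():
                    fileobj = tar.extractfile(member)
                    if fileobj is not None:
                        yield member.name, fileobj.read()
```

Because each worker opens only its own shards and reads them in order, disk access stays sequential, which is the property that helps when a map-style dataset is stuck on random I/O.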

### Describe the problem Please implement functionality in both the command prompt (cmd) and the shell environment that automatically releases resources when a timeout occurs...
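As a local, single-process analogy of the requested behaviour (not Determined's API), Python's `subprocess.run` already kills and reaps the child process when its `timeout` expires, releasing whatever that process held. `run_with_timeout` is a hypothetical name. Note that it only kills the direct child; processes spawned by that child, such as commands started from a shell, are not covered by this sketch.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so the process and its resources are released at this point.
        return None
```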

feature

### Describe the problem How can the shell task logs be deleted? ### Describe the solution you'd like How can the shell task logs be deleted? ### Describe alternatives you've considered _No...

feature

### Describe the problem How can Docker images be used offline in a local cluster? I know that a Slurm-based cluster can use cached Docker images. ### Describe the solution you'd...

feature

### Describe the bug [89ab6926] crashed: task failed without an associated exit code: pulling container image: error parsing image name /mnt/jfs/singularity_image_root/determinedai/environments:cuda-11.8-pytorch-2.0-gpu-mpi-0.31.1: invalid reference format [2024-06-23 11:01:33] || ERROR: Trial 10...
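The string in the error begins with `/mnt/jfs/singularity_image_root/`, and a Docker-style image reference cannot start with `/`, which is consistent with the "invalid reference format" message. One possible reading (unconfirmed by the truncated log) is that a local image directory was prepended to a registry reference. The check below uses a simplified version of the Docker distribution reference grammar (domain, lowercase path components, optional tag; digests omitted), so treat it as an approximation rather than the exact parser.

```python
import re

# Simplified from the Docker distribution reference grammar (digests omitted).
_COMPONENT = r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*"
_DOMAIN_PART = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?"
_DOMAIN = rf"{_DOMAIN_PART}(?:\.{_DOMAIN_PART})*(?::[0-9]+)?"
_TAG = r"[\w][\w.-]{0,127}"
_REFERENCE = re.compile(
    rf"(?:{_DOMAIN}/)?{_COMPONENT}(?:/{_COMPONENT})*(?::{_TAG})?"
)


def is_valid_image_reference(ref: str) -> bool:
    """Return True if ref looks like a valid [domain/]name[:tag] reference."""
    return _REFERENCE.fullmatch(ref) is not None
```

Under this check, `determinedai/environments:cuda-11.8-pytorch-2.0-gpu-mpi-0.31.1` is accepted, while the same string prefixed with `/mnt/jfs/singularity_image_root/` is rejected.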

bug