mmcv

Unexpected behaviour of `by_epoch` in DvcliveLoggerHook/AnyLogger

Status: Open. rick-van-veen opened this issue 2 years ago • 3 comments

Thanks for reporting the unexpected results and we appreciate it a lot.

See also the DVClive issue: https://github.com/iterative/dvclive/issues/267

Describe the Issue

I was using the dvclive hook for mmcv and expected the by_epoch argument to mean something different from what it currently does: I expected to get one logged result per epoch. However, it seems to have no effect (or at least not the expected one).

Reproduction

  1. What command, code, or script did you run?

I added the following to my config:
log_config = dict(
    hooks=[
        dict(
            type="DvcliveLoggerHook",
            path="{{ fileDirname }}/../live",
            interval=1,
            by_epoch=True,
        ),
    ],
)
  2. Did you make any modifications to the code? Did you understand what you have modified?

I did not modify the code.

Environment

  1. Please run python -c "from mmcv.utils import collect_env; print(collect_env())" to collect necessary environment information and paste it here.
Output
{
    'sys.platform': 'linux',
    'Python': '3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]',
    'CUDA available': True,
    'GPU 0, 1, 2, 3': 'NVIDIA TITAN RTX',
    'CUDA_HOME': '/usr/local/cuda',
    'NVCC': 'Cuda compilation tools, release 11.2, V11.2.142',
    'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0',
    'PyTorch': '1.9.0a0+df837d0',
    'PyTorch compiling details': (
        'PyTorch built with:\n'
        '  - GCC 9.3\n'
        '  - C++ Version: 201402\n'
        '  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications\n'
        '  - Intel(R) MKL-DNN v1.7.0 (Git Hash N/A)\n'
        '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
        '  - NNPACK is enabled\n'
        '  - CPU capability usage: AVX2\n'
        '  - CUDA Runtime 11.2\n'
        '  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86\n'
        '  - CuDNN 8.1.1\n'
        '  - Magma 2.5.2\n'
        '  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.2, CUDNN_VERSION=8.1.1, '
        'CXX_COMPILER=/usr/bin/c++, '
        'CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, '
        'FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, '
        'TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, '
        'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n'
    ),
    'TorchVision': '0.9.0a0',
    'OpenCV': '3.4.11',
    'MMCV': '1.5.0',
    'MMCV Compiler': 'GCC 9.3',
    'MMCV CUDA Compiler': '11.2'
}
  2. You may add additional information that may be helpful for locating the problem, such as:
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

n/a

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here; that would be much appreciated!

n/a

rick-van-veen, Jul 14 '22 14:07

Hi, thanks for your feedback. Apart from TextLoggerHook, the logger hooks only record the training state per iteration, so setting by_epoch will not change what is written to your dvclive log.

HAOCHENYE, Jul 15 '22 09:07
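To make the two behaviours in this discussion concrete, here is a minimal, framework-free sketch (not mmcv or DVCLive code). An iteration-based logger emits one entry every `interval` global iterations, keyed by the iteration number; an epoch-based logger, which is what the reporter expected, averages an epoch's metrics into a single entry keyed by the epoch. The function names and the `(epoch, global_iter, loss)` record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def log_by_iteration(records, interval=1):
    """Hypothetical iteration-based logging: one entry every `interval` iterations.

    `records` is a list of (epoch, global_iter, loss) tuples; global_iter starts at 1.
    """
    return [
        {"step": global_iter, "loss": loss}
        for _epoch, global_iter, loss in records
        if global_iter % interval == 0
    ]


def log_by_epoch(records):
    """Hypothetical epoch-based logging: average each epoch's losses into one entry."""
    per_epoch = defaultdict(list)
    for epoch, _global_iter, loss in records:
        per_epoch[epoch].append(loss)
    return [
        {"step": epoch, "loss": mean(losses)}
        for epoch, losses in sorted(per_epoch.items())
    ]


# Two epochs of three iterations each (made-up values for illustration).
records = [
    (1, 1, 1.0), (1, 2, 0.75), (1, 3, 0.5),
    (2, 4, 0.5), (2, 5, 0.25), (2, 6, 0.0),
]
print(len(log_by_iteration(records)))  # 6: one entry per iteration
print(log_by_epoch(records))  # [{'step': 1, 'loss': 0.75}, {'step': 2, 'loss': 0.25}]
```

With `interval=1`, the iteration-based variant produces six entries for two epochs, whereas the epoch-based variant produces two; this is the difference the reporter expected `by_epoch=True` to make.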

Hi! That is exactly why I called it 'unexpected'. In the issue I opened on iterative/dvclive (https://github.com/iterative/dvclive/issues/267), the idea was that the DVCLive maintainers could change how by_epoch behaves in their hook themselves, but that this would diverge from mmcv's default behaviour. Based on your comment, though, it seems this is what TextLoggerHook already does, so it would not be an inconsistency?

Also relevant: what is the intended and actual use of by_epoch? Is TextLoggerHook really the only hook that uses it?

rick-van-veen, Jul 15 '22 09:07

TextLoggerHook handles the by_epoch attribute in https://github.com/open-mmlab/mmcv/blob/75ae2009545b086c8304d9d16150565ecbdc8565/mmcv/runner/hooks/logger/text.py#L163, which lets you choose between an iteration-based and an epoch-based logging format. The same approach would work for the other logger hooks, but it would require substantial changes, and we currently have no plans to add this logic to them.

HAOCHENYE, Jul 16 '22 15:07
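As an illustration of what "controlling the iteration-based or epoch-based logging format" means, the sketch below picks a log-line prefix depending on `by_epoch`. It is modeled loosely on the idea behind TextLoggerHook but is not copied from mmcv; the function name, its parameters, and the exact prefix strings are assumptions. Note that it only changes how the position in training is reported; it does not aggregate metrics per epoch.

```python
def format_log_prefix(by_epoch, epoch, inner_iter, iters_per_epoch, global_iter, max_iters):
    """Hypothetical helper: choose an epoch- or iteration-based log prefix.

    All counters are assumed to be 1-based.
    """
    if by_epoch:
        # Report the position inside the current epoch, e.g. "Epoch [2][10/50]".
        return f"Epoch [{epoch}][{inner_iter}/{iters_per_epoch}]"
    # Report the position in the whole run, e.g. "Iter [60/500]".
    return f"Iter [{global_iter}/{max_iters}]"


print(format_log_prefix(True, epoch=2, inner_iter=10, iters_per_epoch=50, global_iter=60, max_iters=500))
# Epoch [2][10/50]
print(format_log_prefix(False, epoch=2, inner_iter=10, iters_per_epoch=50, global_iter=60, max_iters=500))
# Iter [60/500]
```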