DeepRule
torch problem
My CUDA version is 11.7, but DeepRule.txt specifies CUDA 8.0. Can I install 11.7 instead?
I have added a new environment file (see the updates). With it, the cpools layers compile successfully.
Thank you very much!
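To confirm which CUDA version the installed PyTorch wheel was built against (for example the `+cu110` suffix in `torch 1.7.1+cu110`) and whether it can see the GPU, a minimal sketch using only standard `torch` attributes is below. `torch_cuda_info` is a hypothetical helper, not part of DeepRule. The version it reports is the CUDA runtime bundled with the wheel, which is separate from the system toolkit that `nvcc --version` reports.

```python
import torch


def torch_cuda_info():
    """Summarize the CUDA setup of the installed PyTorch build (hypothetical helper)."""
    info = {
        "torch": torch.__version__,
        # CUDA version the wheel was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # cuDNN version shipped with this build.
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in torch_cuda_info().items():
        print(f"{key}: {value}")
```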
Hello, requirement-2023.txt lists 33 packages, but DeepRule.txt lists 96. Don't the other packages need to be installed?
Generally not; I have tested it. If you find that something is missing, just install it.
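To see exactly which packages in DeepRule.txt are absent from requirement-2023.txt, a small comparison script can help. This is a minimal sketch that assumes both files list one package per line in pip (`name==version`) or conda-export (`name=version=build`) form; the real file formats may differ, and `package_names`, `missing_from` and the script name `compare_reqs.py` are hypothetical.

```python
import re
import sys


def package_names(text):
    """Return the normalized package names listed in `text`.

    Assumes one package per line, either `name==version` (pip freeze style)
    or `name=version=build` (conda export style). Blank lines, `#` comments
    and option lines starting with `-` are skipped.
    """
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name ends at the first version operator, extras marker or space.
        name = re.split(r"[=<>~!\[;\s]", line, maxsplit=1)[0]
        # pip treats runs of '-', '_' and '.' in names as equivalent.
        names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


def missing_from(smaller, larger):
    """Sorted names that appear in `larger` but not in `smaller`."""
    return sorted(package_names(larger) - package_names(smaller))


if __name__ == "__main__":
    # Usage: python compare_reqs.py requirement-2023.txt DeepRule.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_small, open(sys.argv[2]) as f_large:
            for name in missing_from(f_small.read(), f_large.read()):
                print(name)
```

A package showing up in this list is only worth installing if an import actually fails, in line with the reply above.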
Hello, I am training a model on "linedata(1028)" and ran into two errors. Could you please help me?
1. DeepRule-master/models/py_utils/kp_utils.py:592: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at ../aten/src/ATen/native/cuda/Indexing.cu:1239.)
   tag_full[1-mask_full] = 0
2. python3.9/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
   warnings.warn(warning.format(ret))
Segmentation fault (core dumped)
For the first warning, I think you can use type_as to convert to torch.float32 before the masked_fill_ command.
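The first warning message itself asks for a `torch.bool` mask. A minimal sketch of that change, assuming `tag_full` and `mask_full` are same-shaped tensors as in the quoted line from kp_utils.py and that the mask holds only 0/1 values; because `1 - mask` is not allowed on bool tensors, the inversion uses `~`. The second warning refers to the legacy `size_average`/`reduce` arguments of PyTorch loss functions. Which loss call triggers it in DeepRule is an assumption here (shown as a smooth L1 loss), so searching the code for `size_average=` is the reliable way to find the real call site. Both function names are hypothetical.

```python
import torch
import torch.nn.functional as F


def zero_unmasked_tags_(tag_full, mask_full):
    """In-place equivalent of the deprecated `tag_full[1 - mask_full] = 0`.

    Assumes mask_full holds only 0/1 values (uint8 or bool).
    """
    mask_full = mask_full.bool()  # uint8 -> bool, as the warning requests
    # `1 - mask` raises an error for bool tensors; `~` inverts the mask instead.
    tag_full[~mask_full] = 0
    return tag_full


def regression_loss(regr, gt_regr):
    """Summed smooth L1 loss without the deprecated size_average argument."""
    # Old form: F.smooth_l1_loss(regr, gt_regr, size_average=False)
    # size_average=False with the default reduce maps to reduction="sum".
    return F.smooth_l1_loss(regr, gt_regr, reduction="sum")
```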
Thank you. After fixing all the UserWarnings, I still get "Segmentation fault (core dumped)" during execution. Here is the command I ran. Can you please explain why this happens?
(DeepRule) sun@sun:~/DeepRule-master$ python train_chart.py --cfg_file CornerNetLine --data_dir "/home/sun/data/linedata(1028)" --cache_path "/home/sun/data/linedata(1028)/cache/"
This sounds like a problem with the CornerNet package. Follow the instructions to compile it.
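A segmentation fault kills the interpreter without a Python traceback. Python's standard `faulthandler` module can print the Python stack of every thread at the moment of the crash, which helps narrow down whether it happens while importing or calling a compiled extension, in a data-loading step, or elsewhere. A minimal sketch; the same effect is available without editing code by running `python -X faulthandler train_chart.py --cfg_file CornerNetLine ...` or by setting `PYTHONFAULTHANDLER=1`.

```python
import faulthandler
import sys

# Print the Python traceback of every thread to stderr if the process receives
# SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. Place this at the very top of
# train_chart.py, before the other imports, so that crashes while importing
# compiled extensions are covered as well.
faulthandler.enable(file=sys.stderr, all_threads=True)
```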
Hello, the same problem persists after recompiling. Could you please tell me the Python, CUDA, and GCC versions that go with the requirements-2023.txt file? I would also like to know how much GPU memory is required to train the "line" model.
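The memory needed depends on settings such as batch_size and input_size, so one way to answer this for a given config is to measure it. A minimal sketch, assuming you can call it from the training loop after a few iterations (for example where the loss is printed); `gpu_memory_report` is a hypothetical helper that only wraps standard `torch.cuda` queries.

```python
import torch


def gpu_memory_report(device=0):
    """Total and peak GPU memory (GiB) for `device`, or None without CUDA.

    Peak figures cover only memory allocated by this process through
    PyTorch's caching allocator, measured since the process started
    (or since the peak statistics were last reset).
    """
    if not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    props = torch.cuda.get_device_properties(device)
    return {
        "device": props.name,
        "total_gib": props.total_memory / gib,
        # Largest amount actually occupied by tensors.
        "peak_allocated_gib": torch.cuda.max_memory_allocated(device) / gib,
        # Largest amount held by the caching allocator (closer to nvidia-smi).
        "peak_reserved_gib": torch.cuda.max_memory_reserved(device) / gib,
    }


if __name__ == "__main__":
    print(gpu_memory_report())
```

If the peak gets close to the total, lowering batch_size in the config file is the usual first lever.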
Package Version
adal 1.2.7
argcomplete 2.1.2
azure-common 1.1.28
azure-core 1.27.1
azure-graphrbac 0.61.1
azure-mgmt-authorization 3.0.0
azure-mgmt-containerregistry 10.1.0
azure-mgmt-core 1.4.0
azure-mgmt-keyvault 10.2.2
azure-mgmt-resource 22.0.0
azure-mgmt-storage 21.0.0
azureml 0.2.7
azureml-core 1.52.0
backports.tempfile 1.0
backports.weakref 1.0.post1
bcrypt 4.0.1
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.1.0
contextlib2 21.6.0
contourpy 1.0.5
cryptography 41.0.1
cycler 0.11.0
docker 6.1.3
fonttools 4.25.0
h5py 3.8.0
humanfriendly 10.0
idna 3.4
importlib-resources 5.2.0
isodate 0.6.1
jeepney 0.8.0
jmespath 1.0.1
jsonpickle 3.0.1
kiwisolver 1.4.4
knack 0.10.1
matplotlib 3.7.1
mkl-fft 1.3.6
mkl-random 1.2.2
mkl-service 2.4.0
msal 1.22.0
msal-extensions 1.0.0
msrest 0.7.1
msrestazure 0.6.4
munkres 1.1.4
ndg-httpsclient 0.5.1
numpy 1.24.3
oauthlib 3.2.2
opencv-python 4.7.0.72
packaging 23.0
pandas 2.0.3
paramiko 3.2.0
pathspec 0.11.1
Pillow 9.4.0
pip 23.0.1
pkginfo 1.9.6
ply 3.11
portalocker 2.7.0
pyasn1 0.5.0
pycparser 2.21
Pygments 2.15.1
PyJWT 2.7.0
PyNaCl 1.5.0
pyOpenSSL 23.2.0
pyparsing 3.0.9
PyQt5-sip 12.11.0
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
requests 2.30.0
requests-oauthlib 1.3.1
SecretStorage 3.3.3
setuptools 66.0.0
sip 6.6.2
six 1.16.0
tabulate 0.9.0
toml 0.10.2
torch 1.7.1+cu110
torchaudio 0.7.2
torchvision 0.8.2+cu110
tornado 6.2
typing_extensions 4.5.0
tzdata 2023.3
urllib3 1.26.16
websocket-client 1.6.1
wheel 0.38.4

With this environment I am able to run the training code.
Thank you for your response. This list does not include the Python, CUDA, or GCC versions, and a difference in those could explain the problem. Could you please share them?
ysh
(In reply to the message from "soap117/DeepRule" sent Saturday, July 1, 2023, 11:27 AM; subject: Re: [soap117/DeepRule] torch problem (Issue #32))
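To collect the Python, GCC and CUDA toolkit versions in one go, here is a minimal sketch using only the standard library. `tool_version` and `environment_report` are hypothetical helpers; a tool that is not on PATH is reported as None, and the code avoids Python 3.7+ `subprocess` arguments so it also runs on the Python 3.6 environment that appears later in this thread.

```python
import platform
import shutil
import subprocess


def tool_version(cmd, marker=None):
    """Return one line of `cmd --version`, or None if `cmd` is not on PATH.

    With `marker`, return the first line containing it (nvcc prints its
    release number on a later line); otherwise return the first line.
    """
    if shutil.which(cmd) is None:
        return None
    # stdout/universal_newlines instead of capture_output/text for Python 3.6.
    result = subprocess.run([cmd, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if marker is not None:
        lines = [line for line in lines if marker in line]
    return lines[0] if lines else None


def environment_report():
    """Python, GCC and CUDA toolkit (nvcc) versions on this machine."""
    return {
        "python": platform.python_version(),
        "gcc": tool_version("gcc"),
        "nvcc": tool_version("nvcc", marker="release"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```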
Hello, my GPU has relatively little memory, so I modified the train.json and val.json files to keep only 10 data entries for testing. However, when execution reaches the line "training = pinned_training_queue.get(block=True)", it hangs and does not proceed. Below is my execution log. Can you please tell me why this happens?
/home/ubuntu/anaconda3/envs/myenv/bin/python /home/ubuntu/download/pycharm-community-2023.1.4/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 44227 --file /media/ubuntu/A4823F1E823EF480/2023/env/python/DeepRule-master-weixiugai/DeepRule-master/train_chart.py
Connected to pydev debugger (build 231.9225.15)
/home/ubuntu/anaconda3/envs/myenv/lib/python3.6/site-packages/OpenSSL/_util.py:6: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography will remove support for Python 3.6.
  from cryptography.hazmat.bindings.openssl.binding import Binding
Failure while loading azureml_run_type_providers. Failed to load entrypoint azureml.scriptrun = azureml.core.script_run:ScriptRun._from_run_dto with exception (pyOpenSSL 23.2.0 (/home/ubuntu/anaconda3/envs/myenv/lib/python3.6/site-packages), Requirement.parse('pyopenssl<23.0.0')).
/media/ubuntu/A4823F1E823EF480/2023/env/python/DeepRule-master-weixiugai/DeepRule-master/train_chart.py:22: FutureWarning: azureml.core: AzureML support for Python 3.6 is deprecated and will be dropped in an upcoming release. At that point, existing Python 3.6 workflows that use AzureML will continue to work without modification, but Python 3.6 users will no longer get access to the latest AzureML features and bugfixes. We recommend that you upgrade to Python 3.7 or newer. To disable SDK V1 deprecation warning set the environment variable AZUREML_DEPRECATE_WARNING to 'False'
  from azureml.core.run import Run
['line']
loading all datasets...
using 1 threads
loading from cache file: /media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/line_train2019.pkl
loading annotations into memory...
/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/annotations/instancesLine(1023)_train2019.json
Done (t=0.00s)
creating index...
index created!
loading from cache file: /media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/line_val2019.pkl
loading annotations into memory...
/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line/annotations/instancesLine(1023)_val2019.json
Done (t=0.00s)
creating index...
index created!
system config...
{'batch_size': 5, 'cache_dir': '/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)/line', 'chunk_sizes': [5, 7, 7, 7], 'config_dir': './config', 'data_dir': '/media/ubuntu/A4823F1E823EF480/2023/env/python/linedata(1028)', 'data_rng': RandomState(MT19937) at 0x7FCC248FF258, 'dataset': 'Line', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.01, 'max_iter': 50000, 'nnet_rng': RandomState(MT19937) at 0x7FCC248FF570, 'opt_algo': 'adam', 'prefetch_size': 5, 'pretrain': None, 'result_dir': './results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CornerNetLine', 'stepsize': 45000, 'tar_data_dir': 'cls', 'test_split': 'testchart', 'train_split': 'trainchart', 'val_iter': 100, 'val_split': 'valchart', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5, 'border': 128, 'categories': 1, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.3, 'gaussian_radius': -1, 'input_size': [511, 511], 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 200, 'weight_exp': 8}
len of db: 11
building model...
module_file: models.CornerNetLine
use kp
total parameters: 198592138
setting learning rate to: 0.01
training start...
start prefetching data...
['read.txt']
  0%|          | 0/50000 [00:00<?, ?it/s]
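A blocking `pinned_training_queue.get(block=True)` waits forever if whatever fills the queue has stopped, for example because a prefetching process raised an exception while sampling a batch; any traceback then comes from that child process, not the main one. As a minimal sketch, assuming train_chart.py follows CornerNet's layout (prefetch processes feed a multiprocessing queue and a thread copies batches into `pinned_training_queue`), a get with a liveness check turns the silent hang into an error. `get_or_fail` is a hypothetical helper, not DeepRule code.

```python
import multiprocessing as mp
import queue
import time


def get_or_fail(q, workers, timeout=300.0, poll=1.0):
    """Like q.get(block=True), but raise instead of waiting forever.

    `workers` are the producer processes (anything with is_alive() and
    exitcode). Raises RuntimeError once all of them have exited with the
    queue still empty, or TimeoutError if nothing arrives within `timeout`.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return q.get(timeout=poll)
        except queue.Empty:
            pass
        if workers and not any(w.is_alive() for w in workers):
            try:
                # A worker may have sent one last item just before exiting.
                return q.get(timeout=poll)
            except queue.Empty:
                codes = [w.exitcode for w in workers]
                raise RuntimeError(
                    f"all data workers exited (exit codes {codes})") from None
        if time.monotonic() > deadline:
            raise TimeoutError(f"no batch received within {timeout} s")


def _crashing_producer(q):
    # Stand-in for a data sampler that fails, e.g. on a bad annotation.
    raise ValueError("simulated sampling error")


if __name__ == "__main__":
    data_queue = mp.Queue(5)
    worker = mp.Process(target=_crashing_producer, args=(data_queue,), daemon=True)
    worker.start()
    try:
        get_or_fail(data_queue, [worker], timeout=30)
    except RuntimeError as err:
        print("caught:", err)  # the worker's own traceback is printed before this
```

In a CornerNet-style loop this might be called as `training = get_or_fail(pinned_training_queue, training_tasks)`, where `training_tasks` is a hypothetical name for the list of prefetch processes.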
I am currently facing a similar issue. Did you manage to find a solution?