
CUDA out of Memory Error

Open manu51188 opened this issue 4 years ago • 15 comments

Hi Team,

Thanks for giving us tools like k2 and Lhotse; they will play a big role in the coming years. I am currently running the CTC-based training recipe on the CSJ corpus, and as soon as training reaches batch 3 or 4 of the first epoch, CUDA runs out of memory and the training process stops. Training currently runs on a single GPU with 11 GB of memory.

Is it possible to run the training on multiple GPUs?

Regards, Mohit

manu51188, Feb 22 '21 08:02

Great! If you could help us create an example for the CSJ corpus in Lhotse, it would be easier for us to help you debug. Right now we haven't fully debugged training on multiple GPUs (there is a hang). Dan

danpovey, Feb 22 '21 10:02

Hi,

Thanks for the reply. Yes, I can certainly create a recipe for the CSJ corpus in Lhotse, similar to the other recipes. I can see that the CSJ .wav files range from 600 s to 5245 s in duration (quite long), and I am preparing the CutSet from them. I think that will be a limiting factor for me.

Any idea how I can handle recordings of such varying length for training?
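To put those durations in perspective, here is a rough back-of-the-envelope sketch (not from the thread) of the fbank input size for one whole recording. It assumes the 10 ms frame shift and 40 features that appear in the cut manifests later in this thread, and float32 storage once the features are decompressed in memory.

```python
# Rough size of the fbank feature matrix for one whole CSJ recording.
# Assumptions: 10 ms frame shift and 40 features (as in the cut manifests
# in this thread), float32 once the features are decompressed in memory.
FRAME_SHIFT_S = 0.01
NUM_FEATURES = 40
BYTES_PER_FLOAT = 4


def num_frames(duration_s: float, frame_shift_s: float = FRAME_SHIFT_S) -> int:
    """Approximate number of feature frames for a recording of this length."""
    return int(round(duration_s / frame_shift_s))


def feature_bytes(duration_s: float) -> int:
    """Bytes needed for a (num_frames x NUM_FEATURES) float32 matrix."""
    return num_frames(duration_s) * NUM_FEATURES * BYTES_PER_FLOAT


for dur in (600.0, 5245.0):  # shortest and longest CSJ files mentioned above
    mib = feature_bytes(dur) / 2**20
    print(f"{dur:7.1f} s -> {num_frames(dur):7d} frames, {mib:5.1f} MiB of input features")
```

That is roughly 9 MiB and 80 MiB of input features per recording. The network's activations (and their gradients) also grow with the number of frames, so training on whole recordings multiplies the memory footprint well beyond these figures; treat this as an estimate, not a measurement.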

Regards, Mohit

manu51188, Feb 22 '21 10:02

I would assume that CSJ comes with some kind of segmentation information, so you don't have to train on entire wav files?

danpovey, Feb 22 '21 11:02

As Dan said, there will typically be segmentation information (in Lhotse represented as SupervisionSet). After you create the CutSet, call .trim_to_supervisions() on it to have each cut represent a single segment.
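The effect of `.trim_to_supervisions()` on the time stamps can be illustrated with a toy model. This is a minimal sketch in plain Python, not Lhotse's implementation: the `Cut`/`Supervision` classes are simplified stand-ins, the ID scheme and the second segment are hypothetical, and the real method has extra options (it may, for example, keep other supervisions that overlap the new cut). The point is that each new cut starts where its supervision starts in the recording, while the supervision itself starts at 0.0 relative to the new cut.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Supervision:
    id: str
    start: float     # seconds, relative to the start of the cut that holds it
    duration: float


@dataclass
class Cut:
    id: str
    recording_id: str
    start: float     # seconds, relative to the start of the recording
    duration: float
    supervisions: List[Supervision] = field(default_factory=list)


def trim_to_supervisions(cut: Cut) -> List[Cut]:
    """Toy model: return one new cut per supervision, spanning exactly that segment."""
    new_cuts = []
    for sup in cut.supervisions:
        new_cuts.append(
            Cut(
                id=f"{cut.id}_{sup.id}",            # hypothetical ID scheme
                recording_id=cut.recording_id,
                start=cut.start + sup.start,        # now in recording time
                duration=sup.duration,
                # the supervision starts at 0.0 inside its own cut
                supervisions=[Supervision(sup.id, 0.0, sup.duration)],
            )
        )
    return new_cuts


# A whole-recording cut (91.8 s) with two segments, in the spirit of the CSJ data.
full = Cut(
    id="R01M0278_sp0.9",
    recording_id="R01M0278_sp0.9",
    start=0.0,
    duration=91.80225,
    supervisions=[
        Supervision("R01M0278_0000295_0000733_sp0.9", 0.32775, 0.4866875),
        Supervision("hypothetical_second_segment", 5.0, 2.0),
    ],
)
for c in trim_to_supervisions(full):
    print(c.start, c.duration, c.supervisions[0].start)
```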

pzelasko, Feb 22 '21 14:02

Hi @danpovey , @pzelasko ,

Thanks very much for the response. I will try trim_to_supervisions() and check the results.

Thanks and Regards, Mohit

manu51188, Feb 23 '21 03:02

Hi @pzelasko ,

I have added .trim_to_supervisions() so that each cut represents a single segment. Unfortunately, the dataset validation now fails with an assertion error at the following line: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/speech_recognition.py#L122

assert (cut.start - 1e-5) <= supervision.start <= supervision.end <= (cut.end + 1e-5),
AssertionError: Cutting in the middle of a supervision is currently not supported for the ASR task. Cut ID violating the pre-condition: '218b7cc8-594f-4114-b2e7-67863db7f0ce'.

manu51188, Feb 23 '21 14:02

Can you find and print the cut with that ID (cut_set['218b7cc8-594f-4114-b2e7-67863db7f0ce'])? I wonder if this is a rounding error or something bigger.

pzelasko, Feb 23 '21 15:02

Hi @pzelasko,

Yes, I printed the information for cut ID '218b7cc8-594f-4114-b2e7-67863db7f0ce'. Here is what I get:

{
"id": "218b7cc8-594f-4114-b2e7-67863db7f0ce",
"start": 0.32775,
"duration": 0.4866875,
"channel": 0,
"supervisions": [
  {
    "id": "R01M0278_0000295_0000733_sp0.9",
    "recording_id": "R01M0278_sp0.9",
    "start": 0.0,
    "duration": 0.4866875,
    "channel": 0,
    "text": " \u5b87\u5b99+\u540d\u8a5e",
    "speaker": "R01M0278"
  }
],
"features": {
  "type": "fbank",
  "num_frames": 9180,
  "num_features": 40,
  "frame_shift": 0.01,
  "sampling_rate": 16000,
  "start": 0.0,
  "duration": 91.80225,
  "storage_type": "lilcom_hdf5",
  "storage_path": "exp/data/fbank/train_all/feats-0.h5",
  "storage_key": "47fb1d58-d402-4925-a06a-eec4f460041b",
  "channels": 0
},
"recording": {
  "id": "R01M0278_sp0.9",
  "sources": [
    {
      "type": "file",
      "channels": [
        0
      ],
      "source": "/home/sysadmin/CSJ_RAW/WAV/noncore/R01M0278.wav"
    }
  ],
  "sampling_rate": 16000,
  "num_samples": 1468836,
  "duration": 91.80225,
  "transforms": [
    {
      "name": "Speed",
      "kwargs": {
        "factor": 0.9
      }
    }
  ]
},
"type": "Cut"

}

I am consistently finding cuts that start in the middle of a supervision for other cuts as well (most of them, I would say).

For example, this one:

{
  "id": "6f320bf1-2132-421b-9f7d-a8218acb7066",
  "start": 79.54275,
  "duration": 3.1881875,
  "channel": 0,
  "supervisions": [
    {
      "id": "S05M0469_0087497_0091004_sp1.1",
      "recording_id": "S05M0469_sp1.1",
      "start": 0.0,
      "duration": 3.1881875,
      "channel": 0,
      "text": " \u79c1+\u4ee3\u540d\u8a5e \u304c+\u52a9\u8a5e/\u683c\u52a9\u8a5e \u5352\u696d+\u540d\u8a5e \u3059\u308b+\u52d5\u8a5e/\u30b5\u884c\u5909\u683c/\u9023\u4f53\u5f62 \u9803+\u540d\u8a5e \u304a\u30fc+\u611f\u52d5\u8a5e",
      "speaker": "S05M0469"
    }
  ],
  "features": {
    "type": "fbank",
    "num_frames": 73800,
    "num_features": 40,
    "frame_shift": 0.01,
    "sampling_rate": 16000,
    "start": 0.0,
    "duration": 737.9996875,
    "storage_type": "lilcom_hdf5",
    "storage_path": "exp/data/fbank/train_all/feats-5.h5",
    "storage_key": "b7cecc64-68d3-4c9a-83ca-f3a6ba720e64",
    "channels": 0
  },
  "recording": {
    "id": "S05M0469_sp1.1",
    "sources": [
      {
        "type": "file",
        "channels": [
          0
        ],
        "source": "/home/sysadmin/CSJ_RAW/WAV/noncore/S05M0469.wav"
      }
    ],
    "sampling_rate": 16000,
    "num_samples": 11807994,
    "duration": 737.999625,
    "transforms": [
      {
        "name": "Speed",
        "kwargs": {
          "factor": 1.1
        }
      }
    ]
  },
  "type": "Cut"
}
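The cut boundaries above can be sanity-checked against the segment IDs. This sketch rests on two assumptions that are not stated in the thread: that the two numeric fields of a CSJ segment ID are start and end times in milliseconds of the original audio, and that speed perturbation with factor f divides every time stamp by f. `expected_cut_times` is a hypothetical helper.

```python
SAMPLING_RATE = 16000  # "sampling_rate" of both recordings above


def expected_cut_times(segment_id: str):
    """Hypothetical helper: parse '<rec>_<start_ms>_<end_ms>_sp<factor>' and return
    (start, end) in seconds on the speed-perturbed time line.

    Assumes the numeric fields are milliseconds in the original audio and that
    speed perturbation by factor f divides every time stamp by f.
    """
    base, sp = segment_id.rsplit("_sp", 1)
    _, start_ms, end_ms = base.rsplit("_", 2)
    factor = float(sp)
    return int(start_ms) / 1000 / factor, int(end_ms) / 1000 / factor


# (segment id, cut start, cut duration) copied from the two cuts above
examples = [
    ("R01M0278_0000295_0000733_sp0.9", 0.32775, 0.4866875),
    ("S05M0469_0087497_0091004_sp1.1", 79.54275, 3.1881875),
]
one_sample = 1 / SAMPLING_RATE
for seg_id, cut_start, cut_dur in examples:
    exp_start, exp_end = expected_cut_times(seg_id)
    start_ok = abs(cut_start - exp_start) < one_sample
    end_ok = abs(cut_start + cut_dur - exp_end) < one_sample
    print(seg_id, start_ok, end_ok)
```

Under these assumptions, both cuts start and end within one sample (at 16 kHz) of where their segment IDs say they should, which is consistent with the cuts themselves being correct.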

manu51188, Feb 24 '21 07:02

The cuts look correct to me. The „start” field has a different semantics in cut and in supervision: for cut, it is relative to the start of the recording; for the supervision, it is relative to the start of the cut. It seems the assertion in the K2SpeechRecognitionDataset is incorrect; I will commit a fix later today.

pzelasko, Feb 24 '21 13:02

Ok. Thanks.

I will try the training again after your fix.

Regards, Mohit

manu51188, Feb 24 '21 14:02

I merged it; please check out the latest Lhotse and try again:

pip uninstall lhotse
pip install git+https://github.com/lhotse-speech/lhotse

pzelasko, Feb 24 '21 15:02

Hi,

I have re-run the training, and it still fails at the same validation check with the following error:

Traceback (most recent call last):
  File "./ctc_train.py", line 408, in <module>
    main()
  File "./ctc_train.py", line 275, in main
    train = K2SpeechRecognitionDataset(
  File "/home/sysadmin/miniconda3/lib/python3.8/site-packages/lhotse/dataset/speech_recognition.py", line 69, in __init__
    self._validate()
  File "/home/sysadmin/miniconda3/lib/python3.8/site-packages/lhotse/dataset/speech_recognition.py", line 123, in _validate
    assert supervision.start >= -tol, f"Supervisions starting before the cut are not supported for ASR"
AssertionError: Supervisions starting before the cut are not supported for ASR (sup id: A01M6710_0604926_0615368_sp1.1, cut id: 4fb131fa-272c-437b-a9bb-005671c25cb6)

And here is the cut with ID 4fb131fa-272c-437b-a9bb-005671c25cb6:

{
"id": "4fb131fa-272c-437b-a9bb-005671c25cb6",
"start": 559.4254375,
"duration": 5.3045625,
"channel": 0,
"supervisions": [
  {
    "id": "A01M6710_0604926_0615368_sp1.1",
    "recording_id": "A01M6710_sp1.1",
    "start": -9.4926875,
    "duration": 9.49275,
    "channel": 0,
    "text": " \u3042\u306e\u30fc+\u611f\u52d5\u8a5e \u5f0a\u793e+\u540d\u8a5e/\u4e00\u822c \u306e+\u52a9\u8a5e/\u683c\u52a9\u8a5e \u3088\u3046+\u540d\u8a5e/\u975e\u81ea\u7acb/\u52a9\u52d5\u8a5e\u8a9e\u5e79 \u306b+\u52a9\u8a5e/\u526f\u8a5e\u5316 \u3053\u3046\u3044\u3046+\u9023\u4f53\u8a5e \u3044\u308d\u3093\u306a+\u9023\u4f53\u8a5e \u8077\u7a2e+\u540d\u8a5e/\u4e00\u822c \u304c+\u52a9\u8a5e/\u4e00\u822c/\u683c\u52a9\u8a5e \u8f09\u3063+\u52d5\u8a5e/\u81ea\u7acb/\u9023\u7528\u30bf\u63a5\u7d9a/\u4e94\u6bb5\u30fb\u30e9\u884c \u3066\u308b+\u52d5\u8a5e/\u975e\u81ea\u7acb/\u57fa\u672c\u5f62/\u4e00\u6bb5 \u30b5\u30a4\u30c8+\u540d\u8a5e/\u4e00\u822c \u3067+\u52a9\u8a5e/\u4e00\u822c/\u683c\u52a9\u8a5e \u30fc+\u540d\u8a5e/\u4e00\u822c/\u30fc <sp> \u63a1\u7528+\u540d\u8a5e/\u30b5\u5909\u63a5\u7d9a \u6210\u529f+\u540d\u8a5e/\u30b5\u5909\u63a5\u7d9a \u3057+\u52d5\u8a5e/\u81ea\u7acb/\u9023\u7528\u5f62/\u30b5\u5909\u30fb\u30b9\u30eb \u3066+\u52a9\u8a5e/\u63a5\u7d9a\u52a9\u8a5e \u3044\u308b+\u52d5\u8a5e/\u975e\u81ea\u7acb/\u57fa\u672c\u5f62/\u4e00\u6bb5 \u4f01\u696d+\u540d\u8a5e/\u4e00\u822c \u69d8+\u540d\u8a5e/\u63a5\u5c3e\u8f9e/\u4eba\u540d \u306f+\u52a9\u8a5e/\u4fc2\u52a9\u8a5e \u3069\u3046+\u526f\u8a5e/\u52a9\u8a5e\u985e\u63a5\u7d9a \u3057+\u52d5\u8a5e/\u81ea\u7acb/\u9023\u7528\u5f62/\u30b5\u5909\u30fb\u30b9\u30eb \u3066+\u52a9\u8a5e/\u63a5\u7d9a\u52a9\u8a5e \u3044\u308b+\u52d5\u8a5e/\u975e\u81ea\u7acb/\u57fa\u672c\u5f62/\u4e00\u6bb5 \u304b+\u52a9\u8a5e/\u526f\u52a9\u8a5e\uff0f\u4e26\u7acb\u52a9\u8a5e\uff0f\u7d42\u52a9\u8a5e \u3063\u3066+\u52a9\u8a5e/\u9023\u4f53\u5f62/\u683c\u52a9\u8a5e \u3068\u3053\u308d+\u540d\u8a5e/\u975e\u81ea\u7acb/\u526f\u8a5e \u3067\u3059+\u52a9\u52d5\u8a5e/\u57fa\u672c\u5f62/\u7279\u6b8a\u30fb\u30c7\u30b9 \u306d+\u52a9\u8a5e/\u7d42\u52a9\u8a5e <sp> \u3067+\u63a5\u7d9a\u8a5e \u30fc+\u540d\u8a5e/\u4e00\u822c/\u30fc \u306a\u3093\u304b+\u52a9\u8a5e/\u526f\u52a9\u8a5e \u30fc+\u540d\u8a5e/\u4e00\u822c/\u30fc <sp> \u306e+\u52a9\u8a5e/\u683c\u52a9\u8a5e \u3054+\u63a5\u982d\u8f9e/\u540d\u8a5e\u63a5\u7d9a \u8aac\u660e+\u540d\u8a5e/\u30b5\u5909\u63a5\u7d9a \u304c+\u52a9\u8a5e/\u4e00\u822c/\u683c\u52a9\u8a5e \u3067\u304d\u308c+\u52d5\u8a5e/\u81ea\u7acb/\u4eee\u5b9a\u5f62/\u4e00\u6bb5 \u3070+\u52a9\u8a5e/\u63a5\u7d9a\u52a9\u8a5e \u306a\u3042+\u52a9\u8a5e/\u7d42\u52a9\u8a5e \u3068+\u52a9\u8a5e/\u5f15\u7528/\u683c\u52a9\u8a5e \u601d\u3063+\u52d5\u8a5e/\u81ea\u7acb/\u9023\u7528\u30bf\u63a5\u7d9a/\u4e94\u6bb5\u30fb\u30ef\u884c\u4fc3\u97f3\u4fbf \u3066+\u52a9\u8a5e/\u63a5\u7d9a\u52a9\u8a5e \u3044+\u52d5\u8a5e/\u975e\u81ea\u7acb/\u9023\u7528\u5f62/\u4e00\u6bb5 \u307e\u3057+\u52a9\u52d5\u8a5e/\u9023\u7528\u5f62/\u7279\u6b8a\u30fb\u30de\u30b9 \u3066+\u52a9\u8a5e/\u63a5\u7d9a\u52a9\u8a5e",
    "speaker": "A01M6710"
  },
  {
    "id": "A01M6710_0615368_0621203_sp1.1",
    "recording_id": "A01M6710_sp1.1",
    "start": 0.0,
    "duration": 5.3045625,
    "channel": 0,
    "text": " \u305f\u3057\u304b\u306b+\u526f\u8a5e/\u4e00\u822c \u3042\u306e+\u9023\u4f53\u8a5e \u9732\u51fa+\u540d\u8a5e/\u30b5\u5909\u63a5\u7d9a \u3092+\u52a9\u8a5e/\u4e00\u822c/\u683c\u52a9\u8a5e \u30fc+\u540d\u8a5e/\u30fc/\u56fa\u6709\u540d\u8a5e <sp> \u3042\u3063+\u611f\u52d5\u8a5e \u3054\u3081\u3093\u306a\u3055\u3044+\u611f\u52d5\u8a5e <sp> \u3042\u306e\u30fc+\u611f\u52d5\u8a5e \u3044\u308d\u3093\u306a+\u9023\u4f53\u8a5e \u3068\u3053\u308d+\u540d\u8a5e/\u975e\u81ea\u7acb/\u526f\u8a5e \u306b+\u52a9\u8a5e/\u4e00\u822c/\u683c\u52a9\u8a5e \u51fa\u3059+\u52d5\u8a5e/\u81ea\u7acb/\u57fa\u672c\u5f62/\u4e94\u6bb5\u30fb\u30b5\u884c \u3063\u3066+\u52a9\u8a5e/\u9023\u4f53\u5f62/\u683c\u52a9\u8a5e \u306e+\u540d\u8a5e/\u975e\u81ea\u7acb/\u4e00\u822c \u306f+\u52a9\u8a5e/\u4fc2\u52a9\u8a5e \u78ba\u7387+\u540d\u8a5e/\u4e00\u822c \u306f+\u52a9\u8a5e/\u4fc2\u52a9\u8a5e \u4e0a\u304c\u308b+\u52d5\u8a5e/\u81ea\u7acb/\u57fa\u672c\u5f62/\u4e94\u6bb5\u30fb\u30e9\u884c \u3093+\u540d\u8a5e/\u975e\u81ea\u7acb/\u4e00\u822c \u3067\u3059+\u52a9\u52d5\u8a5e/\u57fa\u672c\u5f62/\u7279\u6b8a\u30fb\u30c7\u30b9 \u3088+\u52a9\u8a5e/\u7d42\u52a9\u8a5e",
    "speaker": "A01M6710"
  }
],
"features": {
  "type": "fbank",
  "num_frames": 136848,
  "num_features": 40,
  "frame_shift": 0.01,
  "sampling_rate": 16000,
  "start": 0.0,
  "duration": 1368.4845625,
  "storage_type": "lilcom_hdf5",
  "storage_path": "exp/data/fbank/train_all/feats-39.h5",
  "storage_key": "6bc2e318-a256-4583-af13-3e83ac214858",
  "channels": 0
},
"recording": {
  "id": "A01M6710_sp1.1",
  "sources": [
    {
      "type": "file",
      "channels": [
        0
      ],
      "source": "/home/sysadmin/CSJ_RAW/WAV/core/A01M6710.wav"
    }
  ],
  "sampling_rate": 16000,
  "num_samples": 21895753,
  "duration": 1368.4845625,
  "transforms": [
    {
      "name": "Speed",
      "kwargs": {
        "factor": 1.1
      }
    }
  ]
},
"type": "Cut"

}

manu51188, Feb 25 '21 07:02

OK, but this time it is different. You have a 5.3 s cut with two supervisions: one of them spans the whole cut, while the other is much longer; it begins about 9.5 seconds before the cut and ends just after the cut starts. So they overlap, and it seems that they come from the same speaker. Are you sure that this is expected and that the issue is not in the creation of the SupervisionSet?
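Working the numbers from the JSON above: relative to the cut start, the first supervision ends at start + duration. This snippet only does arithmetic on the manifest values.

```python
SAMPLING_RATE = 16000  # "sampling_rate" of the recording above

# First supervision of cut 4fb131fa-..., times relative to the cut start
sup_start = -9.4926875
sup_duration = 9.49275
sup_end = sup_start + sup_duration

# How far past the cut start the first supervision reaches, in samples
overlap_samples = round(sup_end * SAMPLING_RATE)
print(f"ends {sup_end:.7f} s after the cut start = {overlap_samples} sample(s)")
```

The overlap is a single sample at 16 kHz (1/16000 s = 0.0000625 s). Judging by the segment IDs (`..._0604926_0615368` and `..._0615368_0621203`, which appear to share a boundary in the original time line), the two segments may simply be adjacent, with the one-sample overlap coming from rounding after speed perturbation. That reading is an assumption based on the IDs, not something confirmed in the thread.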

BTW, I am not sure how well the current snowfall recipes will handle overlapping speech. In principle the training should not crash, but I don't think the model will learn anything meaningful.

If you are completely sure that your data is correct, you can work around the current problem with something like:

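for cut in cuts:  # cuts: the CutSet obtained after .trim_to_supervisions()
    # Clip every supervision to the boundaries of its cut;
    # only the time stamps are changed.
    cut.supervisions = cut.trimmed_supervisions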

You can read what it does here: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/cut.py#L52. Note that you must do this before any mixing or padding.
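For intuition, here is a minimal sketch of what clipping supervisions to the cut means, applied to the two supervisions of cut `4fb131fa-...`. It is a toy model in plain Python, not Lhotse's code. The `min_duration` filter is a hypothetical addition (not a Lhotse parameter) showing one way the near-empty sliver left by the first supervision could be dropped, since clipping changes only the times and not the transcript.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Supervision:
    id: str
    start: float       # seconds, relative to the cut start
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def trimmed_supervisions(
    cut_duration: float, sups: List[Supervision], min_duration: float = 0.0
) -> List[Supervision]:
    """Clip each supervision to [0, cut_duration] (toy model of the idea).

    min_duration is a hypothetical extra knob, not a Lhotse parameter:
    supervisions whose clipped duration is not above it are dropped.
    Only the time stamps change; any transcript would be left untouched.
    """
    out = []
    for sup in sups:
        start = max(sup.start, 0.0)
        end = min(sup.end, cut_duration)
        if end - start > min_duration:
            out.append(replace(sup, start=start, duration=end - start))
    return out


# The two supervisions of cut 4fb131fa-... (5.3045625 s long)
sups = [
    Supervision("A01M6710_0604926_0615368_sp1.1", -9.4926875, 9.49275),
    Supervision("A01M6710_0615368_0621203_sp1.1", 0.0, 5.3045625),
]
trimmed = trimmed_supervisions(5.3045625, sups)
kept = trimmed_supervisions(5.3045625, sups, min_duration=0.01)
for s in trimmed:
    print(s.id, round(s.start, 7), round(s.duration, 7))
print([s.id for s in kept])
```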

pzelasko, Feb 25 '21 19:02

Hi,

Yes, it is from the same speaker. The segments look fine, and the issue is not in the creation of the SupervisionSet.

Let me try the above method and check.

Regards, Mohit

manu51188, Feb 26 '21 06:02

Hi @pzelasko,

I have finally started the training, and it has been running fine so far.

The changes to the validation function look good.

manu51188, Feb 26 '21 09:02