
Malloc error

Open ntruongv opened this issue 4 years ago • 3 comments

Hello,

I tried to run CatE on a ~500,000-word corpus with 300D GloVe embeddings and kept receiving this error:

  1. On MAC: cate(16090,0x700002f98000) malloc: Incorrect checksum for freed object 0x7fc442608c98: probably modified after being freed.

  2. On Linux: Error in `./src/cate': free(): invalid next size (fast): 0x000055d279a778f0

Do you have any suggestion?

Thanks for the awesome work by the way. I really enjoy your paper.

ntruongv avatar Sep 21 '20 23:09 ntruongv

Hi,

Thanks for your interest in our paper and code! Could you please provide the following information to help me figure out the issue?

  1. Were you able to successfully run the code on the example dataset (nyt & yelp)?
  2. When using the 300D GloVe embedding, did you use that as the pre-trained embedding to load from?
  3. Did the error happen before the training started, or during training?

Thanks, Yu

yumeng5 avatar Sep 22 '20 02:09 yumeng5

Hello,

Thanks for the quick reply.

  1. I ran the code successfully on the example dataset with word2vec_100.txt, and it also ran successfully on my own dataset with word2vec_100.txt. I haven't tried the 300D GloVe embeddings on the example dataset yet. Strangely, if I retry maybe 10+ times, the run occasionally succeeds with the 300D embeddings on my dataset.
  2. I use -load-emb glove_emb_file.txt -size 300 to load GloVe as the pre-trained embedding.
  3. The error most often happens during the embedding-loading stage; occasionally it happens during the pre-training epochs (<15% progress).
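
In case it helps narrow things down, here is a quick sanity check of the embedding file before loading it (a sketch; check_emb_header and the file path are hypothetical, and it assumes the word2vec text format with a "<vocab_size> <dim>" first line, which a C loader typically uses to size its buffers):

```python
def check_emb_header(path, expected_dim):
    """Check that a word2vec-text-format embedding file declares a header
    consistent with its contents. A dimension mismatch between the header
    and -size could overrun fixed-size buffers in a C loader and corrupt
    the heap, which would surface later as malloc/free errors."""
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        if len(header) != 2:
            return False, "first line is not '<vocab_size> <dim>'"
        vocab_size, dim = int(header[0]), int(header[1])
        if dim != expected_dim:
            return False, f"header dim {dim} != -size {expected_dim}"
        count = 0
        for lineno, line in enumerate(f, start=2):
            values = line.rstrip("\n").split(" ")
            if len(values) - 1 != dim:  # one word token + dim floats
                return False, f"line {lineno}: {len(values) - 1} values, expected {dim}"
            count += 1
        if count != vocab_size:
            return False, f"header says {vocab_size} vectors, file has {count}"
    return True, "ok"
```

If this reports a mismatch, the crash is likely the loader reading past the buffers it allocated from the header.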

Thanks for helping out.

ntruongv avatar Sep 22 '20 04:09 ntruongv

Hi,

Thanks for the info. I tried downloading the GloVe pre-trained embeddings and adding the vocabulary size and embedding dimension to the first line of the embedding file (I assume you did this too). I then also hit a segfault, though it was different from the memory error you mentioned. I might need some more time to figure out what happened, since I'm quite busy at the moment. In the meantime, you could try running the code without loading pre-trained embeddings (they are not required, and the code works fine without them). Alternatively, you could try loading another type of pre-trained embedding, such as JoSE, which also provides 300D pre-trained embeddings. As long as they provide reasonable initializations, different types of pre-trained embeddings shouldn't make much difference in CatE's final results.
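
For reference, the header-adding step can be scripted rather than done by hand (a sketch; the function name and file paths are placeholders, and it assumes a raw GloVe text file with one "word v1 v2 ..." line per vector and no header):

```python
def add_word2vec_header(glove_path, out_path):
    """Prepend the '<vocab_size> <dim>' header line that word2vec-style
    loaders expect but raw GloVe text files omit."""
    with open(glove_path, encoding="utf-8") as f:
        lines = f.readlines()
    vocab_size = len(lines)
    # Each line is one word token followed by dim float values.
    dim = len(lines[0].rstrip("\n").split(" ")) - 1
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{vocab_size} {dim}\n")
        f.writelines(lines)
    return vocab_size, dim
```

Computing the header from the file itself (rather than typing it in) avoids exactly the vocab-count or dimension mismatch that could trigger the crash.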

I'll let you know once I figure out the issue.

Best, Yu

yumeng5 avatar Sep 23 '20 06:09 yumeng5