CodeMixed-Text-Generator

Output of GCM is empty

Open cesaSalaam opened this issue 4 years ago • 4 comments

Hello,

I am able to run through the aligner and the pregcm stages of the toolkit, but when it comes to the gcm stage, the output is empty.

Below is a screenshot of the config file. [Screenshot: Screen Shot 2021-08-17 at 1 48 02 PM]

Also, it seems that the problem is connected to this function:

    def run_in_try(func, pipe, params):
        try:
            # print(params)
            ret = func(params)
        except Exception as e:
            ret = "fail"
        pipe.send(ret)
        pipe.close()

It seems to be returning fail consistently. Is there something that I am missing?
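
Because run_in_try catches every exception and only sends "fail", the real error never reaches the console. A minimal debugging sketch, assuming you are free to edit a local copy of the function: print the traceback before sending "fail". The names run_in_try_verbose and broken_stage are hypothetical and are not part of the toolkit; broken_stage only imitates a stage that receives a None value.

```python
import traceback
from multiprocessing import Pipe


def run_in_try_verbose(func, pipe, params):
    """Like run_in_try, but prints the traceback before sending "fail"."""
    try:
        ret = func(params)
    except Exception:
        # Show the real error instead of silently swallowing it.
        traceback.print_exc()
        ret = "fail"
    pipe.send(ret)
    pipe.close()


def broken_stage(params):
    # Hypothetical stand-in for a stage that trips over a None value.
    return params["tree"].leaves()


if __name__ == "__main__":
    parent_end, child_end = Pipe()
    run_in_try_verbose(broken_stage, child_end, {"tree": None})
    # The traceback is printed to stderr; the pipe still carries "fail".
    print(parent_end.recv())
```

With the traceback visible, the file and line that raise the error should show up directly, which narrows down whether the input data or the config is at fault.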

A file "out-cm-en-de.txt" is created, but nothing is in it.

cesaSalaam, Aug 17 '21 17:08

As I dig deeper, it seems that some place is generating a NoneType. [Screenshot: Screen Shot 2021-08-17 at 2 28 58 PM]

cesaSalaam, Aug 17 '21 18:08

Hello, is anyone able to help me with this?

cesaSalaam, Aug 23 '21 17:08

Hey Cesa,

The GCM isn't perfect and sometimes gives empty output if either the quality of alignments or parse trees isn't good.

The quality of parse trees should be something that you can check based on which parser you're using (stanford or benepar) and how well it supports the languages you're generating the parse trees for.

The main thing that you might want to check is the quality of the alignments. The fast_align aligner that we're using is a statistical aligner, which means that if you do not have enough data (>10k parallel sentences), the alignments learned from it won't be of much help to the GCM.
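
A quick way to check the data size is to count the usable sentence pairs. The sketch below assumes the parallel data is available in fast_align's input format, one pair per line with the two sides separated by " ||| "; if your data is stored as two separate files, the same idea applies after pairing them line by line. The function name count_usable_pairs and the sample sentences are made up for illustration.

```python
def count_usable_pairs(lines, separator=" ||| "):
    """Count lines that hold a non-empty source and target sentence."""
    usable = 0
    for line in lines:
        source, found, target = line.partition(separator)
        # Skip lines without the separator or with an empty side.
        if found and source.strip() and target.strip():
            usable += 1
    return usable


sample = [
    "the house is small ||| das Haus ist klein",
    "good morning ||| guten Morgen",
    " ||| ",          # empty pair, skipped
    "no separator",   # malformed line, skipped
]
print(count_usable_pairs(sample))
```

For a real corpus, pass an open file handle instead of the list and compare the result against the rough 10k figure mentioned above.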

Hope this helps, Sanad

mohdsanadzakirizvi, Aug 25 '21 07:08

Also, in the language_1 and language_2 fields, you have to put the complete name of the language, as shown in the comments in the config file.

mohdsanadzakirizvi, Aug 25 '21 07:08