CodeMixed-Text-Generator
Output of GCM is empty
Hello,
I am able to run the aligner and pregcm stages of the toolkit, but when it comes to the gcm stage, the output is empty.
Below is a screenshot of the config file.

Also, it seems that the problem is connected to this function:

    def run_in_try(func, pipe, params):
        try:
            #print(params)
            ret = func(params)
        except Exception as e:
            ret = "fail"
        pipe.send(ret)
        pipe.close()

It seems to be returning fail consistently. Is there something that I am missing?
A file "out-cm-en-de.txt" is created, but nothing is in it.
As I dig deeper, it seems that some place in the code is generating a NoneType.

Hello, is anyone able to help me with this?
Hey Cesa,
The GCM isn't perfect and sometimes gives empty output if either the quality of alignments or parse trees isn't good.
The quality of the parse trees is something you can check based on which parser you're using (Stanford or benepar) and how well it supports the languages you're generating parse trees for.
The main thing that you might want to check is the quality of the alignments. The fast_align aligner that we're using is a statistical aligner, which means that if you do not have a reasonably large dataset (>10k parallel sentences), the alignments learned from it won't be of much help to the GCM.
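A quick way to check the size of the data is to count the usable sentence pairs. This is a rough sketch, assuming the parallel corpus is stored as two plain-text files with one sentence per line; the function name `count_parallel_pairs` is hypothetical and not part of the toolkit.

```python
from itertools import zip_longest


def count_parallel_pairs(src_path, tgt_path):
    """Count line pairs where both the source and target sentences are non-empty."""
    pairs = 0
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file, so unmatched trailing lines are not counted.
        for s, t in zip_longest(src, tgt, fillvalue=""):
            if s.strip() and t.strip():
                pairs += 1
    return pairs
```

For example, calling `count_parallel_pairs("train.en", "train.de")` (hypothetical file names) and comparing the result against 10,000 gives a first indication of whether the aligner has enough data to work with.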
Hope this helps, Sanad
Also, in the language_1 and language_2 fields, you have to put the complete name of the language, as shown in the comments in the config file.