cg3 Allow printing tags without full tracing

Since commit a9c767573edaefb0d59c2c9d93fbf1048d5c92a3, tags are not printed unless tracing is enabled. However, since CG is now used in many Apertium pairs during generation to handle preferences, tags may be necessary without full tracing.

Tags are especially useful when running a testvoc. If no tags are printed, only the internal lemma with # is shown, which makes debugging difficult. Enabling tracing with -t helps, but it also adds extra information that the postgenerator does not handle properly.

I suggest adding a new flag to print tags, regardless of tracing, or printing tags by default again (unless it is really preferable not to print tags by default).

Thanks!

Feb 02 '22 22:02 marcriera

Ping @unhammer

Feb 03 '22 06:02 TinoDidriksen

So you're running a pipeline that uses cg-proc -n -g after lt-proc -g, and the lt-proc steps which used to give you #foo<tag> now give #foo?

Feb 03 '22 13:02 unhammer

If I understand correctly, the problem is that

$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin
^NotInGen<np>/@NotInGen<np>$
$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n nob-nno.genprefs.rlx.bin
#NotInGen

doesn't give the tags, while with -t it does:

$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n -t nob-nno.genprefs.rlx.bin
#NotInGen\<np\>

but the output is noisy if a rule actually hit:

$ echo å gafle|apertium -f none -d . nob-nno-dgen |cg-proc -g -n -t nob-nno.genprefs.rlx.bin
å gafla/¬gafle\<v:infa_infe\><REMOVE:26>

Feb 03 '22 13:02 unhammer

or printing tags by default again (unless it is really preferable not to print tags by default)

This is running after the generator, so we do have to get rid of the tags to avoid them ending up in the output shown to the user.

Also, I suppose you only want tags on the stuff we couldn't generate?

Feb 03 '22 13:02 unhammer

It's exactly this, thanks.

Yes, tags should only appear for lexical units that cannot be generated. The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.

Feb 03 '22 16:02 marcriera

Well, there will also be tags on readings if there are variant tags (in addition to the input tags which are there since we use lt-proc -b on the generator):

$ echo blå | apertium -d . nob-nno-dgen
^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$

(that's the input to cg-proc -g -n)

Feb 03 '22 19:02 unhammer

You're right, of course. For some reason I had assumed these were just removed, but they are tags after all.

I suppose we could distinguish between invalid and valid readings by checking if there's a # or @ in the input. These are added by the generator only if it cannot generate anything. I assume they are also escaped if the generation is valid (there could be a lexical unit beginning with these two characters).

Feb 04 '22 16:02 marcriera
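
The check proposed in the last comment could look roughly like the sketch below. It is only an illustration in Python, not code from cg3 or lttoolbox: the function names `split_unescaped` and `generation_failed` are hypothetical, and it assumes that the first part of a cohort is the input analysis kept by lt-proc -b, that the remaining parts are generator outputs, and that a literal # or @ in a valid form is escaped with a backslash.

```python
def split_unescaped(text, sep="/"):
    """Split text on sep, ignoring separators escaped with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the backslash and the escaped character together.
            current.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def generation_failed(cohort):
    """Return True if any generator output starts with an unescaped '#' or '@'.

    Assumption: the cohort looks like '^input/output1/output2$', where the
    first part is the input analysis (as kept by lt-proc -b).
    """
    body = cohort.strip()
    if body.startswith("^") and body.endswith("$"):
        body = body[1:-1]
    outputs = split_unescaped(body)[1:]
    # An escaped marker starts with a backslash, so it is not matched here.
    return any(out.startswith(("#", "@")) for out in outputs)


if __name__ == "__main__":
    print(generation_failed("^NotInGen<np>/@NotInGen<np>$"))
    print(generation_failed("^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$"))
```

With the two cohorts quoted in this thread, the first is flagged as a generation failure and the second (which only carries a variant tag) is not, so tags would be kept only for the first.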
