link-grammar icon indicating copy to clipboard operation
link-grammar copied to clipboard

Cost adjust

Open ampli opened this issue 2 years ago • 7 comments

(EDIT: Since I have already written it in the wrong box, I leave it here...)

It seems there are several problems in this code. Also, there are several other problems related to it.

Regarding condesc_t::cncost - this is an improper place to put a multi-connector cost: There is a single condesc_t element for each connector string (i.e. all multi and non-multi connectors with the same connector string share a single condesc_t element). However, each multi-connector in the dict may have its own cost, and these costs may be different.

There is also a problem to store the cost in the connector due to the 64 bytes limitation. A multi-connector cost table could be added to disjuncts, but there is a space constraint there too. So the only good place for multi-connector costs seems to be in Sentence_s - a hash table for that can be added.

Among the problems that are related:

  1. It seems there is a need to adjust not only the linkage cost but also the disjunct costs, so reading disjunct costs by the API would lead to consistent results. However, it seems it is not a good idea to update the disjunct costs in their Disjunct_struct because this will create a cost mess if relinking is needed (e.g. for implementing phantoms). To that end, an adjust_cost field could be added to Disjunct_struct, but again there is a space problem there.
  2. The disjunct string (!as shown by !disjunct) should be updated, to be consistent with the adjusted disjunct costs. However, it is not clear how to textually represent multi-matches of multi-connectors. (I also discovered a bug there regarding multi-connectors that has no implications in the current dict/corpora and in your optional multi-connector examples).

Regarding the code, it seems it has several things which are wrong (or that I don't understand). In any case, the code can be simplified and this will save fixing these things. I will add my comments to the code itself.

ampli avatar Nov 04 '22 22:11 ampli

IT seems I answered in the wrong box and pressed the wrong button! Hopefully, no damage has resulted...

ampli avatar Nov 04 '22 22:11 ampli

I propose a simpler algo:

  1. For each word (like now).
  2. For each multi-connector of a chosen disjunct (like now, but also left if needed).
  3. Scan the link array and count the number N of rc (or lc, correspondingly to right/left at (2)) with the same address as found in (2). An address comparison can be used because connectors are copied by reference. Alternatively, a tracon_id comparison can be done.
  4. Adjust by (N-1) * the-multi-connector-cost.

The multi-connector costs can be taken from a hash table stored in Sentence_s. I have no good idea where to store the adjusted disjunct costs (for API fetch if needed and for !disjunct display) - maybe in an array in Linkage_s (with num_words elements).

ampli avatar Nov 04 '22 22:11 ampli

When you say

The multi-connector costs can be taken from a hash table stored in Sentence_s.

Does this table currently exist, or would it have to be created?

linas avatar Nov 07 '22 00:11 linas

Does this table currently exist, or would it have to be created?

It doesn't exist. You can use string_id() on the connector name (with or without the @) to get a hash table index, but you need to resize this table on the fly as needed.

(BTW, I already used string_id() in a similar way in several places. Using a generic hash-table function would be better - added to my to-do list...)

ampli avatar Nov 07 '22 05:11 ampli

Does this table currently exist, or would it have to be created?

It doesn't exist.

OK, thanks. I'm going to keep this open, as a to-do item. Finishing it would not be hard - an afternoon of creating the table and hooking it all up. But nothing depends on it right now, so I don't feel like cluttering up the current code with this new stuff.

linas avatar Nov 07 '22 19:11 linas

FYI, this did eventually become urgent, and an alternate work-around was done in commit 1dcdd159f40d39f149cf9f7650387c284cf6945f which is adequate for now.

linas avatar Feb 21 '23 03:02 linas

In any case I added this to my TODO list.

ampli avatar Feb 22 '23 08:02 ampli