link-grammar icon indicating copy to clipboard operation
link-grammar copied to clipboard

Sharing parse choice elements

Open linas opened this issue 2 years ago • 8 comments

I have an idea of how to memory-spare parse choice elements, just like parse set elements are shared through the x-table. Currently, they cannot be shared because each has a unique value. I thought of these data structures (typedefs omitted):

struct Parse_set_struct
{
   Parse_choice_desc *desc;
   ...
};
struct Parse_choice_desc_struct
{
   Parse_choice_desc *next;
   Parse_choice_lr *l_pc, *r_pc;
};
struct Parse_choice_lr
{	
   Parse_set * set;
   Disjunct    *md;           /* the chosen disjunct for the middle word */
   int32_t id;                /* the tracon (l_id for l_pc and r_id for r_pc  */
};

(What is now set[1] and set[2] will be in l_pc and r_pc, correspondingly.) Of course, there is a question of how much sharing will be done (if any, I don't know even that) and what will be the CPU overhead.

Originally posted by @ampli in https://github.com/opencog/link-grammar/discussions/1402#discussioncomment-4877571

linas avatar Feb 07 '23 01:02 linas

I was going to write something clever here, but I now realize I don't understand the idea. It would seem like this results in two copies of Disjunct *md; instead of just one. There's also an extra int32_t id; that uses more space. I don't understand what Connector *le, *re; are. I don't understand which of these structs you think is sharable...

linas avatar Feb 07 '23 02:02 linas

Sharing this one:

struct Parse_choice_lr
{	
   Parse_set * set;
   Disjunct    *md;           /* the chosen disjunct for the middle word */
   int32_t id;                /* the tracon (l_id for l_pc and r_id for r_pc  */
};

two copies of Disjunct *md; instead of just one

You are right. It should be moved to Parse_choice_desc_struct.

I don't understand what Connector *le, *re; are.

Where? They are not used here.

Also note that id replaces what is now Parse_choice::lc and Parse_choice::rc (depending if this is l_pc or r_pc). A connector pointer can be used instead, but it seems that using tracon_id will lead to more sharing (to be checked).

I didn't explain the basis of this idea: Currently Parse_choise has information of both sides at once., This make all of its elements unique so they cannot be shared. The idea is to separate it to LHS and RHS like set[0] and set[1].

ampli avatar Feb 07 '23 02:02 ampli

id replaces what is now Parse_choice::lc

Ah! OK, that makes sense now.

The Connector *le, *re; are in the Parse_set. I'm guessing wildly that they are connectors on Parse_set_struct::lw and rw? If so, can they be removed and replaced by get_tracon(Parse_set_struct::l_id) ?

linas avatar Feb 07 '23 03:02 linas

The Connector *le, *re; are in the Parse_set. I'm guessing wildly that they are connectors on Parse_set_struct::lw and rw? If so, can they be removed and replaced by get_tracon(Parse_set_struct::l_id) ?

Yes. But the "encoding for parsing" step would then need to produce a tracon table (to get the tracon address by its number). Such a table is produced by the "encoding for pruning" step, which shares most of its code, so it is easy to produce it. However, we will need to find how much overhead, if any, this adds.

ampli avatar Feb 07 '23 03:02 ampli

would then need to ...

OK. I guess we can leave this alone for now. Things are complicated enough, as it is.

linas avatar Feb 07 '23 04:02 linas

I guess we can leave this alone for now. Things are complicated enough, as it is.

But the opposite thing, i.e. remove int32_t l_id, r_id; seems easy and without overhead (unless I missed something) so I will try that.

ampli avatar Feb 07 '23 04:02 ampli

The Connector *le, *re; are in the Parse_set. I'm guessing wildly that they are connectors on Parse_set_struct::lw and rw? If so, can they be removed and replaced by get_tracon(Parse_set_struct::l_id) ?

I found a way to replace Connector *lc, *rc; in Parse_choice with int32_t l_id, r_id; (saving 8 bytes) and get rid of int32_t l_id, r_id; in Parse_set w/o a noticeable overhead (PR soon).

BTW, by producing a tracon/disjunct mapping table (while encoding for parsing), it seems possible to also get rid of Disjunct *md; in Parse_choice. Even w/o additional table mapping, with some small overhead, it is possible to get rid of uint8_t lw, rw; in Parse_set, but then we still have uint8_t null_count that prevents the saving of 8 bytes. However, it is possible to get rid even of it if an x_table per null_count is implemented (which can be a good idea anyway).

ampli avatar Feb 12 '23 04:02 ampli

it seems possible to also get rid of Disjunct *md

I Could not do it after all.

ampli avatar Feb 19 '23 04:02 ampli