OpenNARS-for-Applications
Allow new allocation of atoms by overriding "previously referenced but now unreferenced" atoms
Narsese_AtomicTermIndex can only support a maximum number of unique "atoms" (e.g. ~2^16 for an unsigned short index) over the runtime of a NAR. One can argue that this limit is unlikely to be exhausted for whatever use case you imagine, but it is an incomplete imitation of AIKR. Actual AIKR implementations do exist, e.g. in Java, and are likewise entirely possible in C/C++.
The use of these interned ints in ONA's "Compound" Term also seems limited to one level of nesting.
I will include a fast and efficient alternative solution here soon.
basically: a "radix trie bag" which is a radix trie with priority accumulators for each leaf link in a node. these would get added and subtracted when leaves in the radix trie are added or removed. the priority table in a radix trie node enables roulette selection, like sampling a markov chain. then just implement a serializer for any Term, Concept, or Task instance. there may be an existing C/C++ concurrent radix trie implementation to support safe multithreaded access.
I agree! I didn't bother to resolve this yet since, as you said, 65K atoms are a lot, so the issue doesn't appear when the input alphabet of atoms is fixed and within that number.
But I want to fix it in the long term, with a simple solution if possible and without performance impact. I think the easiest fix is to check whether an atom isn't referenced anymore (for instance via a reference counter), which allows it to be overridden with a new one. I actually have an implementation of this in a branch, but it wasn't totally complete yet. Before merging it I would have needed to extend it to also capture atoms currently referenced by temporal implication links, which took me some time, and then I got sidetracked.
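Something like the following sketch of the ref-counting idea, assuming a fixed-size atom table: the names Atoms_Intern/Atoms_Release are illustrative (the branch mentioned above may differ), and a real implementation would use a hash map for the name lookup rather than a linear scan.

```c
/* Illustrative sketch (not ONA's actual branch): an atom table whose slots can
 * be reused once their reference count drops to zero. */
#include <string.h>

#define ATOMS_MAX 65536
#define ATOM_NAME_MAX 32

typedef unsigned short Atom;

static char atomNames[ATOMS_MAX][ATOM_NAME_MAX];
static unsigned int atomRefs[ATOMS_MAX]; /* 0 means the slot is free / reusable */

/* Look up or allocate an atom id for a name, reusing unreferenced slots. */
Atom Atoms_Intern(const char *name)
{
    int freeSlot = -1;
    for(int i = 1; i < ATOMS_MAX; i++) /* id 0 reserved as "no atom" */
    {
        if(atomRefs[i] > 0 && strcmp(atomNames[i], name) == 0)
        {
            atomRefs[i]++; /* existing atom, take another reference */
            return (Atom) i;
        }
        if(atomRefs[i] == 0 && freeSlot < 0)
        {
            freeSlot = i; /* remember first unreferenced slot for reuse */
        }
    }
    if(freeSlot < 0)
    {
        return 0; /* table truly full: every slot is still referenced */
    }
    strncpy(atomNames[freeSlot], name, ATOM_NAME_MAX - 1);
    atomNames[freeSlot][ATOM_NAME_MAX - 1] = '\0';
    atomRefs[freeSlot] = 1;
    return (Atom) freeSlot;
}

/* Called whenever a Term, Concept, Task, or temporal implication link stops
 * referencing the atom; at zero references the slot becomes reusable. */
void Atoms_Release(Atom a)
{
    if(a != 0 && atomRefs[a] > 0)
    {
        atomRefs[a]--;
    }
}
```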
How were negated subterms supposed to work if compounds can only be one level deep? :skull:
Probably an immutable Term*[] array of pointers to subterms, like in Java, is inevitable: none of that 128-element pre-allocated unsigned-short array, just an array sized exactly to the number of subterms.
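A minimal sketch of what such a pointer-based compound could look like (hypothetical names, not ONA's representation); the accessor at the end also illustrates the kind of "method on a struct" style mentioned further below.

```c
/* Hypothetical pointer-based alternative: a compound holds an exactly-sized,
 * immutable array of pointers to shared (interned) subterms. */
#include <stdlib.h>

typedef unsigned short Atom;

typedef struct PtrTerm
{
    Atom atom;                   /* operator or atomic term id */
    unsigned char subCount;      /* 0 for atomic terms */
    const struct PtrTerm **subs; /* exactly subCount immutable subterm pointers */
} PtrTerm;

/* Build a compound; subterm pointers are shared, not copied. */
PtrTerm *PtrTerm_Compound(Atom op, const PtrTerm **subs, unsigned char n)
{
    PtrTerm *t = (PtrTerm*) malloc(sizeof(PtrTerm));
    t->atom = op;
    t->subCount = n;
    t->subs = (const PtrTerm**) malloc(n * sizeof(PtrTerm*));
    for(unsigned char i = 0; i < n; i++)
    {
        t->subs[i] = subs[i];
    }
    return t;
}

/* "Method"-style accessor, as one would write an OOP-like API over C structs. */
const PtrTerm *PtrTerm_Sub(const PtrTerm *t, unsigned char i)
{
    return i < t->subCount ? t->subs[i] : NULL;
}
```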
Interning terms (not just Atoms) is good for memory and can accelerate equality tests. I use a 'lossy', bag-based, best-effort term interning strategy in narchy which achieves a > 95% hit rate while still being fully AIKR via a size limit, and it can run indefinitely even as new terms are introduced.
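For illustration only, a much-simplified C sketch of the lossy-interning idea: a fixed-size, hash-indexed table where a collision simply overwrites the previous entry, so interning stays best-effort but memory-bounded. This is not narchy's actual bag-based Java implementation; the names and the hash function are placeholders.

```c
/* Hypothetical lossy intern cache: bounded memory, best-effort hit rate. */
#include <stdlib.h>
#include <string.h>

#define INTERN_SLOTS 4096

typedef struct
{
    char *key;  /* serialized term, NULL if the slot is empty */
    void *term; /* canonical term instance for that key */
} InternSlot;

static InternSlot internTable[INTERN_SLOTS];

static size_t hashKey(const char *s)
{
    size_t h = 5381; /* djb2 string hash */
    while(*s)
    {
        h = h * 33 + (unsigned char) *s++;
    }
    return h % INTERN_SLOTS;
}

/* Return a canonical instance for key; on a miss, store the candidate,
 * possibly evicting (losing) whatever previously occupied the slot. */
void *Intern_GetOrPut(const char *key, void *candidate)
{
    InternSlot *slot = &internTable[hashKey(key)];
    if(slot->key != NULL && strcmp(slot->key, key) == 0)
    {
        return slot->term; /* hit: reuse the already-interned instance */
    }
    free(slot->key); /* lossy eviction of the previous occupant */
    size_t len = strlen(key) + 1;
    slot->key = (char*) malloc(len);
    memcpy(slot->key, key, len);
    slot->term = candidate;
    return candidate;
}
```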
Even though, for whatever reason, this project avoids the use of C++, there are still ways to develop OOP-like APIs with C structs. This is where "methods" for the Term types would be helpful for accessing Compound subterms.
Compound terms only one level deep? No idea where you got that from. Regarding the other point: no, it's not inevitable; I have an alternative solution in my branch. Regarding C++: don't get me started on that, please stay with your Java implementation if you have an issue with the language choice.
typedef struct
{
    bool hashed;                        //whether the cached hash below has been computed
    HASH_TYPE hash;                     //cached hash value of the term
    Atom atoms[COMPOUND_TERM_SIZE_MAX]; //interned ids (unsigned short) making up the term
} Term;
So if this is supposed to be a Compound, and atoms[] are the subterms, how can a Compound have a Compound subterm? I double-checked to see whether atoms actually means atoms here, and yes, it is that unsigned-short representation.
> how can a Compound have a Compound subterm?
By having the tree structure of the compound encoded in the array. This is why term hashing, comparison and unification are 100% cache-friendly and, in that regard, an order of magnitude faster than an implementation which relies on pointer/reference indirections to handle subterms.
Of course no example would work without the ability to have nested compounds. Even the most simple behavior representation, <(a &/ ^opname) =/> b>,
is already a nested compound: a sequence within a temporal implication.
Its array representation is [=/> &/ b a ^opname].
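One natural way to read that encoding, inferred from the example above rather than taken from ONA's source: the array stores the term's binary tree implicitly, with the children of the node at index i sitting at indices 2*i+1 and 2*i+2. A small self-contained demonstration with placeholder atom ids:

```c
/* Implicit binary-tree indexing consistent with [=/> &/ b a ^opname]:
 *   index:    0    1   2  3     4
 *   children of =/> (0): &/ (1) and b (2)
 *   children of &/  (1): a  (3) and ^opname (4)
 * Hedged sketch: ids and constants here are placeholders, not ONA's. */
#include <stdio.h>

#define COMPOUND_TERM_SIZE_MAX 64
typedef unsigned short Atom;

typedef struct
{
    Atom atoms[COMPOUND_TERM_SIZE_MAX];
} ExampleTerm;

/* Index of the left/right child of the node stored at index i. */
static int leftChild(int i)  { return 2 * i + 1; }
static int rightChild(int i) { return 2 * i + 2; }

int main(void)
{
    /* placeholder ids: 1 = "=/>", 2 = "&/", 3 = "b", 4 = "a", 5 = "^opname" */
    ExampleTerm t = { .atoms = { 1, 2, 3, 4, 5 } };
    printf("root %u, subject %u, predicate %u\n",
           t.atoms[0], t.atoms[leftChild(0)], t.atoms[rightChild(0)]);
    printf("sequence parts: %u and %u\n",
           t.atoms[leftChild(1)], t.atoms[rightChild(1)]);
    return 0;
}
```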
I knew there must be some encoding that wasn't immediately obvious from the code. Maybe add some comments to the Term structure explaining that this 'atoms' array also includes (non-atom) operators, and maybe rename the field to 'subs'.
As for the efficiency comparison, I don't know if you have actually measured it, but given proper deduplication I think there's a chance for a pointer representation to use less memory than these 256-byte-plus compounds. A cache line is typically 64 bytes, so that's already 4 cache lines in an attempt to remain cache-efficient. The equivalent pointer-graph representation could instead use less memory and thus fewer cache lines, and even though its access order is not as sequential, it might cache-miss less. A real performance comparison of these two representations would be interesting to see, and it would be nice if ONA somehow supported both.