Tcl-bounties Requirements for [array foreach]

Requirements for [array foreach]

Open dkfellows opened this issue 7 years ago • 24 comments

There are a number of things that are required to satisfy the array foreach bounty. Because there's interest from multiple people over this, I'll lay down what we really need.

[ ] Implementation of a subcommand, foreach, of the array ensemble. (Preferably to be part of Tcl 8.7.)
- Implementation should be in C (as a pure Tcl implementation is unlikely to have the desired memory-use characteristics).
- Implementation may be simple, using Tcl_EvalObjEx, etc.
- Bonus for NRE-aware, double-bonus for bytecode version — only needs to be bytecoded inside procedure-like contexts — but neither of those are required to satisfy the bounty. They can be provided (e.g., by Tcl maintainers) after the bounty is paid out.
[ ] Documentation is required. Can be a plain text paragraph, but should be in the writing style of the subcommands on the array(n) manpage.
[ ] Tests (using tcltest) are required. Should test simple usage, failure modes (such as wrong args, unwriteable iteration variable(s), etc.) and “interesting” edge cases like semantics when the array is written to during the iteration loop.
[ ] TIP required, but trivial. Main things: why is this a good idea (independent of “there's a bounty on it”), and a summary of what are you are doing that will be useful to other Tclers to understand what to expect. (The documentation paragraph is likely to be useful here!)

It's not enough to just have an implementation. It must also be tested (so that we can safely develop improved implementations as required) and it must be documented (so other Tclers can learn about it). The TIP is conceptually part of the documentation, used as a form of milestone in our release notes.

Apr 16 '17 12:04 dkfellows

Cross reference with these other issues: https://github.com/flightaware/Tcl-bounties/issues/10 https://github.com/flightaware/Tcl-bounties/issues/12 https://github.com/flightaware/Tcl-bounties/issues/15 https://github.com/flightaware/Tcl-bounties/issues/18

Apr 16 '17 13:04 dkfellows

Cross reference with this existing TIP: TIP #421

Apr 16 '17 13:04 dkfellows

It was the intention of myself and Andy Goth to merge his work and mine to ensure that there would be a single C API. I think this is the best path forward, as his new API will handle the traces and error conditions properly. I would like to benchmark an 'array for' command written using his API and compare it against what I wrote.

The existing TIP has not been updated. Arguments about array foreach vs. array for &etc. have not been started.

For the array for code I wrote:

I attempted to do the bytecode, but was not able to complete it. This effort can be junked.
My implementation is NRE aware.
No documentation has been written.
I implemented a array for {k v} arrayname {}, which is probably too much (and makes the bytecode complicated).
Only array for {k} arrayname {} is needed.
Most of the tests are done. http://core.tcl.tk/tcl/info/bd05353216cc56ff

May 01 '17 19:05 bll123

array for {k v} arrayname { ... } would be an O(N) operation, whereas array for {k} arrayname { set v $arrayname{$k}; ... } would be O(NlogN).

May 01 '17 20:05 resuna

Yes, it should save a little time, but I don't think the end result is much different from the Tcl code, excepting the overhead of the set call (I could not determine any simpler way to get the value object).

I will try to find some time to reimplement my code using Andy Goth's API and see how that works out.

The C code does:

  if (valueVarObj != NULL) {
       valueObj = Tcl_ObjGetVar2(interp, arrayNameObj, keyObj,
             TCL_LEAVE_ERR_MSG);

      if (Tcl_ObjSetVar2(interp, valueVarObj, NULL, valueObj,
                TCL_LEAVE_ERR_MSG) == NULL) {
           result = TCL_ERROR;
           goto done;
       }
   }

May 01 '17 20:05 bll123

OK, so your implementation is basically a "C" version of the stock foreach key [array names foo], and is not walking the internal array structures? I don't think that will satisfy the bounty requirements, because all it's saving is a copy of the names of the array elements into a list, which is an O(N) operation. The O(NlogN) term is still in there, so it's not fundamentally more efficient than the naive Tcl implementation.

May 02 '17 13:05 resuna

As I understand it, the problem is that traces can do all sorts of tricky things that cause trouble. I guess that a more subtle approach would be to use some sort of modification sequence number in the array (changing the structure of arrays) so that iterations could perform a simple comparison to detect the world changing under their feet. We use similar techniques elsewhere (e.g., with bytecode where renaming a command with a compiler increments the compilation epoch counter).

May 02 '17 13:05 dkfellows

No, that is not correct. No, it is not a stock version of foreach key .....

Even after bytecoding, I don't think you are going to get any huge speed gains, but the memory savings are significant.

May 02 '17 13:05 bll123

I didn't mean to imply that it was just a stock version of foreach key [array names foo] { ... }, what I'm getting at is that the time efficiency of the operation is the same as the stock foreach key [array names foo] { ... }. That matches the timing results in the other thread where it's more memory efficient but slower than foreach {key val} [array get foo] { ... }.

Since the loop is running in "C" code it doesn't matter whether the operation itself is bytecoded, the operation lookup is a tiny fraction of the total time.

May 02 '17 13:05 resuna

Right. I think once bytecoded, there will some time savings, but nothing significant. The fetch of the value object from the hash table (which has all the trace checking, etc.) is never going to be any faster.

May 02 '17 13:05 bll123

OK, I don't understand what you're doing then, if you're walking the hash table to get the keys but you don't have access to the values in the same operation.

What we're looking for is an analog to the Speed Tables "$table search" operation, which is O(N) for a straight linear search because it walks the index and has access to the row in the table without an extra lookup on the key.

May 02 '17 13:05 resuna

Hmm, it's been pointed out that the time for Tcl_GetVar2 shouldn't be O(logN) unless there's a serious problem with Tcl hash tables, but there's some kind of superlinear term because I can't see otherwise how the time for your array for could be higher than the time for foreach {key val} [array get foo] { ... }.

May 02 '17 13:05 resuna

Does [array get foo] bypass the trace checking?

May 02 '17 13:05 resuna

Well, as DKF hinted, We can:

change arrays to have a global epoch counter each time the array has an insert/delete operation.
not execute any traces on the array.

Then it could be made much faster, as the low-level hash table operations and fetch of value objects could be used.

The problem is, many people will expect an array for to fire the read traces. Others won't care.

May 02 '17 13:05 bll123

The expectation from the bounty request was that array for/foreach would be significantly more efficient than stepping through the array manually in Tcl. It should be significantly faster. Slightly slower than the array get loop (or even the same time, or only slightly faster) does not meet those expectations and I'm really concerned about that.

Does array get fire the read traces?

May 02 '17 13:05 resuna

If there are no read traces set for the array, then it should be able to tell that by checking once at the start of the loop, no? It shouldn't be duplicating that check on every row.

May 02 '17 13:05 resuna

It appears that array get does not fire any read traces. I will take a look and see what it is doing.

May 02 '17 13:05 bll123

That's a very surprising discovery, and something I will have to keep in mind when debugging Tcl code in the future because I would have assumed array get would act the same as any other read operation on an array.

May 02 '17 13:05 resuna

No, I was incorrect -- it fires the trace on the individual element. A read trace on the entire array does not fire (don't know if that ever does).

May 02 '17 13:05 bll123

array get is also using Tcl_GetVar2.

dict for has a speed of 5447.601 microseconds in my same benchmark. Can you simply convert to using dictionaries? You could also keep an array and a dict in sync in order to get the best of both, though that has memory overhead.

May 02 '17 14:05 bll123

I'll get with karl on this, it's a very surprising result. He may still be interested in the code clean-up advantages of being able to step through the array without copying the keys.

May 02 '17 14:05 resuna

Maybe we need a bounty on speeding up array access in general as a follow-up.

May 02 '17 14:05 resuna

Let's revisit this one after Andy gets back to https://github.com/flightaware/Tcl-bounties/issues/18 ?

May 11 '17 14:05 resuna

Here is a very simple implementation of array foreach written in pure Tcl:

    proc arrayforeach {keyvarName arrayName body} {
	upvar 1 $keyvarName key $arrayName array
	set s [array startsearch $array]
	try {
	    while {[array anymore array $s]} {
		set key [array nextelement array $s]
		uplevel 1 $body
	    }
	} finally {
	    array donesearch array $s
	}
    }
    # Inject foreach into [array] ensemble
    namespace ensemble configure ::array -map \
	[dict replace [namespace ensemble configure ::array -map] \
	     foreach [list [namespace which arrayforeach]]]

I don't think it is quite enough (it only digs out the keys), but it shows that people can test out ideas without having to commit to writing C straight off the bat.

Oct 24 '17 11:10 dkfellows

Tcl-bounties Tcl-bounties copied to clipboard

Requirements for [array foreach]

Tcl-bounties
Tcl-bounties copied to clipboard