vblang icon indicating copy to clipboard operation
vblang copied to clipboard

[Proposal] For Each ByRef

Open VBAndCs opened this issue 5 years ago • 20 comments

I suggest to allow iterating through collections by ref. like this:

Dim A = {1, 2, 3}
For Each ByRef item in A
     item += 1
Next

after executing, A = {2, 3, 4}

VBAndCs avatar Aug 25 '20 16:08 VBAndCs

What problem does this solve?

pricerc avatar Aug 26 '20 23:08 pricerc

@pricerc

Obviously, the given example is way too trivial to illustrate the need, and can already be done efficiently in today's VB, like so:

Dim A = {1, 2, 3}
For itemIndex = 0 To A.Length - 1
    A(itemIndex) += 1
Next

But, in a more demanding situation, the ability to maintain a reference to a Structure (a.k.a. .Net "value") type provides two main benefits: (1) A means to mutate the actual instance (rather than its copy, which is the usual idiom for .Net value types); and (2) a means to reduce the copying of value types, which helps reduce memory pressure for large sized (over 16 bytes) value types.

C# has long had something called "ref locals" for that sort of thing. Meanwhile, in VB land, we are denied the equivalent of a C# ref local (I suggested Dim ByRef - https://github.com/dotnet/vblang/issues/34).

The use of ref locals happens often enough when iterating through arrays of struct, so an extended foreach was introduced in C# 7.3 (https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref#ref-locals).

Anyway, in today's VB, the closest emulation possible is by way of the With statement, or ByRef parameters in helper functions or lambdas.

E.g. closest emulation possible of @VBAndCs example:

<Extension> Function Indexes(Of T)(a As T()) As IEnumerable(Of T)
    Return System.Linq.Enumerable.Range(0, a.Length)
End Function

Dim A = {1, 2, 3}

For Each itemIndex In A.Indexes()
    Call Sub(ByRef item As Integer)
             item += 1
             ' And whatever other stuff to do ...
         End Sub(A(itemIndex))
Next

rskar-git avatar Aug 29 '20 16:08 rskar-git

For what it's worth, this is possible in VB today:

Dim A = {1, 2, 3}

With A.AsSpan().GetEnumerator()
    Do While .MoveNext()
        Call Sub(ByRef item As Integer)
                 item += 1
             End Sub(.Current)
    Loop
End With

It does not confer any speed advantage for integer arrays (it's actually a tad slower versus indexing from For i = 0 to A.Length - 1), but maybe in other scenarios it might work better. (In C#, using foreach (ref int x in A.AsSpan()) is only slightly faster.)

Note that one can use For Each on a Span, like so:

For Each item In A.AsSpan()
    WriteLine($"{item}")
Next

However, item will be a copy of each array element; but again, maybe there are some scenarios where it might work nicely. Slice could be fun and useful:

For Each item In A.AsSpan().Slice(1, 1)
    WriteLine($"{item}")
Next

rskar-git avatar Aug 29 '20 20:08 rskar-git

This is possible in VB today in For Each loop:

Public Sub EnumerableWithIndex_String_Test()
    Dim vals = {"A", "B", "C", "D"}
    For Each e In vals.WithIndex
        With e
            Dim out = Sub() ConsoleLine($"{ .Index}, { .Value}")

            If .IsFirst Then
                out() : Assert.Ok(.Index = 0 And .Value = "A")
                .MoveNext()
                out() : Assert.Ok(.Index = 1 And .Value = "B")

            ElseIf Not .IsLast Then
                out() : Assert.Ok(.Index = 2 And .Value = "C")
                .Value = "C1"                                                                  ' CHANGE VALUE
                out() : Assert.Ok(.Index = 2 And .Value = "C1")

            Else
                out() : Assert.Ok(.Index = 3 And .Value = "D")
            End If
        End With
    Next e
End Sub

Public Sub EnumerableWithIndex_Integer_Test()
    Dim vals = {1, 2, 3, 4, 5}
    For Each e In vals.WithIndex
        With e
            Assert.Ok(.Index + 1 = .Value)
            .Value += 1                                                                        ' CHANGE VALUE
            Assert.Ok(.Index + 2 = .Value)
        End With
    Next e
End Sub

<Extension>
Public Iterator Function WithIndex(Of T)(
                                         source As IList(Of T)
                                         ) As IEnumerable(Of Index_(Of T))
    Using enumerator = source.GetEnumerator
        Dim hasNext = enumerator.MoveNext
        Dim index = -1
        While hasNext
            Dim wi = New Index_(Of T)(index:=index, source:=source, enumerator:=enumerator)
            wi.MoveNext()
            Yield wi
            hasNext = Not wi.IsLast
            index = wi.Index                                                                   ' if .MoveNext was used
        End While
    End Using
End Function

Public Class Index_(Of T)

    Private _value As T
    Public Property Value As T
        Get
            Return _value
        End Get
        Set(value1 As T)
            _value = value1
            _source(Index) = _value
        End Set
    End Property

    Public ReadOnly Property Index As Integer                                                  ' first element has index = 0
    Public ReadOnly Property IsLast As Boolean

    Private ReadOnly _enumerator As IEnumerator(Of T)
    Private ReadOnly _source As IList(Of T)

    Public Sub New(
                   index As Integer,
                   source As IList(Of T),
                   enumerator As IEnumerator(Of T)
                   )
        Me.Index = index
        _source = source
        _enumerator = enumerator
    End Sub

    Public Function IsFirst() As Boolean
        Return Index = 0
    End Function

    Public Sub MoveNext(Optional n As Integer = 1)
        For i = 1 To n
            _value = _enumerator.Current
            _IsLast = Not _enumerator.MoveNext()
            Threading.Interlocked.Increment(_Index)                                            ' may be called with .AsParallel
        Next i
    End Sub

End Class

WalterLederer avatar Sep 04 '20 14:09 WalterLederer

So, You wrote about 100 lines to do what I am suggesting to do by adding the 5 letters ByRef keyword :D I am not trying to do things. I am trying to do them in the smartest way, so, we can reclaim our historical strong easy smart VB.

VBAndCs avatar Sep 04 '20 14:09 VBAndCs

So, You wrote about 100 lines to do what I am suggesting to do by adding the 5 letters ByRef keyword :D I am not trying to do things. I am trying to do them in the smartest way, so, we can reclaim our historical strong easy smart VB.

The 100 lines are kept in a library and don't need to be typed in for every usage. If you strip away other features like Index and MoveNext it's not so big. I am sure the theoretical ByRef implementation in the VB compiler will also consume at least 100 lines of code - which you cannot modify easily.

Your suggestion (if new language features will get implemented?):

Dim A = {1, 2, 3}
For Each ByRef item in A
     item += 1
Next

My workaround:

Dim A = {1, 2, 3}
For Each item in A.WithIndex
     item.Value += 1
Next

WalterLederer avatar Sep 04 '20 17:09 WalterLederer

And then you will hit the higher bad performance wall, especially that loops tend to be large. Again: I want vb to be once again the language where users type less, and get more.

VBAndCs avatar Sep 04 '20 18:09 VBAndCs

And then you will hit the higher bad performance wall, especially that loops tend to be large.

“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”

quoted from Donald Knuth, "The Art of Computer Programming"

WalterLederer avatar Sep 04 '20 18:09 WalterLederer

@WalterLederer That is a valiant try, but I'm with @VBAndCs on this, the WithIndex/Index_ concept is very memory heavy, especially in regard to (non-trivially sized) value-types. Also, it imposes two subtle risks.

Risk 1: It captures a copy of an element (in _value via IEnumerator.Current), so in scenarios where the actual value-type element is expected to be volatile, this will not work.

Risk 2: It presumes the (zero-based) ordinal of the elements yielded by IEnumerator Reset/MoveNext/Current has the same integer value as the actual index. That is often the case, but not always, and in those nonequivalent moments, assignments/mutations won't work as expected.

In terms of the Knuth quote, this is exactly the right time to review what's being done (or proposed) and why. What's being done is manipulation of (usually large) value-types, and the why of ByRef or "ref locals" is to greatly reduce memory usage (copies and storage), and also as a means to efficiently mutate such value-types. Knuth was warning about being too clever and too cute in one's approach to the problem. ByRef or "ref locals" doesn't represent a case of premature optimization, it is already a common long-time use-case.

rskar-git avatar Sep 04 '20 19:09 rskar-git

concept is very memory heavy

Fixed, see below

Risk 1: It captures a copy of an element

Fixed, see below

Risk 2: It presumes the (zero-based) ordinal of the elements

This is not a problem with the used IList interface

    <Extension>
    Public Iterator Function WithIndex(Of T)(
                                             source As IList(Of T)
                                             ) As IEnumerable(Of IndexStructure_(Of T))
        Dim ic As New IndexClass_                                                                  ' use ref type to get changes to index with .MoveNext back to this loop
        While ic.Index < source.Count - 1
            ic.MoveNext()
            Yield New IndexStructure_(Of T) With {.Ic = ic, .Source = source}                      ' use value type to avoid memory allocation
        End While
    End Function

    Public Class IndexClass_
        Public Property Index As Integer = -1                                                      ' first element has index = 0

        Public Sub MoveNext(Optional n As Integer = 1)
            For i = 1 To n
                Threading.Interlocked.Increment(Index)                                             ' may be called with .AsParallel
            Next i
        End Sub

    End Class

    Public Structure IndexStructure_(Of T)

        Public Property Ic As IndexClass_
        Public Property Source As IList(Of T)

        Public Property Value As T
            Get
                Return Source(Index)
            End Get
            Set(value1 As T)
                Source(Index) = value1
            End Set
        End Property

        Public Function Index() As Integer
            Return Ic.Index
        End Function

        Public Function IsFirst() As Boolean
            Return Index() = 0
        End Function

        Public Function IsLast() As Boolean
            Return Index() >= Source.Count - 1
        End Function

        Public Sub MoveNext(Optional n As Integer = 1)
            Ic.MoveNext(n)
        End Sub

    End Structure

WalterLederer avatar Sep 05 '20 17:09 WalterLederer

[concept is very memory heavy] ... [It captures a copy of an element] ... Fixed, see below

Dropping the Private _value As T part does improve things. However, VB Property isn't by-reference (such as is possible in C#), hence elements will still be returned as copies. Even where Property Value is passed in to a ByRef parameter of a function/lambda call, there will still be a "defensive" copy operation involved; i.e. the apparent ByRef will operate instead as a by-proxy. Hence, there will be at least two copies of the element during the call - a real memory pressure issue for large Structure types.

The most straight-forward coping strategy is for Class Index_(Of T) to have some sort of method that will execute a Delegate that takes a ByRef to the element; such a method can then directly pass in Source(Index) (i.e. it does not use Property Value), which (hopefully) improves the chances of a bona fide by-reference pass. Although, I will admit, it might be that an element from an IList may still result in a pass-by-proxy situation. In which case, one may have to give up on IList and use Array instead.

[It presumes the (zero-based) ordinal...] ... This is not a problem with the used IList interface

Now that the Private ReadOnly _enumerator As IEnumerator(Of T) is dropped, and the index into the IList is directly used, I agree this revision does fix that.

rskar-git avatar Sep 05 '20 18:09 rskar-git

a real memory pressure issue for large Structure types.

From https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/choosing-between-class-and-struct :

"CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.

❌ AVOID defining a struct unless the type has all of the following characteristics:

It logically represents a single value, similar to primitive types (int, double, etc.).

It has an instance size under 16 bytes.

It is immutable.

It will not have to be boxed frequently.

In all other cases, you should define your types as classes."

WalterLederer avatar Sep 05 '20 18:09 WalterLederer

In all other cases, you should define your types as classes.

Absolutely agree! Most of the time. In any case, a For Each scenario upon a collection of Class or Interface types would rarely benefit from a proposal like For Each ByRef - having a reference to an instance (which is what a Class or Interface value is) should be sufficient - and I still scratch my head about what use-cases would absolutely also need the index in that sort of situation, but whatever. Evenso, For Each ByRef is helpful for stepping through arrays; a small performance improvement for iteration and de-referencing, plus another chance to avoid a common error about the upper bound of the indexes. E.g., how often does one see a flub like For i = 0 To ThisArray.Length - you know, forgetting the - 1 after the .Length!

Anyway, the time may come where you must reckon with software that's not of your own creation, and they went with large structs. For whatever the reasons. In real life, such things must be happening in real live projects. Probably image processing or neural nets or Bayesian nets or some such. But it happens, and the C# folks love it!

And there are occasions where large structs + ref locals can be beneficial; Span(Of T) is the gateway here. Food for thought, on the C# side: https://docs.microsoft.com/en-us/archive/msdn-magazine/2018/january/csharp-all-about-span-exploring-a-new-net-mainstay, https://medium.com/@antao.almada/how-to-use-span-t-and-memory-t-c0b126aae652, https://www.stevejgordon.co.uk/an-introduction-to-optimising-code-using-span-t.

Of course, the original concept for .Net was to make nearly exclusive use of managed-code classes, since that provided the best balance of performance, thread safety, and memory management (e.g. garbage collection). Yet even in the earliest days of .Net, there were recognized moments where working mostly on the stack with large structs made some sense (usually performance-wise) - e.g. https://stackoverflow.com/questions/5171781/when-to-use-pointers-in-c-net. Overtime, the desire for something safer than pointers lead to Span and other safer methods of direct access to managed and unmanaged instances of struct.

rskar-git avatar Sep 05 '20 20:09 rskar-git

I still scratch my head about what use-cases would absolutely also need the index

Because I extended my example from #525 and did not remove the Index part

Span(Of T) is the gateway here. Food for thought, on the C# side

Any workarounds to use Span(Of T) also in VB.NET?

WalterLederer avatar Sep 07 '20 12:09 WalterLederer

@WalterLederer

Because I extended my example...

There were folks who indicated interest in #525, but if anyone demonstrated a compelling use-case there, then I missed that. The only reason I can see to want a bona fide index (e.g. into Array or List or IList) is for the mutation of value-type elements. In which case, you and I and many others have already shown that to be quite do-able in VB today. (Not to mention, the general advice to favor Class over Structure anyway...)

Any workarounds to use Span(Of T) also in VB.NET?

Absolutely! Check out the first few posts at https://gitter.im/VB-NET/HowTo (look for "ForEachByRef").

rskar-git avatar Sep 07 '20 12:09 rskar-git

demonstrated a compelling use-case there

Comparing multiple List of strings where the first list should be transformed/formatted to the second one by a function which takes a list of string as parameter. Then checking the result:

            For Each l In expectedLines.WithIndex
                Dim actLine = actualLines(l.Index)
                Dim expLine = expectedLines(l.Index)
                If expLine <> actLine Then
...

WalterLederer avatar Sep 07 '20 13:09 WalterLederer

@WalterLederer The For Each l In expectedLines.WithIndex example is a situation where index is a key field which relates items in actualLines with respective ones in expectedLines, and for something like that using the "old-school" For index = 0 to expectedLines.Count - 1 makes better sense. There's no reason to be using For Each here, except maybe as a means to get the indexes via light-weight helper functions like:

<Extension> Function Indexes(Of T)(a As T()) As IEnumerable(Of Integer)
    Return Enumerable.Range(0, a.Count)
End Function
<Extension> Function Indexes(a As IList) As IEnumerable(Of Integer)
    Return Enumerable.Range(0, a.Count)
End Function

They help avoid the "Count" vs. "Count minus one" bugs, which is useful:

For Each index In expectedLines.Indexes
    Dim actLine = actualLines(index)
    Dim expLine = expectedLines(index)
    If expLine <> actLine Then
' ...

rskar-git avatar Sep 08 '20 20:09 rskar-git

<Extension> Function Indexes(Of T)(a As T()) As IEnumerable(Of Integer)

Ok, thanks, I have changed my code to use .Indexes in this case. But I've found a better example where .Index is needed to update a progress bar control:

                For Each d In links.WithIndex
                    With d
                        UpdateProgressPercent(title:="Check", inx:= .Index, maxInx:= .MaxIndex)
                        If .Value.IsSymbolicLink Then                                                                ' check for broken links
                                Dim target = .Value.SymbolicLinkTarget
...

This can be simplified by passing IndexStructure_(Of T) to UpdateProgressPercent. This avoids completely specifying index and maxindex explicitely.

                For Each d In links.WithIndex("Check")
                    With d
                        UpdateProgressPercent(d)
                        If .Value.IsSymbolicLink Then                                                                ' check for broken links
                                Dim target = .Value.SymbolicLinkTarget
...

WalterLederer avatar Sep 09 '20 07:09 WalterLederer

@WalterLederer

If I was trying to grok code with something like For Each foo In bar.WithIndex("Blah"), I'd be more surprised to find out that "Blah" is meant as a tag of no real use to WithIndex, rather than, say, some sort of key or filter which WithIndex might use for itself.

I would also wonder what's so darn difficult about UpdateProgressPercent(title:="Check", inx:= .Index, maxInx:= .MaxIndex) in the first place? If keeping track of the appropriate maxInx was a big worry, then why not instead use a pattern like UpdateProgressPercent(d, title:="Check")? Or instead, why not package the title and maxInt within a Tuple or a lamba and call the original UpdateProgressPercent?

Transforming something generic like Structure IndexStructure_(Of T) into something more case-specific, like "oh, just in case we want to go update a progress indicator or who-knows-what-else, let's go add a Title field to it" is how software bloat and "Magic Programming" happens (https://en.wikipedia.org/wiki/Magic_(programming)).

Speaking as someone who's job it is to work on other people's code at least 99.94% of the time: One must be careful about what seems like a good concise approach isn't instead a conceit of "Code Golf" (https://en.wikipedia.org/wiki/Code_golf) or "Magic Programming". If you think I'm wrong on this, that's OK, but I'd recommend asking for feedback on software degign pattern proposals from colleagues and/or at https://codereview.stackexchange.com/.

You may want to check out these for more ideas/perspectives: https://www.gofpatterns.com/design-patterns/module1/intro-design-patterns.php, https://docs.embold.io/anti-patterns/, https://stackoverflow.com/questions/365615/in-net-which-loop-runs-faster-for-or-foreach.

rskar-git avatar Sep 13 '20 15:09 rskar-git

let's go add a Title field to it

I agree, it would be better to have a special .WithProgress function for this special case.

WalterLederer avatar Sep 13 '20 19:09 WalterLederer