vblang
vblang copied to clipboard
[Proposal] For Each ByRef
I suggest to allow iterating through collections by ref. like this:
Dim A = {1, 2, 3}
For Each ByRef item in A
item += 1
Next
after executing, A = {2, 3, 4}
What problem does this solve?
@pricerc
Obviously, the given example is way too trivial to illustrate the need, and can already be done efficiently in today's VB, like so:
Dim A = {1, 2, 3}
For itemIndex = 0 To A.Length - 1
A(itemIndex) += 1
Next
But, in a more demanding situation, the ability to maintain a reference to a Structure (a.k.a. .Net "value") type provides two main benefits: (1) A means to mutate the actual instance (rather than its copy, which is the usual idiom for .Net value types); and (2) a means to reduce the copying of value types, which helps reduce memory pressure for large sized (over 16 bytes) value types.
C# has long had something called "ref locals" for that sort of thing. Meanwhile, in VB land, we are denied the equivalent of a C# ref local (I suggested Dim ByRef - https://github.com/dotnet/vblang/issues/34).
The use of ref locals happens often enough when iterating through arrays of struct, so an extended foreach was introduced in C# 7.3 (https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref#ref-locals).
Anyway, in today's VB, the closest emulation possible is by way of the With statement, or ByRef parameters in helper functions or lambdas.
E.g. closest emulation possible of @VBAndCs example:
<Extension> Function Indexes(Of T)(a As T()) As IEnumerable(Of T)
Return System.Linq.Enumerable.Range(0, a.Length)
End Function
Dim A = {1, 2, 3}
For Each itemIndex In A.Indexes()
Call Sub(ByRef item As Integer)
item += 1
' And whatever other stuff to do ...
End Sub(A(itemIndex))
Next
For what it's worth, this is possible in VB today:
Dim A = {1, 2, 3}
With A.AsSpan().GetEnumerator()
Do While .MoveNext()
Call Sub(ByRef item As Integer)
item += 1
End Sub(.Current)
Loop
End With
It does not confer any speed advantage for integer arrays (it's actually a tad slower versus indexing from For i = 0 to A.Length - 1), but maybe in other scenarios it might work better. (In C#, using foreach (ref int x in A.AsSpan()) is only slightly faster.)
Note that one can use For Each on a Span, like so:
For Each item In A.AsSpan()
WriteLine($"{item}")
Next
However, item will be a copy of each array element; but again, maybe there are some scenarios where it might work nicely. Slice could be fun and useful:
For Each item In A.AsSpan().Slice(1, 1)
WriteLine($"{item}")
Next
This is possible in VB today in For Each loop:
Public Sub EnumerableWithIndex_String_Test()
Dim vals = {"A", "B", "C", "D"}
For Each e In vals.WithIndex
With e
Dim out = Sub() ConsoleLine($"{ .Index}, { .Value}")
If .IsFirst Then
out() : Assert.Ok(.Index = 0 And .Value = "A")
.MoveNext()
out() : Assert.Ok(.Index = 1 And .Value = "B")
ElseIf Not .IsLast Then
out() : Assert.Ok(.Index = 2 And .Value = "C")
.Value = "C1" ' CHANGE VALUE
out() : Assert.Ok(.Index = 2 And .Value = "C1")
Else
out() : Assert.Ok(.Index = 3 And .Value = "D")
End If
End With
Next e
End Sub
Public Sub EnumerableWithIndex_Integer_Test()
Dim vals = {1, 2, 3, 4, 5}
For Each e In vals.WithIndex
With e
Assert.Ok(.Index + 1 = .Value)
.Value += 1 ' CHANGE VALUE
Assert.Ok(.Index + 2 = .Value)
End With
Next e
End Sub
<Extension>
Public Iterator Function WithIndex(Of T)(
source As IList(Of T)
) As IEnumerable(Of Index_(Of T))
Using enumerator = source.GetEnumerator
Dim hasNext = enumerator.MoveNext
Dim index = -1
While hasNext
Dim wi = New Index_(Of T)(index:=index, source:=source, enumerator:=enumerator)
wi.MoveNext()
Yield wi
hasNext = Not wi.IsLast
index = wi.Index ' if .MoveNext was used
End While
End Using
End Function
Public Class Index_(Of T)
Private _value As T
Public Property Value As T
Get
Return _value
End Get
Set(value1 As T)
_value = value1
_source(Index) = _value
End Set
End Property
Public ReadOnly Property Index As Integer ' first element has index = 0
Public ReadOnly Property IsLast As Boolean
Private ReadOnly _enumerator As IEnumerator(Of T)
Private ReadOnly _source As IList(Of T)
Public Sub New(
index As Integer,
source As IList(Of T),
enumerator As IEnumerator(Of T)
)
Me.Index = index
_source = source
_enumerator = enumerator
End Sub
Public Function IsFirst() As Boolean
Return Index = 0
End Function
Public Sub MoveNext(Optional n As Integer = 1)
For i = 1 To n
_value = _enumerator.Current
_IsLast = Not _enumerator.MoveNext()
Threading.Interlocked.Increment(_Index) ' may be called with .AsParallel
Next i
End Sub
End Class
So, You wrote about 100 lines to do what I am suggesting to do by adding the 5 letters ByRef keyword :D
I am not trying to do things. I am trying to do them in the smartest way, so, we can reclaim our historical strong easy smart VB.
So, You wrote about 100 lines to do what I am suggesting to do by adding the 5 letters
ByRefkeyword :D I am not trying to do things. I am trying to do them in the smartest way, so, we can reclaim our historical strong easy smart VB.
The 100 lines are kept in a library and don't need to be typed in for every usage. If you strip away other features like Index and MoveNext it's not so big. I am sure the theoretical ByRef implementation in the VB compiler will also consume at least 100 lines of code - which you cannot modify easily.
Your suggestion (if new language features will get implemented?):
Dim A = {1, 2, 3}
For Each ByRef item in A
item += 1
Next
My workaround:
Dim A = {1, 2, 3}
For Each item in A.WithIndex
item.Value += 1
Next
And then you will hit the higher bad performance wall, especially that loops tend to be large. Again: I want vb to be once again the language where users type less, and get more.
And then you will hit the higher bad performance wall, especially that loops tend to be large.
“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”
quoted from Donald Knuth, "The Art of Computer Programming"
@WalterLederer That is a valiant try, but I'm with @VBAndCs on this, the WithIndex/Index_ concept is very memory heavy, especially in regard to (non-trivially sized) value-types. Also, it imposes two subtle risks.
Risk 1: It captures a copy of an element (in _value via IEnumerator.Current), so in scenarios where the actual value-type element is expected to be volatile, this will not work.
Risk 2: It presumes the (zero-based) ordinal of the elements yielded by IEnumerator Reset/MoveNext/Current has the same integer value as the actual index. That is often the case, but not always, and in those nonequivalent moments, assignments/mutations won't work as expected.
In terms of the Knuth quote, this is exactly the right time to review what's being done (or proposed) and why. What's being done is manipulation of (usually large) value-types, and the why of ByRef or "ref locals" is to greatly reduce memory usage (copies and storage), and also as a means to efficiently mutate such value-types. Knuth was warning about being too clever and too cute in one's approach to the problem. ByRef or "ref locals" doesn't represent a case of premature optimization, it is already a common long-time use-case.
concept is very memory heavy
Fixed, see below
Risk 1: It captures a copy of an element
Fixed, see below
Risk 2: It presumes the (zero-based) ordinal of the elements
This is not a problem with the used IList interface
<Extension>
Public Iterator Function WithIndex(Of T)(
source As IList(Of T)
) As IEnumerable(Of IndexStructure_(Of T))
Dim ic As New IndexClass_ ' use ref type to get changes to index with .MoveNext back to this loop
While ic.Index < source.Count - 1
ic.MoveNext()
Yield New IndexStructure_(Of T) With {.Ic = ic, .Source = source} ' use value type to avoid memory allocation
End While
End Function
Public Class IndexClass_
Public Property Index As Integer = -1 ' first element has index = 0
Public Sub MoveNext(Optional n As Integer = 1)
For i = 1 To n
Threading.Interlocked.Increment(Index) ' may be called with .AsParallel
Next i
End Sub
End Class
Public Structure IndexStructure_(Of T)
Public Property Ic As IndexClass_
Public Property Source As IList(Of T)
Public Property Value As T
Get
Return Source(Index)
End Get
Set(value1 As T)
Source(Index) = value1
End Set
End Property
Public Function Index() As Integer
Return Ic.Index
End Function
Public Function IsFirst() As Boolean
Return Index() = 0
End Function
Public Function IsLast() As Boolean
Return Index() >= Source.Count - 1
End Function
Public Sub MoveNext(Optional n As Integer = 1)
Ic.MoveNext(n)
End Sub
End Structure
[concept is very memory heavy] ... [It captures a copy of an element] ... Fixed, see below
Dropping the Private _value As T part does improve things. However, VB Property isn't by-reference (such as is possible in C#), hence elements will still be returned as copies. Even where Property Value is passed in to a ByRef parameter of a function/lambda call, there will still be a "defensive" copy operation involved; i.e. the apparent ByRef will operate instead as a by-proxy. Hence, there will be at least two copies of the element during the call - a real memory pressure issue for large Structure types.
The most straight-forward coping strategy is for Class Index_(Of T) to have some sort of method that will execute a Delegate that takes a ByRef to the element; such a method can then directly pass in Source(Index) (i.e. it does not use Property Value), which (hopefully) improves the chances of a bona fide by-reference pass. Although, I will admit, it might be that an element from an IList may still result in a pass-by-proxy situation. In which case, one may have to give up on IList and use Array instead.
[It presumes the (zero-based) ordinal...] ... This is not a problem with the used IList interface
Now that the Private ReadOnly _enumerator As IEnumerator(Of T) is dropped, and the index into the IList is directly used, I agree this revision does fix that.
a real memory pressure issue for large
Structuretypes.
From https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/choosing-between-class-and-struct :
"CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.
❌ AVOID defining a struct unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (int, double, etc.).
It has an instance size under 16 bytes.
It is immutable.
It will not have to be boxed frequently.
In all other cases, you should define your types as classes."
In all other cases, you should define your types as classes.
Absolutely agree! Most of the time. In any case, a For Each scenario upon a collection of Class or Interface types would rarely benefit from a proposal like For Each ByRef - having a reference to an instance (which is what a Class or Interface value is) should be sufficient - and I still scratch my head about what use-cases would absolutely also need the index in that sort of situation, but whatever. Evenso, For Each ByRef is helpful for stepping through arrays; a small performance improvement for iteration and de-referencing, plus another chance to avoid a common error about the upper bound of the indexes. E.g., how often does one see a flub like For i = 0 To ThisArray.Length - you know, forgetting the - 1 after the .Length!
Anyway, the time may come where you must reckon with software that's not of your own creation, and they went with large structs. For whatever the reasons. In real life, such things must be happening in real live projects. Probably image processing or neural nets or Bayesian nets or some such. But it happens, and the C# folks love it!
And there are occasions where large structs + ref locals can be beneficial; Span(Of T) is the gateway here. Food for thought, on the C# side: https://docs.microsoft.com/en-us/archive/msdn-magazine/2018/january/csharp-all-about-span-exploring-a-new-net-mainstay, https://medium.com/@antao.almada/how-to-use-span-t-and-memory-t-c0b126aae652, https://www.stevejgordon.co.uk/an-introduction-to-optimising-code-using-span-t.
Of course, the original concept for .Net was to make nearly exclusive use of managed-code classes, since that provided the best balance of performance, thread safety, and memory management (e.g. garbage collection). Yet even in the earliest days of .Net, there were recognized moments where working mostly on the stack with large structs made some sense (usually performance-wise) - e.g. https://stackoverflow.com/questions/5171781/when-to-use-pointers-in-c-net. Overtime, the desire for something safer than pointers lead to Span and other safer methods of direct access to managed and unmanaged instances of struct.
I still scratch my head about what use-cases would absolutely also need the index
Because I extended my example from #525 and did not remove the Index part
Span(Of T) is the gateway here. Food for thought, on the C# side
Any workarounds to use Span(Of T) also in VB.NET?
@WalterLederer
Because I extended my example...
There were folks who indicated interest in #525, but if anyone demonstrated a compelling use-case there, then I missed that. The only reason I can see to want a bona fide index (e.g. into Array or List or IList) is for the mutation of value-type elements. In which case, you and I and many others have already shown that to be quite do-able in VB today. (Not to mention, the general advice to favor Class over Structure anyway...)
Any workarounds to use Span(Of T) also in VB.NET?
Absolutely! Check out the first few posts at https://gitter.im/VB-NET/HowTo (look for "ForEachByRef").
demonstrated a compelling use-case there
Comparing multiple List of strings where the first list should be transformed/formatted to the second one by a function which takes a list of string as parameter. Then checking the result:
For Each l In expectedLines.WithIndex
Dim actLine = actualLines(l.Index)
Dim expLine = expectedLines(l.Index)
If expLine <> actLine Then
...
@WalterLederer The For Each l In expectedLines.WithIndex example is a situation where index is a key field which relates items in actualLines with respective ones in expectedLines, and for something like that using the "old-school" For index = 0 to expectedLines.Count - 1 makes better sense. There's no reason to be using For Each here, except maybe as a means to get the indexes via light-weight helper functions like:
<Extension> Function Indexes(Of T)(a As T()) As IEnumerable(Of Integer)
Return Enumerable.Range(0, a.Count)
End Function
<Extension> Function Indexes(a As IList) As IEnumerable(Of Integer)
Return Enumerable.Range(0, a.Count)
End Function
They help avoid the "Count" vs. "Count minus one" bugs, which is useful:
For Each index In expectedLines.Indexes
Dim actLine = actualLines(index)
Dim expLine = expectedLines(index)
If expLine <> actLine Then
' ...
<Extension> Function Indexes(Of T)(a As T()) As IEnumerable(Of Integer)
Ok, thanks, I have changed my code to use .Indexes in this case. But I've found a better example where .Index is needed to update a progress bar control:
For Each d In links.WithIndex
With d
UpdateProgressPercent(title:="Check", inx:= .Index, maxInx:= .MaxIndex)
If .Value.IsSymbolicLink Then ' check for broken links
Dim target = .Value.SymbolicLinkTarget
...
This can be simplified by passing IndexStructure_(Of T) to UpdateProgressPercent. This avoids completely specifying index and maxindex explicitely.
For Each d In links.WithIndex("Check")
With d
UpdateProgressPercent(d)
If .Value.IsSymbolicLink Then ' check for broken links
Dim target = .Value.SymbolicLinkTarget
...
@WalterLederer
If I was trying to grok code with something like For Each foo In bar.WithIndex("Blah"), I'd be more surprised to find out that "Blah" is meant as a tag of no real use to WithIndex, rather than, say, some sort of key or filter which WithIndex might use for itself.
I would also wonder what's so darn difficult about UpdateProgressPercent(title:="Check", inx:= .Index, maxInx:= .MaxIndex) in the first place? If keeping track of the appropriate maxInx was a big worry, then why not instead use a pattern like UpdateProgressPercent(d, title:="Check")? Or instead, why not package the title and maxInt within a Tuple or a lamba and call the original UpdateProgressPercent?
Transforming something generic like Structure IndexStructure_(Of T) into something more case-specific, like "oh, just in case we want to go update a progress indicator or who-knows-what-else, let's go add a Title field to it" is how software bloat and "Magic Programming" happens (https://en.wikipedia.org/wiki/Magic_(programming)).
Speaking as someone who's job it is to work on other people's code at least 99.94% of the time: One must be careful about what seems like a good concise approach isn't instead a conceit of "Code Golf" (https://en.wikipedia.org/wiki/Code_golf) or "Magic Programming". If you think I'm wrong on this, that's OK, but I'd recommend asking for feedback on software degign pattern proposals from colleagues and/or at https://codereview.stackexchange.com/.
You may want to check out these for more ideas/perspectives: https://www.gofpatterns.com/design-patterns/module1/intro-design-patterns.php, https://docs.embold.io/anti-patterns/, https://stackoverflow.com/questions/365615/in-net-which-loop-runs-faster-for-or-foreach.
let's go add a
Titlefield to it
I agree, it would be better to have a special .WithProgress function for this special case.