[Discussion] Smart pointers

Open LoopyAshy opened this issue 2 years ago • 14 comments

I have been playing with Beef since release and absolutely adore the concept, the constant updates, and the steady stream of bug fixes. However, in my opinion there is one thing that needs focus. Yes, we have debug-time memory leak checks, defer, and so on, but people using this language are choosing it over C++ (which I am not a fan of but use when necessary), so it would be wise to focus on the features C++ has added over the years to make the language safer for developers with little extra thought.

One such concept, as we all know, is the smart pointer: unique_ptr (a single-owner pointer wrapper that deletes automatically), shared_ptr (a reference-counted shared pointer), weak_ptr (a weak reference to a shared pointer), and their atomic versions (for thread safety), each with its own purpose and use cases.
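For readers less familiar with the C++ side, here is a minimal sketch of how those three wrappers behave. The `Tracked` type and `smart_pointer_demo` function are made up for illustration; only the `std::` calls are real library API.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical helper type: bumps a caller-owned counter while it is alive.
struct Tracked {
    int* alive;
    explicit Tracked(int* counter) : alive(counter) { ++*alive; }
    ~Tracked() { --*alive; }
};

// Returns the number of live Tracked objects once every smart pointer
// created inside has gone out of scope.
int smart_pointer_demo() {
    int alive = 0;
    {
        // unique_ptr: exactly one owner; ownership can only be moved.
        std::unique_ptr<Tracked> owner = std::make_unique<Tracked>(&alive);
        std::unique_ptr<Tracked> new_owner = std::move(owner);
        assert(owner == nullptr && alive == 1);

        // shared_ptr: reference counted; the object dies with its last owner.
        std::shared_ptr<Tracked> a = std::make_shared<Tracked>(&alive);
        std::shared_ptr<Tracked> b = a;
        assert(a.use_count() == 2 && alive == 2);

        // weak_ptr: observes without owning; it expires once the owners are gone.
        std::weak_ptr<Tracked> observer = a;
        a.reset();
        b.reset();
        assert(observer.expired() && alive == 1);
    }  // new_owner leaves scope here and deletes its object
    return alive;
}
```

No `delete` appears anywhere: every release is driven by a destructor, which is exactly the language feature Beef structs do not have.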

Even Rust uses this system by default (Box<T>, Rc<T>, Arc<T>, Weak<T>, etc.) but enforces checks on top. Now I hear you say: just use defer or the scope keyword. That sounds fair (except in more complex code, where you have to keep track of your defers and scopes, leaving a lot of room for error on the dev's part) until you get into storing values globally or, the main issue, multithreading. C++ lets you pass smart pointers by value, which hands ownership to another scope or thread. That is naturally extremely useful and desirable, but from what I have observed it is basically impossible to replicate in Beef, because structs cannot have destructors and classes are fiddly to pass between function scopes, to put it lightly.
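As a rough illustration of the C++ behaviour described above, the following sketch moves a unique_ptr into a worker thread. The `TestClass`, `off_thread_append`, and `create_ptr_test` names are chosen for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <utility>

struct TestClass {
    int a;
    int b;
    std::string c;
};

// Takes ownership of `input`. The object is deleted when the thread's copy of
// the lambda (and the unique_ptr captured inside it) is destroyed.
std::thread off_thread_append(std::unique_ptr<TestClass> input, std::string* out) {
    return std::thread([data = std::move(input), out]() {
        *out = data->c + "!";
    });
}

std::string create_ptr_test() {
    auto ptr = std::make_unique<TestClass>(TestClass{1, 2, "Kebab"});
    std::string result;
    std::thread worker = off_thread_append(std::move(ptr), &result);
    assert(ptr == nullptr);  // the caller can no longer reach the object
    worker.join();           // wait for the worker before reading result
    return result;
}
```

Neither side has to decide who deletes the object: whichever scope holds the unique_ptr last does it automatically.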

Which brings me to the final, difficult question: could it be possible to pass a class by value? Yes, I am aware that a class can be created via either scope or new, and that its whole purpose is to be a reference type, but what if there were a way to implicitly handle it as a value and reconstruct it on the caller's side? This would potentially come with some performance cost unless the compiler does extra work to handle this use case.

Finally a poor but clear example of this being a potential issue:

static void create_ptr_test() 
{
	var ptr = new TestClass(1, 2, "Kebab");
	//defer delete ptr; //this would create undefined behavior very quickly in this example.

	off_thread_increment(ptr);
	Console.WriteLine(ptr.c);
}

static void off_thread_increment(TestClass input) 
{
	new System.Threading.Thread( new () => {
		//defer delete input; //potentially not wanted
		//Could be fixed using a bool argument to basically say it is ok to delete when it is done.
		//Such as if (auto_delete) { defer:: delete input; } however I feel this is still an unnecessary conscious decision and requires mental memory tracking in larger projects.

		Thread.Sleep(10000);
		Console.WriteLine(input.c);
	}).Start();
}

class TestClass 
{
	public int a;
	public int b;
	public String c = new String() ~ delete _;

	public this(int a, int b, StringView c)
	{
		this.a = a;
		this.b = b;
		this.c.Append(c);
	}

	public ~this() {
		Console.WriteLine("Deleted TestClass.");
	}
}

Now if, for example, we had destructors in structs, then this would be possible (which in my honest opinion is much safer):

static void create_ptr_test() 
{
	var ptr = UniquePointer<TestClass>(scope () => { return new TestClass(1, 2, "Kebab"); });

	off_thread_increment(ptr.move()); //if copy operator is overloadable however .move wouldn't be needed but I kind of prefer the explicitness
	//Can no longer use ptr as it is nulled.
}

/// Basically takes ownership of the pointer and deletes it when it is done.
static void off_thread_increment(UniquePointer<TestClass> input) 
{
	new System.Threading.Thread( new () => {  //could also potentially add something like [move] or [input] to copy the value into scope straight away (possibly already does this).
		var data = input.move(); //otherwise this would be required (or since copy operator would be overloaded: just var data =input; )
		Thread.Sleep(10000);
		Console.WriteLine(*data); //would require overloadable deref operator
		//pointer is dropped.
	}).Start();
}

struct UniquePointer<T>
{
	private alloctype(T) inner;
	
	public this(delegate alloctype(T)() creator)
	{
		inner = creator();
	}

	public ~this()
	{
		if (inner != null)
			delete inner;
	}

	public operator* () //This is also not possible as you cannot as far as I am aware: overload the deref operator.
	{
		return inner;
	}

	public Self Copy() //not possible as you cannot overload/disable implicit copying.
	{ 
		return this.move();
	}
	
	public UniquePointer<T> move() 
	{
		var output = UniquePointer<T>() { inner = this.inner };
		inner = null;
		return output;
	}
}

class TestClass 
{
	public int a;
	public int b;
	public String c = new String() ~ delete _;

	public this(int a, int b, StringView c)
	{
		this.a = a;
		this.b = b;
		this.c.Append(c);
	}

	public ~this() {
		Console.WriteLine("Deleted TestClass.");
	}
}

As you can see, this requires several abilities (all of which are already available in C++ and Rust), such as:

  • capturing by value into delegates (potentially already possible)
  • struct destructors
  • implicit copy overloading/disabling
  • dereference operator overloading
  • preferably, compiler support for recognising when a unique pointer has been moved and skipping the destructor call, to avoid the performance cost of the null check (this is very wishful thinking but definitely possible)

This has been a long post and I apologise for my grammar and punctuation.

LoopyAshy avatar Oct 25 '21 00:10 LoopyAshy

I guess it's this way by design. Quote from README:

The syntax and many semantics are most directly derived from C#, while attempting to retain the C ideals of bare-metal explicitness and lack of runtime surprises

But I agree, the lack of something like smart pointers is a huge pain. It's possible to use the System.RefCounted base class, but then you have to remember to AddRef and ReleaseRef.
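To make the "remember to AddRef and ReleaseRef" burden concrete, here is a small C++ sketch of intrusive, manual reference counting. `RefCountedBase` and `Texture` are hypothetical and are not Beef's actual System.RefCounted implementation; only the AddRef/ReleaseRef naming is borrowed from it.

```cpp
#include <cassert>

// Hypothetical intrusive base class, single-threaded for brevity.
class RefCountedBase {
public:
    void AddRef() { ++mRefCount; }
    void ReleaseRef() {
        if (--mRefCount == 0)
            delete this;
    }
    int RefCount() const { return mRefCount; }

protected:
    virtual ~RefCountedBase() = default;

private:
    int mRefCount = 1;  // the creator holds the first reference
};

class Texture : public RefCountedBase {
public:
    explicit Texture(bool* destroyed) : mDestroyed(destroyed) {}
    ~Texture() override { *mDestroyed = true; }

private:
    bool* mDestroyed;
};

bool manual_refcount_demo() {
    bool destroyed = false;
    Texture* tex = new Texture(&destroyed);  // count = 1
    tex->AddRef();      // a second holder has to remember to do this...
    tex->ReleaseRef();  // ...and to release exactly once
    assert(!destroyed);
    tex->ReleaseRef();  // last reference gone: the object deletes itself
    return destroyed;
}
```

One missed AddRef gives a use-after-free and one missed ReleaseRef gives a leak, which is the bookkeeping smart pointers are meant to take over.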

disarray2077 avatar Oct 25 '21 00:10 disarray2077

That is true. Not to mention that it isn't a wrapper but an abstract class, which is not always desirable because it requires inheritance.

LoopyAshy avatar Oct 25 '21 00:10 LoopyAshy

There's certainly some better ways to do more ergonomic ref counting. I'm thinking we could have a ref-counting generic wrapper class.

  • We could allow overriding a * unary operator (which could also be triggered with a -> access)
  • We could use comptime to add constructors to the wrapper which copy the ctors available in the wrapped class, and then allocate the wrapped data with append allocations. This way var str = new Rc<String>(100); could allocate the rc wrapper, the string, and 100 characters of string storage all with one single heap allocation.

Copy constructors and the associated machinery have huge implications and would complicate (and force a redesign of) many, many parts of the language - POD values are very core to the language design and philosophy.

bfiete avatar Oct 25 '21 14:10 bfiete

ALSO- with comptime there could even be an attribute that adds in ref counting data + methods to a class instead of deriving from System.RefCounted. Just another option.

bfiete avatar Oct 25 '21 14:10 bfiete

Interesting ideas. I could be seen as biased since I am a long-term Rust and C# programmer, but there are a lot of concepts from other languages we could definitely take and make our own, maximising the user friendliness of the language and (in my opinion) very importantly: safety.

Thankfully we have the leak checker and such built into the IDE which helps greatly but isn't flawless due to only noticing the mistake once the leak is detected (preferably devs want to avoid stuff like that with little thought).

One of C++'s biggest weaknesses isn't just its annoying, badly aged rules, such as header files and the mess they accumulate due to backward compatibility, but also its potential unsafety and the added complexity in larger projects because the safety guards are very often absent (which definitely has its uses, but not always). Over the years C++ has definitely made good strides in adding safe options, and I respect that.

Rust was also built on that concept of safety and modern design and does it extremely well, but it is syntactically very different from the traditional C languages, which (along with the borrow checker) makes it difficult for newer developers to learn. Another issue with Rust is porting C code to it, or even writing wrappers, which in turn puts a large dent in its adoption by more traditional developers.

Beef does not suffer from this, which is part of the reason I genuinely enjoy Beef and want it to succeed further.

I realise this post was mostly me rambling, but I do believe this should be one of the main focuses of the language as it grows (learning from the success of other languages and taking ideas). On that note, I hope to one day see us support a system similar to Rust's cargo, which downloads libraries or 'crates' and automatically places them within the project (installing libraries in C++ can definitely be a headache at times).

LoopyAshy avatar Oct 25 '21 23:10 LoopyAshy

For me, Beef's biggest advantage is that it compiles directly to a native binary. As for a garbage collector, I don't think it should be rejected outright. Perhaps the lowest layer could at least keep the relevant interfaces so that one can be added in the future if needed. There is no need to be put off by garbage collection: both UE4 and Unity3D implement garbage collection on top of C++, and they do it well.

sgf avatar Dec 22 '21 01:12 sgf

Smart pointers are a simplified form of garbage collection. The biggest problem with C++ smart pointers is their impact on the syntax: they add noise, especially when smart pointers are nested.

sgf avatar Dec 22 '21 01:12 sgf

I have an initial implementation of a better ref counting solution, along the lines I described at https://github.com/beefytech/Beef/issues/1169#issuecomment-950958673

RefCounted<String> rcStr = .Create(1024);
defer rcStr.Release();
...
rcStr->Append("Abc");

RefCounted<String>.Create(1024); performs a single heap allocation which contains the RefCounted<String> object, append-allocating a String with 1024 bytes of storage.
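The single-allocation idea can be sketched in C++ as a header followed directly by its character storage. This is only an analogy under stated assumptions: `RcString` is a hypothetical type that collapses the wrapper and the string into one header, and it says nothing about how Beef's append allocation is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical layout: [refs | capacity | length][capacity bytes of chars]
struct RcString {
    int refs;
    std::size_t capacity;
    std::size_t length;

    // The character storage starts right after the header.
    char* Data() { return reinterpret_cast<char*>(this + 1); }

    static RcString* Create(std::size_t capacity) {
        // One heap allocation covers both the header and the storage.
        void* raw = ::operator new(sizeof(RcString) + capacity);
        return new (raw) RcString{1, capacity, 0};
    }

    void Append(const char* text) {
        std::size_t n = std::strlen(text);
        assert(length + n <= capacity);
        std::memcpy(Data() + length, text, n);
        length += n;
    }

    void Release() {
        if (--refs == 0) {
            this->~RcString();
            ::operator delete(this);  // frees header and storage together
        }
    }
};

std::size_t append_alloc_demo() {
    RcString* str = RcString::Create(1024);
    str->Append("Abc");
    std::size_t len = str->length;
    str->Release();  // still manual, like the defer rcStr.Release() above
    return len;
}
```

The benefit is one allocation and one free per object instead of three; the release itself stays explicit.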

bfiete avatar Jun 22 '22 16:06 bfiete

So it is still manual, then? We still have to add and release references ourselves?

Redhacker1 avatar Aug 02 '22 15:08 Redhacker1

Automatic reference counting requires either deep holistic language support (ie: Swift/Python) or it requires certain C++ language features (copy constructors, assignment operator overloads, move constructors, etc) so that they can be implemented as a library.

I do not consider either of those tradeoffs to be the correct choices for BeefLang.

So, yes- manual, if you want to do ref counting. Generally you just design around it and avoid reference counting, of course - like you generally don't use reference counting in C.
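For context, this is roughly what the library route looks like in C++. It is a minimal, single-threaded sketch with hypothetical names (`Rc`, `Create`, `UseCount`), and it shows that the automation rests entirely on the copy constructor, move constructor, assignment operator, and destructor.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <typename T>
class Rc {
public:
    template <typename... Args>
    static Rc Create(Args&&... args) {
        return Rc(new Block{1, T(std::forward<Args>(args)...)});
    }

    Rc(const Rc& other) : mBlock(other.mBlock) { Retain(); }  // copy: +1
    Rc(Rc&& other) noexcept : mBlock(other.mBlock) {          // move: steal
        other.mBlock = nullptr;
    }
    Rc& operator=(Rc other) noexcept {  // copy-and-swap covers both assignments
        std::swap(mBlock, other.mBlock);
        return *this;                   // the old block is released with `other`
    }
    ~Rc() { Release(); }                // destructor: -1

    T* operator->() const { return &mBlock->value; }
    int UseCount() const { return mBlock ? mBlock->refs : 0; }

private:
    struct Block { int refs; T value; };
    explicit Rc(Block* block) : mBlock(block) {}
    void Retain() { if (mBlock) ++mBlock->refs; }
    void Release() { if (mBlock && --mBlock->refs == 0) delete mBlock; }
    Block* mBlock;
};

int rc_demo() {
    Rc<std::string> a = Rc<std::string>::Create("Abc");
    Rc<std::string> b = a;             // copy constructor: count is 2
    Rc<std::string> c = std::move(a);  // move constructor: count stays 2
    assert(a.UseCount() == 0 && b.UseCount() == 2);
    b = c;                             // assignment to the same block: still 2
    c->append("Def");
    assert(b->size() == 6);            // b and c share one string
    return b.UseCount();
}
```

Every one of those special member functions is a hook the compiler calls implicitly, which is the machinery being declined here.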

bfiete avatar Aug 03 '22 02:08 bfiete

That is totally understandable and I respect that.

With that being said, it sounds like you should just close the issue and mark it as won't fix. This doesn't seem to be what either of them asked for: it's better, but it is not an automatic system. Besides, the complaint above about requiring inheritance was already solved by the IRefCounted interface. Most of the time people ask for smart pointers or refcounting because they want automatic memory management, especially when an interface for manual refcounting already exists in the BCL and they have already expressed concern with its manual nature (not to mention calling current memory management a huge pain). If such a solution is not an option, then this issue should probably be marked as such, at least for the time being. It is relatively high on the issue list.

No shame in not being a feature that is right for beef, but probably best to just say not in the plan. Sorry for responding so late.

Redhacker1 avatar Aug 03 '22 17:08 Redhacker1

I wanted to leave the issue open as a reference to others who may be interested in this discussion in the future.

Also, the one who called the lack of smart pointers a "huge pain" is actually a top BeefLang contributor, and I do think a frank discussion about the downsides of design decisions is a good thing. I've seen alt languages making claims of basically "this is the proper way to program" which is either delusional or disingenuous.

Maybe someone will come along and read this and have some additional ideas on how to improve ref counting in a way that fits within our "tradeoff space".

bfiete avatar Aug 04 '22 12:08 bfiete

Nevermind, I just realized a few problems with this approach

Redhacker1 avatar Aug 04 '22 16:08 Redhacker1

It would certainly be a massively powerful comptime system that would allow that! Fundamentally, though, the biggest issue is with treating all composite data as POD - you could no longer just memcpy composite data around. Any assignment of a composite may require a deref of old field references and a ref of the new ones. Popping an element from a List<T> is no longer a simple mSize-- - it may also require a deref of field references, etc.

There's a bunch of other stuff like good escape analysis to eliminate unnecessary refs/releases, but that's just work. The semantic implications are the real issues.
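The same split is visible in C++, where the type system tracks which composites are still safe to memcpy. A small sketch with illustrative type names (`PodItem`, `RcItem`):

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>
#include <vector>

struct PodItem { int id; float weight; };                  // plain data
struct RcItem  { int id; std::shared_ptr<int> payload; };  // holds a counted reference

static_assert(std::is_trivially_copyable<PodItem>::value,
              "POD: copying the bytes is a complete copy");
static_assert(!std::is_trivially_copyable<RcItem>::value,
              "a byte copy would skip the reference count update");

long pop_demo() {
    // POD: a copy really is just bytes.
    PodItem src{7, 1.5f};
    PodItem dst;
    std::memcpy(&dst, &src, sizeof dst);
    assert(dst.id == 7);

    // Counted field: the container has to run code per element.
    auto shared = std::make_shared<int>(42);
    std::vector<RcItem> items;
    items.push_back(RcItem{1, shared});
    assert(shared.use_count() == 2);
    items.pop_back();  // more than a size decrement: it releases the reference
    return shared.use_count();
}
```

Once a single field carries a count, every container and every assignment that touches the enclosing type has to be aware of it.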

bfiete avatar Aug 04 '22 18:08 bfiete