wren icon indicating copy to clipboard operation
wren copied to clipboard

How to validate arguments for foreign methods from native code?

Open heretique opened this issue 7 years ago • 20 comments

Hi,

I was wondering if it is possible to validate arguments for foreign methods from within the native foreign method. From this thread I've seen that one way of doing it is to have the checks from wren using the "is" keyword to makes sure the arguments is of the expected type. But this approach means creating a lot of wrapper methods over the actual foreign ones (polluting the script and they cannot be private as far as I know). For example if I have a Vec3 class:

foreign class Vec3 {
    construct new( x, y, z ) {}
    foreign *( rhs )
    foreign +( rhs )
    foreign -( rhs ) 
} 

I would like to make sure "rhs" is a Vec3 before I cast it to the native vector class in the foreign operator implementation. I could check that the slot for the argument contains a foreign type but can't say for sure it's a Vec3 or Vec2 and I could have a lot of problems casting to the wrong type. Does this make sense?

heretique avatar Dec 27 '17 21:12 heretique

I don't think currently there's an easy answer.. You could check for a failed cast obviously, or do some kinda variant tagging of your foreign but they're all awkward.

The interesting thing is: what would wren return if you had a "more details please" version of wrenGetSlotType? I don't assume you'd want to be calling strcmp for example on every vector operator... so we'd want a primitive.

One idea could be something like, wrenSetSlotNewForeign has a tag int type, ObjForeign would hang onto it, then wrenGet(Foreign?)SlotType could return that tag (and error if it's not a WREN_TYPE_FOREIGN).

ruby0x1 avatar Dec 27 '17 22:12 ruby0x1

I don't assume you'd want to be calling strcmp for example on every vector operator... so we'd want a primitive.

Just noting that this is exactly how Lua does it with luaL_checkudata. But I would agree that a tag-based system would be preferable. There would probably need to be some kind of type registration or lookup though, since you need to make sure those tags are unique across modules.

bjorn avatar Dec 27 '17 23:12 bjorn

Just noting that this is exactly how Lua does it

Interesting. Ideally the check would also be opt out (like "not in release builds").

There would probably need to be some kind of type registration or lookup though

This may not be necessary, if the tag is a hashed string for example, you only pay the hash cost once - and can even pay it at compile time. The wren side doesn't/shouldn't make the distinction, it would just be an opaque uint32_t to the vm. This was the original intent of the tag.

ruby0x1 avatar Dec 27 '17 23:12 ruby0x1

Ideally the check would also be opt out (like "not in release builds").

Indeed.

This may not be necessary, if the tag is a hashed string for example, you only pay the hash cost once - and can even pay it at compile time. The wren side doesn't/shouldn't make the distinction, it would just be an opaque uint32_t to the vm. This was the original intent of the tag.

This is something I was considering as well since I found this compile time murmur hash. The only thing remaining then, would be finding ways of dealing with hash collisions but that shouldn't be wren's concern I guess.

heretique avatar Dec 28 '17 08:12 heretique

we could also add optional typing. e.g.:

foreign class Vec3 {
    construct new( x, y, z ) {}
    foreign *( Vec3 rhs )
    // or
    foreign +( rhs : Vec3 )
    // or something else
}

minirop avatar Dec 28 '17 21:12 minirop

I agree. Perhaps the best syntax is the existing is keyword?

foreign class Vec3 {
    foreign * ( x is Num )
    foreign + ( x is Vec3 )
}

KyleCharters avatar Dec 29 '17 08:12 KyleCharters

I wonder what's @munificent's take on this?

heretique avatar Dec 29 '17 10:12 heretique

You can emulate the "tag" approach by having a class containing a tag and a pointer to the actual object, and then just storing that in your foreign slot instead.

xSke avatar Jan 24 '18 23:01 xSke

From this thread I've seen that one way of doing it is to have the checks from wren using the "is" keyword to makes sure the arguments is of the expected type.

Yes, that's what I usually do.

But this approach means creating a lot of wrapper methods over the actual foreign ones (polluting the script and they cannot be private as far as I know).

There is a convention that methods ending in "_" are private. But, yes, there is a layer of wrappers that can feel annoying.

If you want to do the validation inside C code, we could add something like this to the C API:

bool wrenIsInstance(WrenVM* vm, int objectSlot, int classSlot);

That would return true if the object in objectSlot is an instance of the class in classSlot and false otherwise. Then you'd load the desired class (which you've probably looked up earlier and stored in a WrenHandle) into the slot, call this, and then report an error if it's not the class you want.

Thoughts?

we could also add optional typing. e.g.:

I have a lot of thoughts about this. :) I have designed an optionally typed language and more recently worked on a language as it migrated from optional to real static typing.

My experience from Dart is that it doesn't really work to go halfway towards types. If we were to add type annotations to Wren, then people would want to annotate "This method takes a list." Before long, they'd want to say "This method takes a list of cats." Now you need generics. In order to check that annotation at runtime, list objects would need to know what their type argument is, so you're talking reification. That means you need to handle higher-order methods on collections also producing lists with the right reified type argument, so now you need generic methods. Then they'd want to be able to pass a list of animals to that method, so now you're talking variance.

If you don't go all the way and do a full type system with generics, generic methods, variance, etc. you end up with something like Go where people are annoyed by the lack of expressiveness. And if you do go all the way, you get a language 10x more complex than Wren is today. That complexity can be worth it when programming in the large. But for the relatively small programs Wren is designed for, it may be the wrong trade-off.

munificent avatar Mar 26 '18 15:03 munificent

Then you'd load the desired class (which you've probably looked up earlier and stored in a WrenHandle) into the slot

Only pain point I feel here is "when", when do you fetch the handle/s? And do those modules exist yet? This also requires code overhead outside of the allocate/usage pattern for foreign types, but for plain foreign functions they also don't necessarily have a clear place to do that work.

Then there's the question of where to store the handles, the easiest is using static file local loose variables (or function local static variables), but since a foreign has no "context" it doesn't necessarily have access to stuff that isn't global, which encourages making the handles global, or "file global", bad if have code across multiple files, or fetched from the VM user data.

These may not matter much, but they are still raised by wrenIsInstance and it bothers me a bit. Also probably not ideal but another way:

wrenIsInstance(WrenVM* vm, int objSlot, const char* module, const char* class)

The vm can grab handles and cache them if it's inclined, but since it's the vm, it doesn't need the overhead of the handle and can do the check more directly (maybe). This has its own can of downsides like strings all over the place (solvable with defines etc).

I still like the idea of a simple opaque uint32 (or 64) tag on a foreign that the vm ignores but the user can use. It's a primitive type, slots into existing code in foreign allocate, is optional and has little overhead on code on the vm or the user side, and is a primitive so adds no real overhead for checking the type.

ruby0x1 avatar Mar 26 '18 16:03 ruby0x1

I almost agreed also with the

bool wrenIsInstance(WrenVM* vm, int objectSlot, int classSlot);

but after reading @underscorediscovery 's response I realized I didn't quite understood the implications and I had to remove the response. I also mistakenly took "int classSlot" param for the opaque tag on the foreign type that was previously discussed. What I ended up using was exactly what @xSke proposed, foreign objects wrapped together with a type tag with the use of templates. I still don't understand how "classSlot" would help.

heretique avatar Mar 26 '18 18:03 heretique

the classSlot approach is sort of like:

    //store SomeClass type into slot 1
wrenGetVariable(vm, "module", "SomeClass", 1);
    //get the handle to the class
WrenHandle* handle = wrenGetSlotHandle(vm, 1);
    //save somehow for later
....
    //when validating
int classSlot = 2;
int myArgSlot = 1;
wrenSetSlotHandle(vm, 2, handle); //from before
assert(wrenIsInstance(vm, myArgSlot, classSlot));

You have to do a bit of dancing to not clobber your args as well, but that's the intent as I understood it.

ruby0x1 avatar Mar 26 '18 19:03 ruby0x1

if this is the case there's indeed the risk of writing over slots that contain my method arguments if not careful. Also looks a bit more convoluted to use. I still love the custom opaque tag idea more. Also usually you already have some tag already around used by serialization or some reflection implementation.

The slot approach most probably will use the class name string in the ObjClass structure to make the comparison (or could be the actual ObjClass pointer?) so it will be slower than comparing the optional tag that could be added to ObjClass. I'm mentioning this because I don't believe this check should be disabled on release builds as previously suggested. If I release an engine that has Wren integration, I definitely want to do the checks for all my foreign classes method calls because users could pass wrong arguments and bring the engine down with no clue on what just happened. Currently I'm pretty happy with my wrapped object-tag implementation so I won't push too hard with this. Anyway, this is how I would see your proposal used @underscorediscovery

void NativeVec3DotProductImpl(WrenVM* vm)
{
    uint32_t typeTag = 0;
    Vec3* instance = static_cast<Vec3>(wrenGetSlotForeign(vm, 0));
    // here I assume I get another Vec3 to do the product but instead I get a Vec2
    // so here I should validate before casting, I'm doing dangerous assumptions
    // that I get what I want from wren
    typeTag = wrenGetSlotForeignTag(vm, 1);
    // validate and abort if needed
    Vec3* other = static_cast<Vec3>(wrenGetSlotForeign(vm, 1));
    ....
}

heretique avatar Mar 27 '18 06:03 heretique

I was thinking along the lines of wrenGetSlotForeign(WrenVM* vm, int slot, uint32_t* out_tag) for the original suggestion.

uint32_t typeTag = 0;
Vec3* instance = static_cast<Vec3>(wrenGetSlotForeign(vm, 0, &typeTag));

Because of it being specific to foreign, I would store the uint32_t tag in ObjForeign.

This might mean that class CustomClass as an arg can't be validated this way from the C api still, but we also don't have much in the way of interacting with those types in any case - largely because we can't use wrenCall and ask it questions atm (reentrant) and there is no other api to use on the arg as a type - so it might not matter as much.

Bob's suggested class slot is call is a good basis for that case but I still think it suffers from usability and would try to streamline it as an api.

ruby0x1 avatar Mar 27 '18 11:03 ruby0x1

Only pain point I feel here is "when", when do you fetch the handle/s? And do those modules exist yet? This also requires code overhead outside of the allocate/usage pattern for foreign types, but for plain foreign functions they also don't necessarily have a clear place to do that work.

Yeah, that's a good point.

Also probably not ideal but another way:

wrenIsInstance(WrenVM* vm, int objSlot, const char* module, const char* class)

The vm can grab handles and cache them if it's inclined, but since it's the vm, it doesn't need the overhead of the handle and can do the check more directly (maybe).

I try to minimize the number of string-based C API functions because working with strings and map lookup is pretty slow. This would effectively be equivalent to your example of using the slot-based wrenIsInstance() except that this gives you know way of hoisting the string-based lookup out of any hot loops.

But your point about clobbering args is a good one. My suggestion is probably too annoying to use in practice.

Because of it being specific to foreign, I would store the uint32_t tag in ObjForeign.

I'm hesitant to add overhead like this to every ObjForeign in order to support a policy that might not be useful for some embedders. The VM tries to be as policy-free and lowest-common-denominator as possible because any overhead it adds is overhead all users of the VM cannot escape.

Assuming you trust all of the Wren code your host app is running, you could always implement this policy at the host app level, right? You could just assume that all foreign objects start with a type tag inside their custom memory and check for that yourself.

Hmm, now that I think about it...

Putting a type tag in each ObjForeign would potentially waste a lot of memory because you may have many many tiny foreign objects. But ObjClass is already pretty heavyweight, and each ObjForeign already needs to store a pointer to its class.

We could add numeric IDs to ObjClass and then provide a C API function to get an object's class ID. You could then check that to see if a foreign object has the class you expect. There's the question of whether the host app chooses class IDs or the VM generates them. And it kind of feels like a somewhat arbitrary policy to put into the VM, but it's an idea.

munificent avatar Mar 27 '18 14:03 munificent

We could add numeric IDs to ObjClass and then provide a C API function to get an object's class ID. You could then check that to see if a foreign object has the class you expect. There's the question of whether the host app chooses class IDs or the VM generates them. And it kind of feels like a somewhat arbitrary policy to put into the VM, but it's an idea.

Just curious, but why couldn't you simply compare the ObjClass pointers rather than going through an ID?

bjorn avatar Mar 27 '18 15:03 bjorn

Just curious, but why couldn't you simply compare the ObjClass pointers rather than going through an ID?

The pointers aren't exposed to the C API and we don't want to expose raw pointers because doing so would prevent us from ever supporting a moving/copying GC in the future. (That's also why WrenHandle exists, more or less.)

We could do something like:

bool wrenIsClass(WrenVM* vm, int slot, WrenHandle classHandle);

That would return true if the class of the object in [slot] is the same as the class stored in classHandle.

munificent avatar Mar 27 '18 17:03 munificent

I might be late to the conversation but here it goes:

I have came up with a workaround. It only works if you use C++11 or higher, and use typeid. Suppose you want to bind a Vector3 class and then verify that the void* from wrenGetSlotForeign(...) is actually a valid pointer to Vector3, you could do the following:

// Create a dummy class which all of your classes will use
class Object {
  public:
    Object() = default;
    virtual ~Object() = 0;
};
Object::~Object() {}

// Create your class that inherits the Object
class Vector3 : public Object {
  public:
    Vector3(float x, float y, float z) { ...}
    virtual ~Vector3() = default;

    float x, y, z;
};

And when validating the arguments, simply do the following:

auto object = reinterpret_cast<Object*>(wrenGetSlotForeign(vm, ...));
if (typeid(*object) == typeid(Vector3)) {
    auto vector = static_cast<Vector3*>(object);
    // We have a Vector3
} else {
    // The slot is not a Vector3
}

It works because using a typeid() on dereferenced base class pointer yields the same typeid hash code as for the derived class. But, the base class must be polymorphic, thus the pure virtual destructor, and all classes you use within Wren must inherit this same class.

With some templating magic, one can create a simple wrapper for this.

It is ugly, but it is fast, and it works.

matusnovak avatar May 20 '19 22:05 matusnovak

Edit I have opened #959 for this specific issue.

I would also appreciate a type tag on classes. I am working on Vala bindings for Wren. Unless I am missing something, the allocate function receives no information identifying the class. Therefore, each class needs its own allocate() --- and thus each class must have a unique C callsite. Callsites are hard to create dynamically in compiled languages :) . My current workaround is a fixed list of stub functions I dynamically assign to Wren classes --- functional, but very repetitive and ugly (here et seq.).

If classes had type tags, I could use one allocate() in Vala, and switch on the type tag to determine what type to create. E.g.:

void generic_allocate(WrenVM vm) {
    var typetag = wrenGetTypeTagForSlot(vm, 0);
    Object *instance;
    switch(typetag) {
        case SOMETYPE: instance = new SomeType();
        ...
    }
    var ppObject = wrenSetSlotNewForeign(0,0,sizeof(Object*));
    *ppObject = instance;
}

Edit I like @xsKe's https://github.com/wren-lang/wren/issues/498#issuecomment-360308784 , but I can only use it after I have called wrenSetSlotNewForeign. It can't help me decide which foreign object to create.

cxw42 avatar Mar 28 '21 01:03 cxw42

Very nice discussion, I'd like to add that I would need both features:

  1. tag for validating types efficiently
  2. string name for error reporting, ok if it's slow; specifically I'm in a situation similar to issue #697 and I want to report the name of the wren class that caused the error. Of course having the module name too would be nice!

Regarding the hash discussion, that's not a huge issue at least in modern c++. I have a build time implementation of the Tiger hash here and one (a bit old) of CRC32 here. Interestingly, this one is already part of my c++ wrapper for wren ;) and I should be able to embed the whole tagging and type safety mechanism in it without any extra effort for clients of my library.

KingDuckZ avatar May 15 '22 18:05 KingDuckZ