rdotnet ProtectedPointer usage

I see many SymbolicExpression are creating ProtectedPointer on themselves. If I read R manual correctly (link), Preserved objects are already immune to R's gc, so protecting them again is unnecessary. And ProtectedPointer calls Rf_unprotect_ptr, which also traverse an internal list just as R_ReleaseObject does. If the R.Net functions are synchronized, we can chage it to Rf_unprotect, which is much more efficient.

I think I can start to work on these issues. Of course this will require some code refactoring, but I will try to keep the library interface unchanged. @jmp75 , @skyguy94 , what do you think?

Jan 04 '15 14:01 hyh

I have pretty detailed responses for these, but they should be opened as separate issues to make the discussion easier.

Jan 04 '15 15:01 skyguy94

Ok, I pulled the first two into separate issues. I've left this thread for ProtectedPointer, if you could edit your issue, I think that would be helpful. I've been concerned about the use of ProtectedPointer, but modifying R.Net to support [https://github.com/skyguy94/rdotnet/tree/ClientServerPrototype](out of process clientserver) was my first foray into the R API and R.Net.

I believe the relevant comment from the R Internals document is:

There is another way to avoid the affects of garbage collection: a call to R_PreserveObject adds an object to an internal list of objects not to be collects, and a subsequent call to R_ReleaseObject removes it from that list. This provides a way for objects which are not returned as part of R objects to be protected across calls to compiled code: on the other hand it becomes the user’s responsibility to release them when they are no longer needed (and this often requires the use of a finalizer). It is less efficient that the normal protection mechanism, and should be used sparingly.

and an example from the code is: LogicalMatrix.cs:63

public override bool this[int rowIndex, int columnIndex]
{
   get
   {
      ...elided...
      using (new ProtectedPointer(this))
      {
         int offset = GetOffset(rowIndex, columnIndex);
         int data = Marshal.ReadInt32(DataPointer, offset);
         return Convert.ToBoolean(data);
... elided ...

In reading that comment and in examining the code, this seems like an appropriate use of protected pointer to me. We know that the top level list, vector, etc... is protected and the sub-elements receive protection via membership in that list. In this case, we are reading marshaled memory representing an element of that list and as such, R isn't involved in the operation and those protected pointers aren't necessary.

The other place R.Net makes heavy use of ProtectedPointer is during eval. REngine:665

private SymbolicExpression Parse(string statement, StringBuilder incompleteStatement)
{
   incompleteStatement.Append(statement);
   var s = GetFunction<Rf_mkString>()(incompleteStatement.ToString());
   string errorStatement;
   using (new ProtectedPointer(this, s))
   {
      ParseStatus status;
      var vector = new ExpressionVector(this, GetFunction<R_ParseVector>()(s, -1, out status, NilValue.DangerousGetHandle()));

      switch (status)
      {
         case ParseStatus.OK:
            incompleteStatement.Clear();
            if (vector.Length == 0)
            {
               return null;
            }
            using (new ProtectedPointer(vector))
            {
               SymbolicExpression result;
               if (!vector.First().TryEvaluate(GlobalEnvironment, out result))
               {
                  throw new EvaluationException(LastErrorMessage);
               }

               if (AutoPrint && !result.IsInvalid && GetVisible())
               {
                  GetFunction<Rf_PrintValue>()(result.DangerousGetHandle());
               }
               return result;
            }

I think this usage is pretty canonical and the R objects need to be protected in-between calls to the R API.

To summarize, I think the usage in the synthetic object model (the various C# representations of vectors, lists, and matrices) is incorrect, but the Eval usage is correct.

Jan 04 '15 16:01 skyguy94

About LogicalMatrix.cs:63,

We know that the top level list, vector, etc... is protected and the sub-elements receive protection via membership in that list.

They are already protected at the time when R_PreserveObject is called in "this" constructor, no need to call R_protect again.

REngine:665, Yes I agree that pointers returned from R API should be protected, this is canonical usage. But I think the first protection (on s) can be avoid, as long as we pass it into the second call immediately:

var s = GetFunction<Rf_mkString>()(incompleteStatement.ToString());
var vector = new ExpressionVector(this, GetFunction<R_ParseVector>()(s, -1, out status, NilValue.DangerousGetHandle()));

Protection is not needed for objects which R already knows are in use. In particular, this applies to function arguments.

SymbolicExpressionSymbols.cs:184 shows this pattern:

         IntPtr coerced = expression.GetFunction<Rf_coerceVector>()(expression.DangerousGetHandle(), SymbolicExpressionType.NumericVector);
         return new NumericVector(expression.Engine, coerced);

And what do you mean by the synthetic object model's problem? Sorry I didn't follow your previous discussions.

Jan 04 '15 18:01 hyh

And what do you mean by the synthetic object model's problem? Sorry I didn't follow your previous discussions.

Sorry, re-reading my comments I think I lost my train of thought. I would think that reading the marshaled memory doesn't require the use of ProtectedPointer, which is what I think you are referring to with this issue. I just wanted to confirm.

Yes I agree that pointers returned from R API should be protected, this is canonical usage. But I think the first protection (on s) can be avoid, as long as we pass it into the second call immediately:

Again, writing the client/server modification was my first spelunker into the R API. I believe that I tried to reduce the protected pointer usage there at one point, but I would get exceptions from time-to-time. That was pretty early one and it could have been a misunderstanding. I only ever debugged one issue using gdb and a modified debug build on Windows, so I never tracked down the exact call stack and issue. I agree that it shouldn't need protected if ParseVector is the next call. I imagine your threading code would mitigate any theoretical issues for multiuser race conditions.

Jan 04 '15 19:01 skyguy94

I hope the threading code mitigate your issues too. We can see if that works and come back to this later.

Jan 05 '15 02:01 hyh

rdotnet rdotnet copied to clipboard

ProtectedPointer usage

rdotnet
rdotnet copied to clipboard