root icon indicating copy to clipboard operation
root copied to clipboard

[tutorials] No `new` in v7 tutorials, please:

Open Axel-Naumann opened this issue 4 years ago • 11 comments

Thanks, Douglas Leonard!

Axel-Naumann avatar Feb 27 '21 11:02 Axel-Naumann

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14 How to customize builds

phsft-bot avatar Feb 27 '21 11:02 phsft-bot

Build failed on mac11.0/cxx17. Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build See console output.

Errors:

  • [2021-02-27T11:56:15.791Z] 1067/2194 Test #976: tutorial-v7-ntuple-ntpl001_staff ..................................................................***Failed Error regular expression found in output. Regex=[: error:] 0.90 sec
  • [2021-02-27T11:56:16.052Z] 1069/2194 Test #977: tutorial-v7-ntuple-ntpl002_vector .................................................................***Failed Error regular expression found in output. Regex=[: error:] 0.78 sec
  • [2021-02-27T11:56:16.684Z] 1070/2194 Test #978: tutorial-v7-ntuple-ntpl003_lhcbOpenData ...........................................................***Failed Error regular expression found in output. Regex=[: error:] 0.78 sec
  • [2021-02-27T11:56:16.973Z] 1071/2194 Test #980: tutorial-v7-ntuple-ntpl005_introspection ..........................................................***Failed Error regular expression found in output. Regex=[: error:] 0.85 sec
  • [2021-02-27T11:56:17.568Z] 1075/2194 Test #979: tutorial-v7-ntuple-ntpl004_dimuon .................................................................***Failed Error regular expression found in output. Regex=[: error:] 1.52 sec

Failing tests:

phsft-bot avatar Feb 27 '21 12:02 phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14 How to customize builds

phsft-bot avatar Feb 27 '21 13:02 phsft-bot

Thanks for the attention to it. Maybe I was too enthusiastic in the context of displayed objects and interactive root sessions. I'm not an expert, but still learning.

Th RDirectory looks interesting. Subroutines leaving global objects lying around still seems funny. It could be convenient for casual scripting. As an example of a main routine, and a subroutine that "makes" a TCanvas that persists after a .x, I'd be inclined to skin the cat this way:

// my_subroutines.C
void my_subroutine(unique_ptr<TCanvas> &x) {
    x.reset() //   TCanvas requires the extra step, see note below.
    x.reset(new TCanvas());
    // .. do something with it.
}
//make_canvas.C
namespace root_global { 
    unique_ptr<TCanvas> c1;
}

void make_canvas() {
// we decide at the top level to use a global here:
    my_subroutine(root_global::c1);

//This one doesn't use a global and will die, our choice
    unique_ptr<TCanvas> local_canvas;
    my_subroutine(local_canvas);
}

or if being a little more lazy, I'd likely use static instead of the namespace, but the root interpreter doesn't respect translation unit boundaries well.

Really I'd use a class though in case I need to add more global things easily:

//display.h

class display {
private:
    unique_ptr<TCanvas> c1;
public:
    void make_canvas() {
        c1.reset(); 
        c1.reset(new TCanvas());
    }
};

And use it like:

//make_display.C
#include "display.h"

namespace root_global { 
    display my_display;
}

// main routine:
void make_display() { 
//  This one will stick around
    root_global::my_display.make_canvas();

//  This one wont.  
    display local_display;
    local_display.make_canvas();
}

There are two main differences compared to the PR. 1) I'm not letting the subroutine allocate heap objects that aren't handled explicitly by the caller. 2) I'm using standard C++ that even works fine right now in root 6. That's nice because it's easier to learn, recognize, understand, and apply elsewhere. Maybe it's a bonus that syntax for creating the TCanvas is always the same and separated from syntax for keeping it. Maybe (1) can be done with RDirectory, and just isn't done in the PR example. I don't know enough about it. If the root objects all had some kind of empty non-displayed initial state, and their own reset() re-initializer, we wouldn't even need the unique_ptr.

Of course you could still end up calling make_display() as a subtroutine from another file, and if you do then it's still become a subroutine allocating "its own" heap objects. I just wouldn't, but with a little extra work for the usual extern declarations you could still access the memory elsewhere if you did.

DougLeonard avatar Mar 01 '21 04:03 DougLeonard

Another minor related point:

-   TH1F *test = new TH1F("test","This is test histogram",100,-4,4);
+  auto test = RDirectory::Heap().Create<TH1F>("test", "test","This is test histogram",100,-4,4);

I guess this gives us what, a TH1F*? (auto doesn't add clarity) Isn't it better to use the base class?

TH1 *test = RDirectory::Heap().Create<TH1F>("test", "test","This is test histogram",100,-4,4);

assuming that works, or if not, this does:

unique_ptr<TH1> test; 
void the_routine () {
    //...
    test.reset(new TH1F());
}

This is something that I find not well documented in the past. The old docs point out that TH1F is derived from TH1, but what wasn't always clear to this non-expert was that after creation it seems you can cast to TH1 without loss of any functionality, and functions taking TH1 pointers have the advantage of working on all TH1-derivatives. Building this concept into the standard examples could be helpful.

DougLeonard avatar Mar 02 '21 23:03 DougLeonard

TCanvas is not fully compatible with unique_ptr reset in root 6 though. If you reset it to a TCanvas of the same name, garbage collection causes it to be double deleted. The solution is to first do an empty reset. I will edit the code snippet above.

DougLeonard avatar Mar 03 '21 07:03 DougLeonard

Thanks for your comments!

See https://root.cern.ch/doc/master/classROOT_1_1Experimental_1_1RDirectory.html#a2b7aa4839c1a3bdd44cf80f72aa3cf51 for RDirectory::Create: it returns a shared_ptr. I don't understand your argument:

Subroutines leaving global objects lying around still seems funny.

versus your

unique_ptr<TH1> test;

or similar - global or namespace-scope objects of static storage duration. I do not see the advantage. Can you explain?

I also don't understand

I'm not letting the subroutine allocate heap objects that aren't handled explicitly by the caller.

What does "handled explicitly by the caller" mean?

I'm using standard C++ that even works fine right now in root 6.

Well, at the cost of having to code multiple lines. Of course one can always write everything oneself, but the beauty of a library is that it can provide core functionality that's easy to pick up.

Next thing I don't understand:

If the root objects all had some kind of empty non-displayed initial state, and their own reset() re-initializer, we wouldn't even need the unique_ptr.

How does that solve the storage duration? Suppose I run script1.C and script2.C, both create histograms, and I want to compare them. What do I do in your hypothetical "some kind of empty non-displayed initial state" case?

I'd likely use static instead of the namespace, but the root interpreter doesn't respect translation unit boundaries well.

For cling, input is growing the same translation unit. That's how

root [0] int i = 23l
root [1] int j = ++i;

works, for instance.

You example creates a global, just as Heap::Create does. Only yours needs some coding - for every global I need to find back I had to code

namespace root_global { 
    display my_display;
}

anew. That doesn't scale well IMO.

Axel-Naumann avatar Mar 03 '21 19:03 Axel-Naumann

Thanks for your comments!

Thanks for the discussion. I think many students learn C++ from root examples so it matters.

Thanks for the link to the docs. I got a little confused about Heap().Create details when I looked at first (and there you go) but I get it now. I learned two things:

a) RDirectory can take an existing shared_ptr using Add() instead of Create(), super! b) Little thing but it looks like Create() syntax cannot cast a TH1F to a TH1. I've learned to always want a TH1 and never a TH1F, as their interfaces are identical after construction (nothing sliced). Implicit casts make it not a huge issue but I'd still prefer to cast to a TH1 smart pointer first and then use Add(), a trivial nuisance. Maybe Create has a way to cast that I'm missing, or could.

Now, There's a lot to unpack here. Let's just stick with my pure subroutine version (not class version) for direct comparison for now. I could pass a structure anyway obviously. Now I can do this, with a TH1F...

example 1:

//my_subtrounes.C

// my_subroutines.C
void my_subroutine(shared_ptr<TH1> &x) {
    x.reset();                               // not needed for TH1F, but safer in general.
    x.reset(new TH1F());
    // .. do something with it.
}
//make_th1.C
#include <ROOT/RDirectory.hxx>                                                    //  one extra line ;)
#ifdef  __CINT__
// shared_ptr<TH1> mc_global_x,mc_global_y,mc_global_z;          // but saved one line here...
#endif

void make_th1() {
    shared_ptr<TH1> x,y,z;
    my_subroutine(x);
    my_subroutine(y);
    my_subroutine(z);

// and the last we do is leave globals on the heap for interactive user if needed:
#ifdef __CINT__                                         // I wouldn't leave the globals in compiled code.          
    RDirectory::Heap().Add("test1",x);             // this line is "extra" compared to PR, but puts caller in control.
    RDirectory::Heap().Add("test2",y);            // but using RDirectory requires 3 lines now ;)
    RDirectory::Heap().Add("test3",z);           // 
#endif
}

I'm ok with this. I like vanilla better, but I got a clean ownership model up until the end here, and made the globals right before exiting, almost like a return statement. If I want to put this in a compiled batch code that loops over this 50 times, pauses 1 second, and displays 3 things each time, I'd just add a loop with sleeps, and Bob's your uncle without even modifying the underlying implementation to get three instances. But this RDirectory version shown here and the C++ global version here are certainly not different in that respect

The C++ globals actually win on line count here if you like this code structure better (I do). I'd argue if one needs the extern header declarations to pass things around in compiled code, he should ask if he's doing it wrong, and that's not an usual sentiment. The whole point is I think to make things visible from the interpreter or other things spawned from it. That's surely why we got here, and if C++ globals makes it tedious to access them everywhere in compiled code, I call that a bonus.

On the other hand I don't like this
example2:

{
// my_subroutines.C
unique_ptr<TH1> x  // used in my_subroutine

void my_subroutine() {
    x.reset();                                            
    x.reset(new TH1F());
    // .. do something with it.
}

except for maybe quick temporary scripts.

So we have about 2.5 issues.

  1. The global command,
  2. the interface and ownership model, and 2.5) with the way it's done in the PR also the non-castable syntax issue.

Issue 2.5 is already covered.

I do still like C++ globals better even in example 2. It separates how to make a TH1 from how to make it global. The RDirectory is more self-documenting than "new", but C++ one makes the file self documenting, not just the line of code. And, all else equal, vanilla is better. But by far the more important point is that I don't like example 2 or the PR.

Regarding point 2, example 1 is surely more scalable. I already scaled it more than is possible with the PR. Of course you can edit the PR to make new key names on each call so that the RDirectory call can create more things. But scalable means I did it without needing to modify the underlying implementation or interface (which someone else might maintain) to get three plots instead of one.

Furthermore, coming back to issue 1, if you do edit the PR to have flexible (sequential or passed) RDirectory names, you also (edit: might) need to add explicit deletes (more lines of code) somewhere if you run it all in a batch loop, or you'll have a leak.(edit: depending on exactly how the looping and naming is done, but it can easily happen) In other words a portion of the new/delete problem has just been hidden under new with a different name. This is exactly what I mean by modern ownership models. In example 1 the top level code asked for memory that it owns. In my class version it made an instance that it owns. Objects are scoped only as far as requested. When that scope dies, or the container is reset, it's cleaned up where it is made. That's what smart pointers are all about. The PR as-is won't leak, but only because it's not flexible enough to create different things. It's a fragile grammar and in the end I'd rather worry about adding top-level "keep" statements than top level "deletes".

In the C++ global version of example 2, to get the serial loop over 3 parallel histograms, you'd need the global to be a vector and you'd have to push_back() onto it, but you're in the same (bad) place then. I wrote a script just like that this week exactly to fit a number of peaks determined by a file read inside one routine, and looped over outside, and I used example 1, not example 2, and future use of the base routine will never cause leaks.

The code structure is problematic to me. RDirectory itself isn't terrible, but I just don't see what it solves other than more syntax to remember.

A final point, the shared pointers supposedly perform worse than unique pointers. I don't know how much it matters, but when you're fitting millions of waveforms maybe it does. Of course then you shouldn't be using RDirectory, but you can still use my_subroutines.C to do your fit if you're looping the million without RDirectory or to examining one of those fits with RDirectory, without rewriting the routine.

I'm going to skip the point about empty root objects just to limit the scope for now.

How to skip globals entirely

Of course you can just let the top level code return an object or structure. The downside is just that you have to .L the code and call it with auto ret=make_th1(). It can't be a complied main(), but that's no big deal. It violates the spirit of "scripting" but is of course very clean:

example 3:

//subroutine.C

// my_subroutines.C
void my_subroutine(unique_ptr<TH1> &x) {
    x.reset();                               // not needed for TH1F, but safer in general.
    x.reset(new TH1F());
    // .. do something with it.
}
//make_th1.C

vector<unique_ptr<TH1>> make_th1() {     //edit: changed to TH1's here
    vector<unique_ptr<TH1>> retvec(3);     // and here, as intended.
        for (int i=0; i<3; i++) {
            my_subroutine(retvec[i]);
    }
    return retvec;
}

But the end-user can decide. Again, the subroutine works regardless.

The PR is an improvement, and I don't object to it. I think more improvement is possible.

(minor edits made for cleanup)

DougLeonard avatar Mar 05 '21 02:03 DougLeonard

As a bonus I think my code is more potentially thread safe. It's not something I'm real expert on, but it looks pretty thread safe. It's dealing with whatever it's handed, not overwriting the same global.

DougLeonard avatar Mar 05 '21 04:03 DougLeonard

or you'll have a leak.(edit: depending on exactly how the looping and naming is done, but it can easily happen) In other words a portion of the new/delete problem has just been hidden under new with a different name.

The intent is the opposite. The RDirectory's internal representation and what it returns is a kind of shared_ptr. The user register the same name twice then if they hold a 'reference' to it, they keep and if they don't it is deleted.

pcanal avatar Mar 06 '21 00:03 pcanal

The intent is the opposite. The RDirectory's internal representation and what it returns is a kind of shared_ptr. The user register the same name twice then if they hold a 'reference' to it, they keep and if they don't it is deleted.

I do appreciate that as written it works and prevents a leak. New and delete have also worked for 20 years of root, and when written correctly do prevent leaks and get the job done, so there's no crisis of course. This offers a little more protection in some usages, but maybe new things to understand in others. I think stack-based lifetime management is simple, intuitive, encourages top-down code design and configuration, and avoids side effects. The smart pointer is interesting. It wasn't obvious to me that create would behave like shared_ptr::reset(), in the sense of allowing copies to persist (rather than redirecting some pointers they contain to the the new thing).

I kind of see a way now that it's possible to get the scalability (produce variable N plots at a time) without editing the PR routine. From my caller I'd need to call the subroutine, then do an RDirectory::Find() to get a shared pointer which will preserve the created thing, then call the routine again to get a new one. And if I still want to keep the first one for the interactive prompt, I have to make it global again with a new RDirectory::Add() of the contained thing, with a new name. So that cat can be skinned, but it's a bit comical compared to just passing the pointer, and certainly less obvious.

DougLeonard avatar Mar 10 '21 07:03 DougLeonard