[DF] RDF using TTree with non standard branch name ("branchName.") fails

Open martamaja10 opened this issue 1 year ago • 2 comments

### Check duplicate issues.

  • [ ] Checked for duplicates

### Description

If an RDataFrame (RDF) is created from a TTree that has a branch with a non-standard name, such as one ending in a dot, errors appear when that branch is used to define a new column.

### Reproducer

```python
import ROOT

def write():
    # Write a one-entry TTree whose only branch has a name ending in a dot.
    with ROOT.TFile.Open("reproducer.root", "recreate") as f:
        tree = ROOT.TTree("reproducer", "reproducer")
        obj = ROOT.TObject()
        tree.Branch("s.", obj)
        tree.Fill()
        tree.Write()

def read():
    df = ROOT.ROOT.RDataFrame("reproducer", "reproducer.root")
    df = df.Alias("nots", "s.")
    # Does not work:
    #   error: use of undeclared identifier 's' auto func0(){return s.fUniqueId
    #df = df.Define("uniqid", "s.fUniqueId")
    # Does not work either, as Alias is resolved back to `s..fUniqueId`:
    #   error: use of undeclared identifier 's' auto func0(TObject& var0){return s..fUniqueId
    #df = df.Define("uniqid", "nots.fUniqueId")
```
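To illustrate the second failure, here is a toy model of alias resolution in plain Python. It is only a sketch of the behaviour described in the comment above, not ROOT's actual implementation; `resolve_aliases` is a hypothetical helper, and the assumption is that each alias is textually substituted by its column name before the expression is compiled.

```python
import re

def resolve_aliases(expression, aliases):
    """Toy model: replace every alias in `expression` with its column name."""
    for alias, column in aliases.items():
        # \b restricts the match to whole words, so "nots" does not match "knots".
        pattern = rf"\b{re.escape(alias)}\b"
        # A function replacement inserts `column` literally, without escape processing.
        expression = re.sub(pattern, lambda match: column, expression)
    return expression

# The alias "nots" stands for the branch "s.", so the dot is doubled.
print(resolve_aliases("nots.fUniqueId", {"nots": "s."}))  # prints s..fUniqueId
```

Under this assumption the alias cannot help: the trailing dot of the branch name comes back as soon as the alias is expanded.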

### ROOT version

all

### Installation method

source

### Operating system

MacOS

### Additional context

_No response_

martamaja10, Jan 22 '24 12:01

The lexical structure of formal languages involves characters that are not allowed in identifiers and are not whitespace, but that have some special lexical significance other than being literal characters (such as in string literals) or ignored (such as in comments).

Examples of characters with syntactic use include:

  • decimal marks in numeric literals
  • arithmetic operators, such as +, -, *, /
  • parentheses and other brackets
  • characters in comment delimiters, such as #, /*, --, or ⍝
  • quotation marks delimiting strings
  • characters such as \ introducing escape sequences

It is useful to bound the set of characters with syntactic use. This makes it possible to build tools that handle source code, but do not validate it, such as syntax highlighters, in a forward-compatible way.
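As a rough illustration, a tool could flag such names before they reach a compiled expression. The sketch below assumes the plain ASCII identifier rule (a letter or underscore followed by letters, digits, or underscores); it ignores language keywords and any special handling RDataFrame may apply to dotted names. The helper names are hypothetical.

```python
import re

# Assumed rule: ASCII letter or underscore, then letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_plain_identifier(name):
    """Return True if `name` could be used verbatim as an identifier."""
    return _IDENTIFIER.fullmatch(name) is not None

def needs_special_handling(branch_names):
    """Return the names that contain characters with syntactic use."""
    return [name for name in branch_names if not is_plain_identifier(name)]

print(needs_special_handling(["pt", "s.", "n_jets", "2mu"]))  # prints ['s.', '2mu']
```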

The main question here is what to do when a string contains characters with syntactic use. Should this simply be documented?

ianna, Feb 14 '24 12:02

One way I can see to fix this is a "passthrough Define", which reads the content of a column and passes it to a new one without jitting. Following the example above:

TObject& f(TObject& o) { return o; }
[...]
auto df_nots = df.Define("nots", f, {"s."});
[...]

If this is acceptable, we need to (a) document it carefully and (b) make sure the same can be done from Python.

dpiparo, May 25 '24 19:05