
Field measurements for trees

Open SebastianKyle opened this issue 1 year ago • 11 comments

Hi Ben,

Thank you very much for making this benchmark dataset available! I think the dataset is sufficient for training and evaluating models for detecting and segmenting tree crowns. I just wonder whether there are field measurements for the individually annotated trees in this dataset, containing tree metrics like Diameter at Breast Height, tree height, species, and so on. These metrics would be beneficial for biomass and carbon stock estimation. The results of the segmentation process would be used to estimate these metrics, so it would be nice to have ground-truth data to evaluate this step.

Thank you for your time,

SebastianKyle avatar Jan 17 '25 03:01 SebastianKyle

Hi @SebastianKyle - yes, the field-collected stems data does have these measurements, collected by NEON. The data for all of the field-collected stems is in the field.rda file in the data directory of the associated package repo. All of this information is also available directly from NEON.

I haven't looked at the alignment system in this or the linked repo in a while, so take a look and let us know if you have any issues linking up the pieces of info in field.rda in the way that you need them.
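If you're working in Python rather than R, something along these lines should pull the table out of field.rda - just a minimal sketch, assuming the pyreadr package; the object name stored inside the .rda is an assumption, so inspect the keys to confirm:

```python
# Minimal sketch: reading the field-collected stems from field.rda in Python.
# Assumes the pyreadr package is installed; the path and the object name stored
# inside the .rda file are assumptions - check result.keys() to see what's there.
import pyreadr

result = pyreadr.read_r("data/field.rda")   # dict-like: object name -> pandas DataFrame
print(list(result.keys()))                  # confirm what the .rda actually contains
field = next(iter(result.values()))         # grab the first (likely only) data frame
print(field.columns.tolist())               # look for the ID, height, DBH, species columns
```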

This data should also be available for the field annotated polygon crowns, from our collaborators who collected those crowns. I can put you in touch with them if you want to ask about getting access to that data.

ethanwhite avatar Jan 23 '25 01:01 ethanwhite

Hi @ethanwhite,

Thank you for helping out. I found the Vegetation Structure data product while waiting for your response and downloaded it as .csv files for the sites that appear in the NeonTree training tiles referenced by the repo.

However, the bounding box annotations in the training dataset are in .xml files, and these files don't appear to contain the key piece of information needed to link the field data to the crown annotations, namely the 'indvdID'. Might there be other annotation files that provide this link?

As far as I know, this property is the ID of an individual tree, and if each tree's annotation contained it, we could use it to look up that tree's field measurements in the .csv file. This is how the IDTReeS 2020 Competition Data works: they provide the .csv field data files, and their .shp annotation file contains the 'indvdID' for each tree.
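For reference, this is roughly the kind of lookup I have in mind - a sketch only, with placeholder file paths, and the field-data column name is an assumption on my part:

```python
# Sketch of the linkage I mean: join the shapefile annotations to the field
# measurements via the individual tree ID. File paths and the field-data column
# name ("individualID") are assumptions, not taken from this repo.
import geopandas as gpd
import pandas as pd

crowns = gpd.read_file("ITC/train_boxes.shp")   # hypothetical path; contains 'indvdID'
field = pd.read_csv("field_data.csv")           # hypothetical field measurements table

linked = crowns.merge(
    field,
    left_on="indvdID",
    right_on="individualID",   # assumed column name in the field csv
    how="left",
)
# Each crown geometry would then carry its field measurements (height, DBH, species, ...)
print(linked.head())
```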

As for the field-annotated polygon crowns, I would be glad if you could help me get access to that data. Segmentation would be more useful than bounding boxes alone for estimating tree metrics.

Thank you for your time,

SebastianKyle avatar Jan 23 '25 02:01 SebastianKyle

I may not be following correctly here, but there is no explicit link between the stem locations from the vegetation structure data and the bounding boxes. They are collected independently of one another, and we have not created any direct linking fields because we don't actually know whether a bounding box drawn on imagery around a crown is associated with a particular stem.

We do certainly make assumptions about this in our work in order to create those linkages. See the Methods section in https://doi.org/10.1371/journal.pbio.3002700 for a recent example of our approach. So while it is possible to infer matches, and we do so, we don't think they belong in a benchmark dataset.

Regarding the IDTReeS 2020 dataset, I'd have to dig into it:

  • If that information is for the polygon (not bounding box) data, then the crowns and field measurements were collected simultaneously in the field. That is the same polygon data as in this benchmark, so you might already have what you need for the polygons - if not, let me know and I'll reach out to our collaborators.
  • If it's for the bounding box data, then we would have had to make some assumptions to generate the match, which should be described in the competition paper. If you can't find a clear description there, let me know and I'll run it down.

ethanwhite avatar Jan 23 '25 14:01 ethanwhite

Well then, I guess I'll just use the IDTreeS 2020 dataset for the tree metrics estimation task. If you have a look at it, you can read the .shp annotation file (which contains bounding boxes and more) in the ITC folder; the 'indvdID' property in that file identifies each tree, and you can look it up in the field data file (.csv) for linkage. I visualized the field measurements along with the bounding boxes on the RGB images in this dataset, and the 'height' property from the .csv file seems to align quite well with the CHM images (taking the highest pixel within a tree's bounding box in the CHM).
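Concretely, the height check I did is roughly this - a sketch with placeholder paths and an example field value, assuming the box is already in the CHM's pixel coordinates:

```python
# Sketch of the CHM height check: take the maximum CHM pixel inside a tree's
# bounding box and compare it to the field-measured height. The CHM path and the
# box/height values are placeholders; the box is assumed to be in pixel coordinates.
import numpy as np
import rasterio

with rasterio.open("example_CHM.tif") as src:   # hypothetical CHM tile
    chm = src.read(1)                           # single-band canopy height model

def chm_height(box, chm_array):
    """box = (xmin, ymin, xmax, ymax) in pixel coordinates."""
    xmin, ymin, xmax, ymax = [int(round(v)) for v in box]
    window = chm_array[ymin:ymax, xmin:xmax]
    return float(np.nanmax(window)) if window.size else np.nan

field_height = 18.2                             # example value from the field csv
print(chm_height((120, 80, 160, 130), chm), "vs field:", field_height)
```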

Regarding segmentation polygons, I might look for segmentation annotations for training and evaluation in this dataset. It would be great to have ground-truth polygons for training a deep learning model.

SebastianKyle avatar Jan 24 '25 08:01 SebastianKyle

Yes, we've found that field and remotely sensed height measures tend to be well correlated using the same approach (https://elifesciences.org/articles/62922).

Yes, ground truth polygons are great if you can get them. They are typically very limited because they are very expensive to collect. That's why there are a few, but not a lot, in this benchmark.

If there's anything else you need from us please let me know.

ethanwhite avatar Jan 27 '25 17:01 ethanwhite

I have read the paper that proposed a deep learning method for tree detection on this NEON dataset using a Faster RCNN model (https://www.mdpi.com/2072-4292/11/11/1309). I also read the code in the DeepForest package and found that it loads trained weights for the model from the Hugging Face repo weecology/deepforest-tree. I tried to load the weight file from that repo into a Faster RCNN model too, but found mismatches in the weight layers of the RPN and RoI Heads modules. I also want to know which sites the model was evaluated on in the paper's experiments, since I want to replicate the results that the proposed method achieved. It would be great to be in touch with the researchers who conducted the experiments for further discussion.

Thank you for your time.

SebastianKyle avatar Mar 09 '25 15:03 SebastianKyle

As I just mentioned over in https://github.com/weecology/DeepForest/issues/962, our primary model is a RetinaNet, not a Faster RCNN, which is why you can't load the weights. You can see this described in section 2.3 (Deep Learning RGB Detection) of the paper you link.

Regarding the study site used in the 2019 paper, it is SJER (see the description in section 2.1, Study Site and Field Data).

It is worth noting that our current model is not the same one used in that initial 2019 paper, since it has been trained on more sites. For information on that version of the model, please see the Methods section of the DeepForest software paper.

ethanwhite avatar Mar 10 '25 13:03 ethanwhite

Thank you for correcting me. From what I read in section 3.1 that you referred to, the model was evaluated on 212 images containing 5,852 trees from 22 sites. I wonder if this dataset comes from the evaluation folder of the NeonTreeEvaluation Benchmark data. If it does, then those 212 images are the ones that have annotations, right? Not all images in the evaluation folder of the benchmark have annotations. I hope my assumptions are correct so I can replicate the results of the paper.

SebastianKyle avatar Mar 11 '25 03:03 SebastianKyle

Yes, that's definitely a little confusing. Basically, you want to use the files in the evaluation folder that have a matching file name in the annotations. I think there are 194 of those, with the difference from 212 having to do with some splitting of a few files we did in the original work. But just using those 194 matches should cover what was used in that paper, plus a few additional annotations (it looks like a little over 6,000 annotations in total).
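In code terms it's just an intersection of file stems, roughly like the sketch below (the folder names are placeholders for wherever you unpacked the benchmark locally):

```python
# Rough sketch of selecting only the evaluation images that have annotations:
# intersect file stems between the image folder and the annotation folder.
# Folder paths and file extensions are placeholders for your local copy.
from pathlib import Path

image_stems = {p.stem for p in Path("evaluation/RGB").glob("*.tif")}
annot_stems = {p.stem for p in Path("annotations").glob("*.xml")}

usable = sorted(image_stems & annot_stems)
print(len(usable), "images with matching annotations")
```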

That said, since your goal is to replicate the results of the paper, it's important to keep in mind that those labeled data are only the last fine-tuning step that went into building the model. The first step is pretraining on over 30 million algorithmically generated crowns from the NEON LiDAR data. Because of the scale of this data we didn't archive the imagery (it's a significant part of NEON's RGB imagery catalog, so TB scale); we streamed it into the model instead. It looks like we did hold on to a copy of the annotations we generated. It's a 40+ million row (>2 GB) csv file with image paths that encode year, site, and location, but you'd have to reassemble the imagery from NEON's data products. That pretraining stage will require quite a bit of storage and a fair bit of compute, so I'd definitely recommend doing it on a cluster or cloud resources.

ethanwhite avatar Mar 13 '25 23:03 ethanwhite

I wouldn't have enough computing resources to conduct such an experiment. I wonder if using the trained weights for the RetinaNet model from the Hugging Face repo weecology/deepforest-tree would be fine. In my project, I defined the RetinaNet model from the PyTorch library, loaded the trained weights into it, and ran evaluation on 190 images in the evaluation dataset (which also contains .xml annotation files and CHM images or LiDAR point clouds).
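Concretely, my setup is something like the sketch below. The checkpoint file name and num_classes=1 are assumptions on my part, and the state-dict keys may carry a prefix depending on how the checkpoint was saved, so this is not meant as the package's official loading code:

```python
# Sketch of loading the weights into a torchvision RetinaNet. The checkpoint
# filename is a placeholder, num_classes=1 (tree) is an assumption, and the
# key-prefix stripping may or may not be needed depending on the checkpoint format.
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

model = retinanet_resnet50_fpn(num_classes=1)

ckpt = torch.load("deepforest_tree_checkpoint.pt", map_location="cpu")  # hypothetical local file
state_dict = ckpt.get("state_dict", ckpt)          # Lightning-style checkpoints nest the weights

# If keys look like "model.backbone...", strip the leading "model." prefix first.
state_dict = {
    (k[len("model."):] if k.startswith("model.") else k): v
    for k, v in state_dict.items()
}

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing), "unexpected keys:", len(unexpected))
model.eval()
```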

The results I got were:

  • Precision: 0.5213
  • Recall: 0.6230
  • F1 score: 0.5676
  • AP: 0.6231
  • Mean IoU: 0.7071

The config was iou_threshold=0.5, conf_threshold=0.1, nms_threshold=0.2. The mean IoU was calculated over the predictions matched to ground truth, and precision and recall were computed for each evaluation image, which then forms a precision-recall curve used to compute average precision. Would this be applicable?
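For what it's worth, the per-image matching I used boils down to something like this - a simplified sketch of my own evaluation logic, not DeepForest's built-in evaluate function:

```python
# Simplified sketch of my per-image matching: greedy one-to-one matching of
# predictions (sorted by score) to ground-truth boxes at IoU >= 0.5.
import numpy as np

def iou(a, b):
    """Boxes as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_image(preds, gts, iou_threshold=0.5):
    """preds sorted by score descending; returns (true positives, matched IoUs)."""
    unmatched = list(range(len(gts)))
    tp, matched_ious = 0, []
    for p in preds:
        candidates = [(iou(p, gts[g]), g) for g in unmatched]
        if candidates:
            best_iou, best_g = max(candidates)
            if best_iou >= iou_threshold:
                tp += 1
                matched_ious.append(best_iou)
                unmatched.remove(best_g)
    return tp, matched_ious

# Per image: precision = tp / len(preds), recall = tp / len(gts);
# mean IoU is the average of matched_ious across all matched predictions.
```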

SebastianKyle avatar Apr 06 '25 08:04 SebastianKyle

If I understand correctly, you're evaluating the trained model against the evaluation data, which is basically what is done in the original benchmark paper but with slightly different threshold choices. So yes, I think it's applicable, but primarily as a replication of the reported results with some additional metrics and some variation in the threshold parameters.

ethanwhite avatar Apr 14 '25 14:04 ethanwhite