tobac
Explanations of parameters
Here are some explanations of the basic parameters. Most are copied from the code.
My questions and unfamiliar parts are in bold.
Feature identification
To identify features, use the tobac.themes.tobac_v1.feature_detection_multithreshold method.
tobac identifies features by checking where the data lie above or below a single threshold or multiple thresholds (recommended).
-
target
Flag to determine if tracking is targeting minima or maxima in the data. Default is 'maximum'.
-
position_threshold
It sets the method to determine the position from the region. Default is 'center'.
Four options are available: center, extreme, weighted_diff, or weighted_abs.
-
center
geometrical center of identified region
-
extreme
max/min position inside the identified region
-
weighted_diff
center of identified region, weighted by difference from the threshold
-
weighted_abs
center of identified region, weighted by absolute values of the field
-
-
sigma_threshold
Standard deviation for the initial filtering step. Default is 0.5.
I'm not quite sure about the definition above ...
Here's my understanding: it sets how many pixels are used when applying the Gaussian filter. How can 0.5 be used to smooth the data? Shouldn't it be at least 1?
-
n_erosion_threshold
Number of pixels by which to erode the identified features. Default is 0, which leaves the mask unchanged.
-
n_min_threshold
Minimum number of identified features. Default is 0.
If the number of pixels in a masked region is less than n_min_threshold, the region is deleted.
-
min_distance
Minimum distance between detected features. Default is 0.
Features that are closer together than min_distance are removed, keeping the larger one (higher threshold). If the thresholds are the same, the feature with the larger area is kept.
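The four position_threshold options can be compared on a toy region. This is only a minimal NumPy sketch of the definitions listed above for target = 'maximum', not tobac's own implementation; the field, the threshold and the helper weighted_center are made up for illustration.

```python
import numpy as np

# Toy 2-D field (hypothetical values); the region is every pixel above the threshold.
field = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 4.0, 0.0],
    [0.0, 2.0, 8.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
threshold = 1.0
rows, cols = np.nonzero(field > threshold)
values = field[rows, cols]


def weighted_center(weights):
    """Return the (row, col) centroid of the region for the given weights."""
    total = weights.sum()
    return (float((rows * weights).sum() / total),
            float((cols * weights).sum() / total))


# center: geometrical center of the region
center = (float(rows.mean()), float(cols.mean()))
# extreme: position of the maximum inside the region
i_max = int(values.argmax())
extreme = (int(rows[i_max]), int(cols[i_max]))
# weighted_diff: weighted by the difference from the threshold
weighted_diff = weighted_center(values - threshold)
# weighted_abs: weighted by the absolute values of the field
weighted_abs = weighted_center(np.abs(values))

print(center, extreme, weighted_diff, weighted_abs)
```

For target = 'minimum' the weights for weighted_diff would presumably be threshold - values instead; that is my assumption, not something taken from the code.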
This is an example to track the minimum TBB:
import numpy as np
import tobac.themes.tobac_v1

# TBB is the input brightness temperature field, loaded beforehand
parameters_features = {}
parameters_features['target'] = 'minimum'
parameters_features['position_threshold'] = 'weighted_diff'
parameters_features['sigma_threshold'] = 0.5
parameters_features['threshold'] = np.arange(280, 190, -10)
parameters_features['min_distance'] = 0
parameters_features['n_erosion_threshold'] = 0
dxy = 5000  # grid spacing, unit: m
Features = tobac.themes.tobac_v1.feature_detection_multithreshold(TBB, dxy, **parameters_features)
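As a quick check of the threshold list used in this example: np.arange excludes the stop value, so the thresholds run from 280 down to 200 in steps of 10 (presumably kelvin for TBB, which is my assumption).

```python
import numpy as np

# Same call as in the example; the stop value 190 is not included.
thresholds = np.arange(280, 190, -10)
print(thresholds.tolist())
# [280, 270, 260, 250, 240, 230, 220, 210, 200]
```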
Segmentation
-
threshold
Threshold for the watershedding field to be used for the mask. Default is 3e-3.
The algorithm fills the area (2-D) or volume (3-D) based on the input field starting from the weighted mean centers until reaching the threshold.
-
target
Flag to determine if tracking is targeting minima or maxima in the data. Default is 'maximum'.
This can be different from the target in parameters_features.
-
level
Levels at which to seed the cells for the watershedding algorithm. Default is None.
I'm not familiar with iris, so I'm not sure what level means.
-
max_distance
Maximum distance from a marker allowed to be classified as belonging to that cell. Default is None. Unit: m, the same as dxy.
Continuing the example:
parameters_segmentation = {}
parameters_segmentation['target'] = 'minimum'
parameters_segmentation['method'] = 'watershed'
parameters_segmentation['threshold'] = 280
Mask_TBB, Features_TBB = tobac.themes.tobac_v1.segmentation(Features, TBB, dxy, **parameters_segmentation)
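To picture what "filling from the feature centers until reaching the threshold" means, here is a simplified breadth-first flood fill for target = 'minimum'. It is not the watershed algorithm that tobac calls, and grow_regions, the seeds and the toy TBB values are all hypothetical; it only illustrates that a pixel is added to a cell while it stays on the "feature side" of the threshold.

```python
from collections import deque

import numpy as np


def grow_regions(field, seeds, threshold):
    """Label pixels reachable from each seed without crossing the threshold.

    Simplified flood fill for target='minimum': a pixel joins a region only
    if its value is below the threshold. This is not a real watershed.
    """
    mask = np.zeros(field.shape, dtype=int)
    queue = deque()
    for label, (r, c) in seeds.items():
        if field[r, c] < threshold:
            mask[r, c] = label
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < field.shape[0] and 0 <= nc < field.shape[1]
            if inside and mask[nr, nc] == 0 and field[nr, nc] < threshold:
                mask[nr, nc] = mask[r, c]
                queue.append((nr, nc))
    return mask


# Two cold cloud tops separated by a warm column (made-up values).
tbb = np.array([
    [290, 290, 290, 290, 290],
    [290, 250, 290, 240, 290],
    [290, 230, 290, 220, 290],
    [290, 290, 290, 290, 290],
])
mask = grow_regions(tbb, seeds={1: (2, 1), 2: (2, 3)}, threshold=280)
print(mask)
```

Pixels at or above 280 stay 0 in the mask, and each cold region carries the label of the seed it was grown from.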
Linking
-
dt
Time resolution of tracked features. Unit: s
-
d_max
Maximum search range. Default is None.
-
d_min
Variations in the shape of the regions used to determine the positions of the features can lead to quasi-instantaneous shifts of the position of the feature by one or two grid cells even for a very high temporal resolution of the input data, potentially jeopardising the tracking procedure. To prevent this, tobac uses an additional minimum radius of the search range. Default is None. Unit: m
-
v_max
Speed at which features are allowed to move. Default is None. Unit: m/s
-
memory
Number of output timesteps for which a feature is allowed to vanish and still be considered tracked. Default is 0.
Warning: This parameter should be used with caution, as it can lead to erroneous trajectory linking, especially for data with low time resolution.
-
stubs
Minimum number of timesteps of a tracked cell to be reported. Default is 1.
-
time_cell_min
Minimum length in time of a tracked cell to be reported, in minutes. Default is None.
-
extrapolate
Number of timesteps to extrapolate trajectories. Default is 0.
This allows for the inclusion of both the initiation of the cell and the decaying later stages in the analysis of the cloud life cycle.
-
method_linking
{'random', 'predict'}. Flag choosing the method used for trajectory linking. Default is 'random'.
predict is useful for cloud and fluid tracking.
-
adaptive_step
Reduce search range by multiplying it by this factor.
-
adaptive_stop
If not None, when encountering an oversize subnet, retry by progressively reducing search_range until the subnet is solvable.
If search_range becomes <= adaptive_stop, give up and raise a SubnetOversizeException. Default is None.
-
cell_number_start
Cell number for first tracked cell. Default is 1.
Example:
dt = 600  # time step, unit: s
parameters_linking = {}
parameters_linking['v_max'] = 20  # m/s; 20 m/s * 600 s = 12 km per time step
parameters_linking['stubs'] = 3  # keep only trajectories that last at least 3 time steps
parameters_linking['order'] = 1
parameters_linking['extrapolate'] = 0
parameters_linking['memory'] = 0
parameters_linking['adaptive_stop'] = 0.2
parameters_linking['adaptive_step'] = 0.95
parameters_linking['subnetwork_size'] = 100
parameters_linking['d_min'] = 2 * dxy  # minimum search radius, unit: m
parameters_linking['method_linking'] = 'predict'
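To get a feel for adaptive_step and adaptive_stop with these values, the sketch below lists the search ranges that would be tried if an oversize subnet kept occurring. It only mimics the rule quoted above (multiply by adaptive_step, give up once the range is <= adaptive_stop). The helper adaptive_ranges is hypothetical, and the starting value assumes the search range is v_max * dt expressed in grid cells, which I have not checked against the code.

```python
def adaptive_ranges(search_range, step, stop):
    """List the search ranges tried before the range drops to `stop` or below."""
    tried = []
    while search_range > stop:
        tried.append(search_range)
        search_range *= step
    return tried


# Assumption: initial range = v_max * dt / dxy = 20 m/s * 600 s / 5000 m (grid cells).
initial_range = 20.0 * 600.0 / 5000.0
tried = adaptive_ranges(initial_range, step=0.95, stop=0.2)
print(len(tried), tried[0], tried[-1])
```

With a step of 0.95 the range shrinks slowly, so many retries are possible before the stop value is reached.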
Result
It looks good for tracking deep convection using 10-min TBB data:
I suppose we can use the diagrams in @mheikenfeld 's GMD paper to explain these parameters, like @w-k-jones has done for the Feature detection section.
Thanks for your comments!
Explanation of units should be part of the documentation, for sure! This needs to be integrated into the respective rst files.
Gaussian Filter: The filter width just determines how fast the weights for the convolution decay. sigma < 1 is therefore no problem.
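A small sketch of this point: the normalised Gaussian weights at integer pixel offsets for a few values of sigma. The helper gaussian_weights is hypothetical and only evaluates the Gaussian at pixel centres; the filter tobac actually calls may truncate or sample the kernel differently.

```python
import math


def gaussian_weights(sigma, radius):
    """Normalised 1-D Gaussian weights at integer offsets -radius..radius."""
    raw = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


for sigma in (0.5, 1.0, 2.0):
    weights = gaussian_weights(sigma, radius=3)
    print(sigma, [round(w, 3) for w in weights])

w_half = gaussian_weights(0.5, radius=3)
```

With sigma = 0.5 the centre pixel keeps about 0.79 of the weight and each direct neighbour about 0.11, so the field is still smoothed, just gently.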
@zxdawn a +1 of thanks from me, as I needed this just now. Also, I would say the docstring for n_min_threshold is misleading, as it refers to features (tracked regions?) and not pixels, and the docstring for n_min is either absent or unclear. What is n_min counting? Is it tracked regions, or something else?
@deeplycloudy You're welcome. Glad it's useful ;)
Which n_min do you mean? Could you copy the permalink here?
Sorry, I meant min_num, such as in feature_detection_multithreshold_timestep
Ha, it seems that's not used at all. The only appearance is here. But, that's commented. That should be the minimum number of features at one timestamp.
BTW, n_min_threshold is the number of pixels in each feature's mask. See here if I understand correctly.
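A minimal sketch of that reading of n_min_threshold, assuming it is a per-region pixel count: regions with fewer pixels than n_min_threshold are dropped. The label array is made up and this is not the tobac code itself.

```python
import numpy as np

# Hypothetical labelled mask: 0 is background, 1..3 are candidate regions.
labels = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 3, 3],
])
n_min_threshold = 2

# Pixel count per label (index 0 is the background).
counts = np.bincount(labels.ravel())
# Regions with fewer pixels than n_min_threshold are deleted.
small = [lab for lab in range(1, counts.size) if counts[lab] < n_min_threshold]
filtered = np.where(np.isin(labels, small), 0, labels)
print(small)
print(filtered)
```

Here region 2 has a single pixel and is removed, while region 3 has exactly n_min_threshold pixels and is kept.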
Ah, maybe a candidate for removal in v. 2.0, then! Or at least deprecation. And thanks for the clarification on n_min_threshold - I agree with your interpretation.
I suppose that if there are n_1 features at t_1 and n_2 features at t_2, and n_min lies between n_1 and n_2, then the features at t_2 may go missing, so the tracking won't work correctly.
Looking at the default heuristic for search_range, the units for v_max are (grid spacing units)/(time step units)
@deeplycloudy Yes, you're right. Because we usually use m for grid spacing units and s for time step units, the unit of v_max is m/s. See the get_spacings in utils. If you use other units, it should also work. Feel free to test ;)
I believe this is now being resolved in v1.x with #138 et al. I also think this has been resolved in v2.x, but correct me if I'm wrong. I'm inclined to close this when #138 is merged in, if you're happy with that @zxdawn ?
@freemansw1 That's fine with me, apart from the clarification of the units. I've just left a review on that PR ;)
Now that #138 is in, I'm going to close this issue. There is still room for improvement on the docs, but I think we have addressed this specific issue.