openair
[trajcluster] Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2
I get an error from the trajCluster function: Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2. I installed the development version of the "openair" package from GitHub. I understand what 'n' means. I ran into this error a few days ago, but it went away after I tried a few things (I don't know what fixed it). Today I hit the error again. How can I fix it?
I attached my data file. traj.nh3.csv.zip My data and function are here.
head(traj.nh3)
        date receptor year month        day hour hour.inc    lat     lon height pressure               date2     nh3
1 2019-05-04        1 2019     5 2019-05-04   11      -13 36.092 126.373   95.8   1004.6 2019-05-03 11:00:00 11.8858
2 2019-05-04        1 2019     5 2019-05-04   17      -55 37.518 114.880  484.8    950.3 2019-05-01 17:00:00 11.8858
3 2019-05-04        1 2019     5 2019-05-04    1      -23 36.486 124.956  108.0   1005.1 2019-05-03 01:00:00 11.8858
4 2019-05-04        1 2019     5 2019-05-04    7      -17 36.320 125.706   78.3   1007.9 2019-05-03 07:00:00 11.8858
5 2019-05-04        1 2019     5 2019-05-04    0      -24 36.555 124.918  124.0   1003.2 2019-05-03 00:00:00 11.8858
p.cluster <- trajCluster(traj.nh3 %>% filter(receptor == 1),
                         method = "Angle", n.cluster = 6, type = "season",
                         split.after = FALSE, layout = c(2, 2), plot = TRUE,
                         col = c("blue", "green", "gold", "darkorange", "red", "brown"),
                         map = TRUE, map.res = "default",
                         map.cols = "transparent", map.alpha = 0.8,
                         projection = "lambert", parameters = c(90, 0),
                         orientation = c(90, 0, 0),
                         xlim = c(90, 150), ylim = c(20, 60),
                         grid.col = "transparent",
                         par.settings = list(axis.line = list(lwd = 1.5),
                                             strip.border = list(lwd = 2),
                                             fontsize = list(text = 10)))
Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2
Hi, I have the same issue here. I imported trajectories with exactly the same fields as the ones you get from importTraj, but I get the same Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2 error.
Did you manage to solve it?
I understand your situation. I didn't do anything special; I just closed the session and restarted it, and the data.frame of trajectory results was still there in the project. But I don't know how I actually solved the problem. Good luck to you!
Hi, I tried restarting the session several times, and nothing happens.
Here is the head of my data.frame (not created with HYSPLIT, but formatted in the same way), and the piece of code:
> head(df)
receptor year month day hour hour.inc lat lon height pressure date2 date
1 1 2018 3 1 0 0 46.37 11.79 2610.936 750 2018-03-01 00:00:00 2018-03-01
2 1 2018 2 28 23 -1 46.16 11.63 2599.040 751 2018-02-28 23:00:00 2018-03-01
3 1 2018 2 28 22 -2 45.98 11.53 2610.936 750 2018-02-28 22:00:00 2018-03-01
4 1 2018 2 28 21 -3 45.83 11.44 2658.711 746 2018-02-28 21:00:00 2018-03-01
5 1 2018 2 28 20 -4 45.75 11.33 2694.743 743 2018-02-28 20:00:00 2018-03-01
6 1 2018 2 28 19 -5 45.71 11.19 2755.182 738 2018-02-28 19:00:00 2018-03-01
> clust <- trajCluster(df, method = "Angle", n.cluster= 6, col = "Set2",
+ map.cols = openColours("Paired", 10))
Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2
After some trials, it seems that the error appears when multiple input trajectories share the same date but have, e.g., different starting coordinates (i.e., I would like to cluster 5 trajectories starting at 750 hPa, but with different starting coordinates, for each specific date). trajCluster then fails when applying the calctraj function, because the trajectories are not grouped one after the other, but rather by their hour.inc values (so all the hour.inc = -120 lines come first, then the hour.inc = -119 lines, and so on).
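One way to check whether a data.frame is in this situation is to count how many trajectories start on each date. The sketch below uses base R on made-up toy data and assumes each trajectory has exactly one row with hour.inc == 0 (its starting point); df and its values are hypothetical and not taken from the attached files.

```r
# Toy data (hypothetical): two 3-step trajectories sharing one arrival date,
# differing only in their starting latitude.
df <- data.frame(
  date     = as.POSIXct("2018-03-01 00:00:00", tz = "UTC"),
  hour.inc = rep(c(0, -1, -2), times = 2),
  lat      = c(46.37, 46.16, 45.98, 45.87, 45.71, 45.52),
  lon      = c(11.79, 11.63, 11.53, 11.79, 11.55, 11.35)
)

# Assumption: one row with hour.inc == 0 per trajectory, so counting those
# rows per date gives the number of trajectories sharing that date.
start_dates     <- format(df$date[df$hour.inc == 0], "%Y-%m-%d %H:%M:%S")
starts_per_date <- table(start_dates)

# Dates that carry more than one trajectory.
shared <- starts_per_date[starts_per_date > 1]
print(shared)
```

Any date listed in shared carries more than one trajectory, so grouping on date alone would merge them into a single group.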
To overcome this issue, I first added a new column to the input data.frame, traj_num, which incrementally identifies each trajectory:
> head(df)
receptor year month day hour hour.inc lat lon pressure date2 date traj_num
1 1 2018 3 1 0 0 46.37 11.79 750 2018-03-01 00:00:00 2018-03-01 1
2 1 2018 2 28 23 -1 46.16 11.63 751 2018-02-28 23:00:00 2018-03-01 1
3 1 2018 2 28 22 -2 45.98 11.53 750 2018-02-28 22:00:00 2018-03-01 1
4 1 2018 2 28 21 -3 45.83 11.44 746 2018-02-28 21:00:00 2018-03-01 1
5 1 2018 2 28 20 -4 45.75 11.33 743 2018-02-28 20:00:00 2018-03-01 1
...
122 1 2018 3 1 0 0 45.87 11.79 750 2018-03-01 00:00:00 2018-03-01 2
123 1 2018 2 28 23 -1 45.71 11.55 745 2018-02-28 23:00:00 2018-03-01 2
124 1 2018 2 28 22 -2 45.52 11.35 747 2018-02-28 22:00:00 2018-03-01 2
...
1105939 1 2019 5 26 19 -119 32.78 -39.13 931 2019-05-26 19:00:00 2019-05-31 18:00:00 9140
1105940 1 2019 5 26 18 -120 32.49 -39.00 929 2019-05-26 18:00:00 2019-05-31 18:00:00 9140
and then I modified the following lines in trajCluster (the first two lines of calctraj):
traj <- traj[order(traj$date, traj$traj_num, traj$hour.inc), ]
traj <- group_by(traj, traj_num) %>% mutate(len=length(date))
so that the grouping depends on the new traj_num variable, and no longer on date. With this change, it correctly groups and separates the trajectories, making the cluster computation possible.
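For reference, a minimal sketch of how such a traj_num column could be built and used. It assumes that the rows of each trajectory are contiguous in the input and that each trajectory begins with its hour.inc == 0 row, as in the head(df) output above. The toy values are made up, and ave() is used as a base R stand-in for the group_by/mutate call, so this is not the openair code itself.

```r
# Toy input (hypothetical): two 3-step trajectories with the same date.
df <- data.frame(
  date     = as.POSIXct("2018-03-01 00:00:00", tz = "UTC"),
  hour.inc = rep(c(0, -1, -2), times = 2),
  lat      = c(46.37, 46.16, 45.98, 45.87, 45.71, 45.52),
  lon      = c(11.79, 11.63, 11.53, 11.79, 11.55, 11.35)
)

# Assumption: every trajectory starts with its hour.inc == 0 row and its rows
# are contiguous, so a running count of those rows numbers the trajectories.
df$traj_num <- cumsum(df$hour.inc == 0)

# Same ordering as in the modified calctraj lines.
traj <- df[order(df$date, df$traj_num, df$hour.inc), ]

# Rows per trajectory when grouping on traj_num ...
traj$len <- ave(traj$hour.inc, traj$traj_num, FUN = length)

# ... versus rows per group when grouping on date only.
len_by_date <- ave(traj$hour.inc, as.numeric(traj$date), FUN = length)

print(unique(traj$len))
print(unique(len_by_date))
```

Grouping on traj_num gives one group per trajectory, whereas grouping on date lumps both toy trajectories into a single group that is twice as long.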
traj <- group_by(traj, traj_num) %>% mutate(len=length(date))
I did not understand your code above. Are traj and traj_num variables in the data frame, or something else?
Hi @Jaip2018, just have a look at the original trajCluster code. traj_num is a new column I created, which holds the number of each trajectory in the entire traj input data.frame.
With the modification, trajectories are grouped by traj_num, and no longer by date, which overcomes the issue (i.e., when grouping by date you do not get a single trajectory per group, but many).
Thanks,
So we can make the change in our data frame.
Just to confirm: should I copy and paste the trajCluster code and modify the two lines in calctraj, and do I need to change my file or not? Suppose my file is PSCF_traj.
The same change (traj_num instead of date inside group_by) must also be made where trajectories are filtered according to their length (i.e., line 135 of the "original" trajCluster.R function).
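As an illustration only (this is not the actual openair source, and the column name traj_len is an assumption), length filtering keyed on traj_num might look like this in base R:

```r
# Toy data (hypothetical): trajectories 1 and 2 have 3 steps, trajectory 3
# is incomplete with only 2 steps.
traj <- data.frame(
  traj_num = c(1, 1, 1, 2, 2, 2, 3, 3),
  hour.inc = c(0, -1, -2, 0, -1, -2, 0, -1)
)

# Number of rows in each trajectory, computed per traj_num rather than per date.
traj$traj_len <- ave(traj$hour.inc, traj$traj_num, FUN = length)

# Keep only the trajectories that have the full (maximum) length.
full <- traj[traj$traj_len == max(traj$traj_len), ]
print(unique(full$traj_num))
```

Here the incomplete trajectory 3 is dropped, while trajectories 1 and 2 are kept as separate groups.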
Hi all
Apologies that I have not got back sooner on this. I had not realised I was getting issues raised because notifications were turned off. Just to understand what you are trying to do - are you wanting to pool back trajectories from several locations and carry out cluster analysis, rather than one site / receptor at a time? This might be useful and I can modify the code to do this if I understand the aim ...
Thanks David
Hi David,
thank you for your reply. Yes, I want to cluster trajectories whose initial locations are only slightly modified (e.g., lat shifted by 0.5° as in the example I reported above). This situation might happen, e.g., if I have a set of 5 trajectories shifted by ±XX° in lat/lon and/or height, but they all have the same date field. It seems that grouping by trajectory number (a new incremental column that identifies each trajectory) instead of by date should fix this issue.
Thanks
OK, thanks. If they are only slightly shifted, could you not just use a single receptor location? The met data used is quite coarse, so I would say that if the sites were within ~50 km of one another you might as well use the same starting location.
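As a rough check against that ~50 km rule of thumb, the great-circle distance between two starting points can be estimated with the haversine formula. This is only a sketch: haversine_km is a hypothetical helper, the mean Earth radius of 6371 km is an assumption, and the two points are the starting coordinates from the example earlier in the thread (46.37°N and 45.87°N, both at 11.79°E).

```r
# Hypothetical helper: great-circle distance in km (haversine formula),
# assuming a spherical Earth with a mean radius of 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Starting points of trajectories 1 and 2 from the example data.
d <- haversine_km(46.37, 11.79, 45.87, 11.79)
print(round(d, 1))
```

For these two points the distance comes out at roughly 56 km, so a 0.5° shift in latitude is already slightly beyond the ~50 km figure.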
I can see this perhaps being more useful for source contributions (statistic is CWT or PSCF), where linking several sites perhaps 100s of km apart to pool source contributions for a pollutant would be useful. At the moment, that is not possible in openair, but it is something I am going to take a look at.
That would definitely be interesting! Really looking forward to it, since I am using PSCF too.
By the way, I am often using the trajCluster function for back-trajectories not calculated with the other openair functions (that I then shaped so that they could be ingested by trajCluster), that's why I needed it in the past (that also works for different starting heights, let's say shifted by 50 or 100 hPa).