[trajcluster] Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2

Open JaeyounRyoo opened this issue 4 years ago • 12 comments

I get an error from the trajCluster function: Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2. I installed the development version of the "openair" package from GitHub. I understand the meaning of 'n'. I first hit this error a few days ago, but it went away after I tried a few things (I don't know what fixed it). Today I ran into the error again. How can I resolve it?

I have attached my data file (traj.nh3.csv.zip). My data and function call are below.

> head(traj.nh3)
        date receptor year month        day hour hour.inc    lat     lon height pressure               date2     nh3
1 2019-05-04        1 2019     5 2019-05-04   11      -13 36.092 126.373   95.8   1004.6 2019-05-03 11:00:00 11.8858
2 2019-05-04        1 2019     5 2019-05-04   17      -55 37.518 114.880  484.8    950.3 2019-05-01 17:00:00 11.8858
3 2019-05-04        1 2019     5 2019-05-04    1      -23 36.486 124.956  108.0   1005.1 2019-05-03 01:00:00 11.8858
4 2019-05-04        1 2019     5 2019-05-04    7      -17 36.320 125.706   78.3   1007.9 2019-05-03 07:00:00 11.8858
5 2019-05-04        1 2019     5 2019-05-04    0      -24 36.555 124.918  124.0   1003.2 2019-05-03 00:00:00 11.8858

p.cluster <- trajCluster(traj.nh3 %>% filter(receptor == 1),
                         method = "Angle", n.cluster = 6,
                         type = "season", split.after = FALSE,
                         layout = c(2, 2), plot = TRUE,
                         col = c("blue", "green", "gold", "darkorange", "red", "brown"),
                         map = TRUE, map.res = "default",
                         map.cols = "transparent", map.alpha = 0.8,
                         projection = "lambert", parameters = c(90, 0),
                         orientation = c(90, 0, 0),
                         xlim = c(90, 150), ylim = c(20, 60),
                         grid.col = "transparent",
                         par.settings = list(axis.line = list(lwd = 1.5),
                                             strip.border = list(lwd = 2),
                                             fontsize = list(text = 10)))

Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2

JaeyounRyoo avatar Mar 07 '20 06:03 JaeyounRyoo

Hi, I have the same issue here. I imported trajectories with exactly the same fields as the ones you get from importTraj, but I get the same error: Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2.

Did you manage to solve it?

davide-p avatar Aug 20 '20 10:08 davide-p

I understand your situation. I didn't do anything special; I just closed the session and restarted it, with the data.frame of trajectory results still in the project. I don't know exactly how I solved the problem, though. Good luck to you!

JaeyounRyoo avatar Aug 21 '20 04:08 JaeyounRyoo

Hi, I tried restarting the session several times, but nothing changes.

Here is the head of my data.frame (not created with HYSPLIT, but formatted in the same way), and the piece of code:

> head(df)
  receptor year month day hour hour.inc   lat   lon   height pressure               date2       date
1        1 2018     3   1    0        0 46.37 11.79 2610.936      750 2018-03-01 00:00:00 2018-03-01
2        1 2018     2  28   23       -1 46.16 11.63 2599.040      751 2018-02-28 23:00:00 2018-03-01
3        1 2018     2  28   22       -2 45.98 11.53 2610.936      750 2018-02-28 22:00:00 2018-03-01
4        1 2018     2  28   21       -3 45.83 11.44 2658.711      746 2018-02-28 21:00:00 2018-03-01
5        1 2018     2  28   20       -4 45.75 11.33 2694.743      743 2018-02-28 20:00:00 2018-03-01
6        1 2018     2  28   19       -5 45.71 11.19 2755.182      738 2018-02-28 19:00:00 2018-03-01

> clust <- trajCluster(df, method = "Angle", n.cluster= 6, col = "Set2",
+                      map.cols = openColours("Paired", 10))
Error in lower.to.upper.tri.inds(n) : 'n' must be >= 2

davide-p avatar Aug 24 '20 10:08 davide-p

After some trials, it seems that the error appears when multiple input trajectories share the same date but differ in, e.g., their starting coordinates (in my case, I would like to cluster 5 trajectories starting at 750 hPa, with different starting coordinates, for each specific date). trajCluster then fails when applying the calctraj function, because the trajectories are not grouped one after another but rather by their hour.inc values (so all hour.inc = -120 lines come first, then hour.inc = -119, and so on).
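The failure mode described above can be reproduced on toy data. This is only a sketch in base R, not the openair code: the values are made up, traj_id is a hypothetical identifier column, and it assumes that calctraj keeps only groups whose row count equals max(abs(hour.inc)) + 1, which is what the later comment about filtering by trajectory length suggests.

```r
# Toy data (made-up values): two 3-point back trajectories that share the
# same arrival date but start from slightly different latitudes.
traj <- data.frame(
  date     = as.POSIXct("2018-03-01", tz = "UTC"),
  traj_id  = rep(1:2, each = 3),            # hypothetical trajectory identifier
  hour.inc = rep(c(0, -1, -2), times = 2),
  lat      = c(46.37, 46.16, 45.98, 45.87, 45.71, 45.52),
  lon      = c(11.79, 11.63, 11.53, 11.79, 11.55, 11.35)
)

# Number of points expected in one complete trajectory
n <- max(abs(traj$hour.inc)) + 1

# Rows per group when grouping by date: both trajectories fall into one group
len_by_date <- ave(seq_len(nrow(traj)), as.character(traj$date), FUN = length)

# Rows per group when grouping by the trajectory identifier
len_by_id <- ave(seq_len(nrow(traj)), traj$traj_id, FUN = length)

# Assumed filter: keep only groups with exactly n rows
kept_by_date <- traj[len_by_date == n, ]
kept_by_id   <- traj[len_by_id == n, ]

nrow(kept_by_date)  # no rows survive, so nothing is left to cluster
nrow(kept_by_id)    # all rows survive
```

If no trajectory survives the length filter, the clustering step receives an empty set, which would be consistent with the 'n' must be >= 2 message.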

To overcome this issue, I first inserted a new column to the input data.frame, i.e., traj_num, which incrementally identifies each trajectory:

> head(df)
   receptor year month day hour hour.inc   lat   lon pressure               date2       date traj_num
1         1 2018     3   1    0        0 46.37 11.79      750 2018-03-01 00:00:00 2018-03-01        1
2         1 2018     2  28   23       -1 46.16 11.63      751 2018-02-28 23:00:00 2018-03-01        1
3         1 2018     2  28   22       -2 45.98 11.53      750 2018-02-28 22:00:00 2018-03-01        1
4         1 2018     2  28   21       -3 45.83 11.44      746 2018-02-28 21:00:00 2018-03-01        1
5         1 2018     2  28   20       -4 45.75 11.33      743 2018-02-28 20:00:00 2018-03-01        1
...
122        1 2018     3   1    0        0 45.87 11.79      750 2018-03-01 00:00:00 2018-03-01        2
123        1 2018     2  28   23       -1 45.71 11.55      745 2018-02-28 23:00:00 2018-03-01        2
124        1 2018     2  28   22       -2 45.52 11.35      747 2018-02-28 22:00:00 2018-03-01        2
...
1105939        1 2019     5  26   19     -119 32.78 -39.13      931 2019-05-26 19:00:00 2019-05-31 18:00:00     9140
1105940        1 2019     5  26   18     -120 32.49 -39.00      929 2019-05-26 18:00:00 2019-05-31 18:00:00     9140

and then I modified the following lines in trajCluster (the first two lines of calctraj):

traj <- traj[order(traj$date, traj$traj_num, traj$hour.inc), ]
traj <- group_by(traj, traj_num) %>% mutate(len=length(date))

so that the grouping depends on the new traj_num variable, and not on date anymore. With this change, it correctly groups and separates the trajectories, making the cluster computation possible.
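For anyone wanting to try this, here is a minimal sketch of the two steps on toy data. It is not the openair source: the values are made up, the way traj_num is built assumes that the rows of each trajectory are stored contiguously and start at hour.inc == 0, and ave() is used as a base R stand-in for the group_by/mutate line so that the snippet runs without dplyr.

```r
# Toy input (made-up values): three 3-point trajectories sharing one date,
# stored contiguously, each beginning at hour.inc == 0.
traj <- data.frame(
  date     = as.POSIXct("2018-03-01", tz = "UTC"),
  hour.inc = rep(c(0, -1, -2), times = 3),
  lat      = c(46.37, 46.16, 45.98, 45.87, 45.71, 45.52, 46.87, 46.60, 46.41),
  lon      = c(11.79, 11.63, 11.53, 11.79, 11.55, 11.35, 11.79, 11.70, 11.58)
)

# Assumption: every hour.inc == 0 row opens a new trajectory, so a running
# count of those rows numbers the trajectories incrementally.
traj$traj_num <- cumsum(traj$hour.inc == 0)

# Order by date, then trajectory, then time step (mirrors the modified order() call)
traj <- traj[order(traj$date, traj$traj_num, traj$hour.inc), ]

# Rows per trajectory: base R stand-in for
# group_by(traj, traj_num) %>% mutate(len = length(date))
traj$len <- ave(seq_len(nrow(traj)), traj$traj_num, FUN = length)

# Each trajectory now has the full length, so a length filter keeps all of them
n <- max(abs(traj$hour.inc)) + 1
complete <- traj[traj$len == n, ]
length(unique(complete$traj_num))  # number of trajectories available for clustering
```

If the rows of a trajectory are not contiguous in your input, the cumsum() shortcut will not work and traj_num has to be built from whatever identifies a trajectory in your data (e.g., starting coordinates plus date).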

davide-p avatar Aug 26 '20 14:08 davide-p

traj <- group_by(traj, traj_num) %>% mutate(len=length(date))

I did not follow your code. In the code above, are traj and traj_num variables in the data frame, or something else?

Jaip2018 avatar Sep 18 '20 18:09 Jaip2018

Hi @Jaip2018, just have a look at the original trajCluster code. traj_num is a new column I created, which indicates the number of each trajectory in the entire traj input data.frame.

With the modification, it will group trajectories based on traj_num, and not on date anymore, so that the issue will be overcome (i.e., by grouping on date you will not have only one trajectory, but many).

davide-p avatar Sep 21 '20 07:09 davide-p

Thanks,

So we can make the change in our own data frame.

Should I copy and paste the trajCluster code, modify the two lines in calctraj, and change the file name or not? Suppose my file is PSCF_traj.

Jaip2018 avatar Sep 21 '20 17:09 Jaip2018

The same substitution of traj_num for date inside group_by must also be made where trajectories are filtered according to their length (i.e., line 135 of the "original" trajCluster.R function).

davide-p avatar Apr 12 '22 14:04 davide-p

Hi all

Apologies that I have not got back sooner on this. I had not realised I was getting issues raised because notifications were turned off. Just to understand what you are trying to do - are you wanting to pool back trajectories from several locations and carry out cluster analysis, rather than one site / receptor at a time? This might be useful and I can modify the code to do this if I understand the aim ...

Thanks David

davidcarslaw avatar Apr 13 '22 07:04 davidcarslaw

Hi David,

thank you for your reply. Yes, I want to cluster trajectories whose initial locations are only slightly modified (e.g., lat shifted by 0.5° as in the example I reported above). This situation can arise, e.g., if I have a set of 5 trajectories shifted by ±XX° in lat/lon and/or height that all have the same date field. It seems that grouping by trajectory number (a new incremental column that identifies each trajectory) instead of by date should fix this issue.

Thanks

davide-p avatar Apr 13 '22 12:04 davide-p

OK, thanks. If they are only slightly shifted, could you not just use a single receptor location? The met data used is quite coarse, so I would say that if the sites were within ~50 km of one another you might as well use the same starting location.

I can see this perhaps being more useful for source contributions (statistic is CWT or PSCF) where linking several sites perhaps 100s km apart to pool source contributions for a pollutant would be useful. At the moment, that is not possible in Openair, but something I am going to take a look at.

davidcarslaw avatar Apr 13 '22 13:04 davidcarslaw

That would definitely be interesting! Really looking forward to it, since I am using PSCF too.

By the way, I often use the trajCluster function for back-trajectories that were not calculated with the other openair functions (I reshape them so that trajCluster can ingest them). That is why I needed this change in the past; it also works for different starting heights (say, shifted by 50 or 100 hPa).

davide-p avatar Apr 13 '22 13:04 davide-p