
Initialize library filtering by Continent/Country

Open juarezr opened this issue 5 years ago • 6 comments

In timeshape today, there are two methods for initializing the library:

  • with the data for the whole world
  • with some bounding box only, to reduce memory usage

It would be useful to have a third option: initializing with the data of a single continent or country. Usage tends to follow political boundaries, in these patterns:

  • whole world (international usage)
  • my country (USA, Brazil, etc...)
  • my region (European Union, South America, Asia, etc...)

This division could be made according to IANA existing divisions.

juarezr avatar Dec 11 '18 14:12 juarezr

I was thinking about something similar as well. These cases, while sounding somewhat similar, have quite different implementation complexity, so let's take them one by one:

  1. Initialize for a given continent/region, like Europe, Asia, etc. This could be done relatively easily by using the prefix of the time zone id. E.g. given the prefix Europe, time zones like Europe/Berlin or Europe/Stockholm will be included, but Asia/Tomsk will not.
  2. Initialize based on country. This is a slippery slope. Given that there are a lot of disputed territories in the world (see e.g. https://github.com/RomanIakovlev/timeshape/issues/27), I'd rather avoid making any decision about which country a given time zone belongs to. Besides that, it would require adding more dependencies to the Timeshape library, which I'd rather avoid, because I want to minimize usage of popular dependencies, and keep the library small and tidy.
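As a rough illustration of point 1, prefix filtering needs nothing beyond the JDK. The sketch below is not Timeshape code: `ZonePrefixFilter` and `byPrefix` are hypothetical names, and it only selects time zone ids without loading any shapes.

```java
import java.time.ZoneId;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Timeshape: selects zone ids by region prefix.
final class ZonePrefixFilter {
    private ZonePrefixFilter() {}

    // Keeps only the ids that start with "<region>/", e.g. "Europe/".
    static Set<String> byPrefix(Set<String> zoneIds, String region) {
        String prefix = region + "/";
        Set<String> matched = new TreeSet<>();
        for (String id : zoneIds) {
            if (id.startsWith(prefix)) {
                matched.add(id);
            }
        }
        return matched;
    }

    // Convenience overload that filters the ids known to the running JDK.
    static Set<String> byPrefix(String region) {
        return byPrefix(ZoneId.getAvailableZoneIds(), region);
    }
}
```

Calling `ZonePrefixFilter.byPrefix("Europe")` would keep ids such as Europe/Berlin and drop Asia/Tomsk. Appending the slash to the prefix avoids accidentally matching an id that merely begins with the same letters.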

But I think more ways to limit the loaded time zones would be a useful addition to Timeshape, and therefore I have a couple of proposals. Firstly, we can introduce one more way to initialize, which would accept Set<java.time.ZoneId>. Only those time zones will be considered for search. The users of Timeshape will obtain the list of interesting time zones in whatever way they find appropriate, e.g. by using Time4j. Secondly, we can implement an option which accepts a regular expression. Only time zones whose ids match this regex will be considered for search. This is a generalization of point 1 above. Both of these options are relatively easy to implement. Do you think any of these will be useful?
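To make the two proposals concrete, here is a minimal sketch of how both could reduce to a predicate over time zone ids. It is only an assumption about one possible shape: `ZoneFilters`, `matching`, `anyOf` and `select` are hypothetical names, not Timeshape API, and the code uses only `java.time` and `java.util.regex`.

```java
import java.time.ZoneId;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Timeshape: both proposals as id predicates.
final class ZoneFilters {
    private ZoneFilters() {}

    // Regex proposal: keep ids that match the whole expression.
    static Predicate<String> matching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return id -> pattern.matcher(id).matches();
    }

    // Set<ZoneId> proposal: keep ids of the explicitly listed zones.
    static Predicate<String> anyOf(Set<ZoneId> zones) {
        Set<String> ids = zones.stream()
                .map(ZoneId::getId)
                .collect(Collectors.toSet());
        return ids::contains;
    }

    // Applies a predicate to candidate ids and returns the survivors, sorted.
    static Set<String> select(Set<String> candidates, Predicate<String> keep) {
        return candidates.stream()
                .filter(keep)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

For example, `ZoneFilters.select(ZoneId.getAvailableZoneIds(), ZoneFilters.matching("^(Europe|America)/.*"))` would keep European and American zones in one call, which is the sense in which the regex option generalizes the prefix approach.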

RomanIakovlev avatar Dec 12 '18 15:12 RomanIakovlev

I am thinking that the above proposals are good enough and should cover the needs quite well.

  1. Regular expressions are powerful and flexible, and an excellent choice for this case. I'm not sure, however, whether all devs are comfortable using regexes. This may need more detailed documentation with some examples.
  2. Filtering by Set<java.time.ZoneId> should be equivalent to filtering by a Set<String> of time zone names. I'm not sure it's worth it...
  3. I agree with you that it's not worth adding more dependencies.
  4. It seems to me that it's better not to invest too much time in this feature and to redirect the effort to other things, like library evolution, ease of maintenance, etc...

juarezr avatar Dec 14 '18 02:12 juarezr

Ok then, let's implement the regex option.

RomanIakovlev avatar Dec 14 '18 13:12 RomanIakovlev

Instead of limiting, could you provide lazy loading by splitting into region-specific indexes? As in, if you know the bounding box for a continent, then you can determine which Index to delegate a query to. Then only the sub-indexes being used would be in memory, reducing the footprint for development and localized applications.
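One way to picture this suggestion is a thin dispatcher that holds a bounding box and a loader per region, and builds a sub-index only on the first query that falls inside its box. The sketch below is an assumption about how that might look, not Timeshape's design: `LazyRegionIndex`, `Lookup` and `addRegion` are hypothetical names, and the sub-index type is left generic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch, not Timeshape code: loads region sub-indexes on demand.
final class LazyRegionIndex<T> {

    // How to query one loaded sub-index of type T for a position.
    interface Lookup<T> {
        Optional<String> query(T index, double lat, double lon);
    }

    private static final class Region<T> {
        final double minLat, maxLat, minLon, maxLon;
        final Supplier<T> loader;
        T loaded; // stays null until the first query hits this region

        Region(double minLat, double maxLat, double minLon, double maxLon, Supplier<T> loader) {
            this.minLat = minLat;
            this.maxLat = maxLat;
            this.minLon = minLon;
            this.maxLon = maxLon;
            this.loader = loader;
        }

        boolean contains(double lat, double lon) {
            return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
        }

        synchronized T index() {
            if (loaded == null) {
                loaded = loader.get(); // pay the loading cost once, on first use
            }
            return loaded;
        }
    }

    private final List<Region<T>> regions = new ArrayList<>();
    private final Lookup<T> lookup;

    LazyRegionIndex(Lookup<T> lookup) {
        this.lookup = lookup;
    }

    LazyRegionIndex<T> addRegion(double minLat, double maxLat, double minLon, double maxLon,
                                 Supplier<T> loader) {
        regions.add(new Region<>(minLat, maxLat, minLon, maxLon, loader));
        return this;
    }

    // Delegates to every region whose box contains the point, loading as needed.
    Optional<String> query(double lat, double lon) {
        for (Region<T> region : regions) {
            if (region.contains(lat, lon)) {
                Optional<String> hit = lookup.query(region.index(), lat, lon);
                if (hit.isPresent()) {
                    return hit;
                }
            }
        }
        return Optional.empty();
    }
}
```

Because continental bounding boxes overlap, the dispatcher tries each matching region in turn rather than stopping at the first box, so a point near a border may trigger loading more than one sub-index.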

ben-manes avatar Dec 20 '18 04:12 ben-manes

@ben-manes right, I've been thinking about implementing some sort of lazy loading as well. I'm not sure what the best way to implement it is, but I think it might be a useful addition to Timeshape: although lazy loading will always incur some sort of performance penalty, for some use cases it might be negligible. Lazy loading might even be preferable if it cuts down initialization time. E.g. if you use some FaaS (e.g. AWS Lambda), where you only need to query one position (of unpredictable origin), you shouldn't be paying the price of full initialization.

I guess, I'll just open another issue to keep track of and discuss the lazy loading.

RomanIakovlev avatar Dec 21 '18 13:12 RomanIakovlev

What would be the impact on size/performance when the shapes are simplified? Are there any numbers on this?

juarezr avatar Dec 22 '18 01:12 juarezr

Closing due to lack of activity. If there's still interest in working on this, anyone should feel free to pick up this work, and I'll provide support with code reviews and releases.

RomanIakovlev avatar Nov 22 '22 15:11 RomanIakovlev