
Dramatic increase in package size

manishatcodal opened this issue 2 years ago · 15 comments

Moment-timezone version which you use:

Version: 0.5.36

Note: many issues are resolved if you just upgrade to the latest version

Issue description:

The package size has increased dramatically, and an AWS Lambda layer is now rejecting it. It seems the package has gained an additional folder, temp.back, which was not there in the previous version.

[screenshot: package contents showing the temp.back folder]

manishatcodal avatar Aug 25 '22 13:08 manishatcodal

@ichernev ^

manishatcodal avatar Aug 25 '22 13:08 manishatcodal

oh, snap, npm publish is dumb, I'll release 0.5.37...

ichernev avatar Aug 25 '22 13:08 ichernev

0.5.37 released

ichernev avatar Aug 25 '22 13:08 ichernev

@ichernev Hmm. This is still an issue for me in 0.5.37. Here's the comparison between:

0.5.35: [screenshot: package contents of moment-timezone@0.5.35]

0.5.37: [screenshot: package contents of moment-timezone@0.5.37]

EvHaus avatar Aug 25 '22 17:08 EvHaus

This actually looks to me like fixing the bug reported in #768.

I suspect that the change in data file sizes is dependent on who is releasing the package. That is, it's based on which version of zdump is in use on the machine doing the release.

gilmoreorless avatar Aug 25 '22 23:08 gilmoreorless

I use Linux. If the issue is really related to that, we could release a few different versions with different year spans. I mean, we have bundles (with code) for a few ranges, but there is only one data file, which is sad.

I'm also not sure whether shipping the actual zone data might be more efficient than recording the changes for each year (across many, many years in the past/future).

ichernev avatar Aug 26 '22 08:08 ichernev

I just checked: on macOS, zdump is 32-bit, so it ends around 2047, while on Linux it can go up to 2437. I think the "correct" one is 2437, and users who want less data should select a 10-year span. I think it makes sense to investigate ingesting the tz source data directly, which is rule-based, so extending further into the future doesn't affect data size. Also keep in mind that anything that far out will likely be wrong anyway; it's now popular for governments to mess with DST, so you never know if the rules won't change three times by then.

ichernev avatar Aug 27 '22 15:08 ichernev
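For context on the 32-bit limit discussed above: the exact upper year zdump reports varies by platform and by how far it extrapolates, but the hard boundary for 32-bit time values is a fixed constant. A quick illustration (assumption: Node.js is available; this is not part of the release tooling):

```javascript
// The 32-bit signed time_t boundary: 2^31 - 1 seconds after the Unix epoch.
// A zdump built against 32-bit time values cannot represent transitions
// beyond this instant, which is why 32-bit data roughly ends in the 2038 era.
const max32BitDate = new Date((2 ** 31 - 1) * 1000);
console.log(max32BitDate.toISOString()); // 2038-01-19T03:14:07.000Z
```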

Hey, what are these blah.tar.gz and curlxx.tar.gz files that have appeared in the package folder? They weren't there in 0.5.27

robcrocombe avatar Aug 31 '22 13:08 robcrocombe

Do we have any updates on this? My package file is over 1000KB now.

web-programmer-here avatar Sep 26 '22 17:09 web-programmer-here

Is there any progress on this?

zeljkomarinkovic avatar Nov 25 '22 08:11 zeljkomarinkovic

Hi everyone, I've recently joined the maintenance team for Moment Timezone. I'd like to take the time to explain what's going on with the package size and why "fixing it" isn't necessarily straightforward.

Context

It's not really possible to include the IANA time zone data in a moment-timezone npm package in a way that makes everyone happy. First, there's the variety of ways that projects consume this library:

  • Some people use the pre-built moment-timezone-with-data* files directly.
    • These can be downloaded from the momentjs.com website, or in a zip file from GitHub, or using npm and loading from the builds/ directory, or from a CDN like CDNJS or UNPKG.
  • Some people use npm and require/import the default index.js in a server-side project.
  • Some people use npm and require/import the default index.js in code that will later be bundled using something like Webpack or Rollup for use in a browser.
  • Some people use other methods entirely. The number of ways listed in the Using Moment documentation is testament to how many different loading systems there are (and there are more that aren't even listed).

Then there's the differing needs for the size and range of the data:

  • The IANA tzdb source can be compiled using either a legacy 32-bit format, or modern 64-bit format. The format that's used currently depends on the version of the zdump tool on the machine of the person building a Moment Timezone release (though I'm looking to change that soon).
    • The 64-bit format includes as much time zone data as possible, from the 19th Century through to the year 2400+. That can be over 900KB of data in packed.json.
    • The 32-bit format only covers from around 1901 to 2038, which is under 200KB in packed.json.
  • Along with the packed JSON file, each Moment Timezone release also contains 4 different pre-built files with different data ranges:
    • All possible data (either 32-bit or 64-bit range depending on the release version).
    • 1970-2030.
    • 2012-2022 (for backward compatibility).
    • A rolling 10-year range that changes based on the year the release was built.
  • Every user of the library has different data needs, ranging from a single zone for a few years up to "as much data as possible".
  • Moment Timezone has fluctuated between the 32-bit data (all versions up to 0.5.21, plus 0.5.26 to 0.5.36) and 64-bit data (versions 0.5.22 to 0.5.25, and 0.5.37+).
    • When Moment Timezone was released with only the 32-bit data, there were complaints about bugs and incomplete data (#308, #451, #486, #798, #803). I also know that downstream projects that use this data (like js-joda) have had bugs reported about data ranges.
    • When this was corrected and releases were made with the 64-bit data, there were complaints about the downloaded package size and unnecessary data (#697, #999).

So reducing the data range fixes package size problems for some users, but causes bugs for other users.
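For users who bundle their own code, one workaround that fits the existing build outputs is to alias the package to one of the smaller pre-built data files listed above. A sketch for Webpack (assumption: the alias target is one of the builds/ files that ships in the npm package; pick whichever range suits your app):

```javascript
// webpack.config.js (sketch): resolve `moment-timezone` imports to the
// pre-built bundle with a rolling 10-year data range instead of the
// full dataset. Other builds/ files cover 1970-2030 and 2012-2022.
module.exports = {
  resolve: {
    alias: {
      'moment-timezone':
        'moment-timezone/builds/moment-timezone-with-data-10-year-range',
    },
  },
};
```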

Potential solution

I've been thinking about this problem a bit lately and have a different angle. Instead of reducing the range of the data, I suspect there are some great gains to be made by changing how the data is stored.

I spent a couple of hours playing with some ideas for further optimising the packed JSON data that makes up the majority of the package size. The best result I've got so far is 22% of the original JSON size (805KB -> 183KB) without any loss of data. There are still some more gains to be made, I reckon.

However, I haven't checked how much extra code would need to be added to Moment Timezone itself to handle the new data format, or the runtime performance cost. Changing the format would also technically be a breaking change, so it would need to be handled carefully. Especially since the project is in maintenance-only mode, and things like bundle size issues and major version bumps are documented as out of scope.


In summary: I do want to reduce the size of the npm package, but it's not as simple as going back to the way it was before.

gilmoreorless avatar Jan 05 '23 09:01 gilmoreorless

So that all makes sense, but I suppose the important thing to understand here is why the size of the package has suddenly exploded. Would it not make sense to back out whatever changes caused this, and to implement the ability to fetch the 'newer bigger' dataset through a separate package?

kieranbenton avatar Jan 19 '23 15:01 kieranbenton

Summarising my earlier comment...

  • The (much) larger 64-bit data started being included from version 0.5.22. This fixed several reported problems with missing data. This larger dataset includes all possible values from the source tzdb.
  • Due to a glitch in the release process, versions 0.5.26 – 0.5.36 reverted to the incomplete smaller data. This bug was fixed in 0.5.37.
  • Since then, all releases have been using the full dataset again. Backing out these changes would mean going back to inaccurate data.
  • To be completely clear: The smaller, incomplete dataset was not the originally intended state (see https://github.com/moment/moment-timezone/issues/697#issuecomment-434416563). Including the full data was always meant to have been the case.

implement the ability to fetch the 'newer bigger' dataset through a separate package?

Honestly, I have considered having separate packages for different datasets. I got as far as writing a draft proposal for how it would work. But writing that made me realise how much work it would be (both for the initial setup and for ongoing maintenance), and how many more complications it could cause, which is not viable for the current state of the project.

I've also considered reducing the maximum year in the data to either 2100 or <year of release + 100>. Time zone rules change so often that predicting anything that far in advance is already likely to be wrong.

(EDIT: See #1039 for more detailed proposals)

gilmoreorless avatar Jan 20 '23 02:01 gilmoreorless

Thanks for taking the time to explain, @gilmoreorless; I understand the position you're in now. Packing the data more tightly does seem like the only way out without a complete rearchitecture of the project, or taking a dependency on, say, Intl.Locale.prototype.timeZones, which kind of defeats the whole point (as well as still not being supported in Firefox, I think).

Like you said before, it does seem highly compressible. Even I can see that from the output of our Vite build, which shows the chunk containing moment-timezone at about 1 MB, gzipping down to around 300 KB I think. Have you thought about using something like MessagePack to do the dirty work for you? I don't think its runtime is very large, and there's a non-zero chance a lot of projects will include it anyway.

kieranbenton avatar Jan 20 '23 09:01 kieranbenton
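On the compressibility point: much of the redundancy comes from how regular DST transitions are. A toy sketch (hypothetical timestamps, not moment-timezone's actual packed format) of delta-encoding transition times:

```javascript
// Hypothetical Unix timestamps for four DST transitions.
// Storing deltas instead of absolute values yields small, repeating
// numbers, which both pack into fewer characters and compress extremely well.
const transitions = [1615705200, 1636264800, 1647154800, 1667714400];
const deltas = transitions.map((t, i) => (i === 0 ? t : t - transitions[i - 1]));
console.log(deltas); // [ 1615705200, 20559600, 10890000, 20559600 ]
```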

I've fleshed out some of the proposed changes around data size and breaking changes at #1039, for those interested.

gilmoreorless avatar Mar 01 '23 02:03 gilmoreorless