
Package size is too large

Open kevinwang09 opened this issue 1 year ago • 1 comments

After adding the 2022 data, the package currently exceeds 5MB. Possible resolutions:

  1. Re-curate the 2022 data
  2. Explore alternative data compression for the subset data creation. https://github.com/kevinwang09/learningtower/blob/master/inst/sampling.R
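As a rough illustration of the second option, the sketch below saves one toy data frame with each compression method that `save()` supports and compares the file sizes. The data frame `toy` and its columns are hypothetical stand-ins, not the package's actual data, and the best method for the real data may differ.

```r
# Hypothetical stand-in for one of the package's subset data frames.
set.seed(1)
n <- 20000
toy <- data.frame(
  year = sample(c(2018L, 2022L), n, replace = TRUE),
  country = sample(c("AUS", "NZL", "JPN", "CAN"), n, replace = TRUE),
  math = round(rnorm(n, mean = 500, sd = 100), 2)
)

# Save the same object with each compression method and record the
# resulting file size in bytes.
methods <- c("gzip", "bzip2", "xz")
sizes <- vapply(methods, function(m) {
  path <- tempfile(fileext = ".rda")
  on.exit(unlink(path))  # remove the temporary file when this call returns
  save(toy, file = path, compress = m)
  file.size(path)
}, numeric(1))

print(sizes)
```

For files that already exist in a package's `data/` directory, `tools::checkRdaFiles()` reports the current compression and `tools::resaveRdaFiles()` can re-save them with a different method.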

kevinwang09 avatar Oct 04 '24 03:10 kevinwang09

We can make the samples for each year smaller, to fix this.
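A minimal sketch of that idea in base R, assuming the data has a `year` column. The helper `sample_per_year()` and the toy data are hypothetical and are not taken from the package's sampling script, which may stratify differently (for example by country as well as year).

```r
# Hypothetical helper: keep at most `n_per_group` randomly chosen rows per year.
sample_per_year <- function(data, n_per_group, year_col = "year") {
  # Row indices grouped by year
  groups <- split(seq_len(nrow(data)), data[[year_col]])
  keep <- unlist(lapply(groups, function(idx) {
    # Index via sample.int() so that a group of length 1 is handled correctly
    idx[sample.int(length(idx), min(length(idx), n_per_group))]
  }), use.names = FALSE)
  data[sort(keep), , drop = FALSE]
}

set.seed(2024)
toy <- data.frame(
  year = rep(c(2018L, 2022L), times = c(500, 800)),
  math = rnorm(1300, mean = 500, sd = 100)
)

smaller <- sample_per_year(toy, n_per_group = 50)
print(table(smaller$year))
```

Lowering `n_per_group` shrinks every year's sample by the same cap, so no single year dominates the package size.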


cheers, Di


Dianne Cook @.***

dicook avatar Oct 04 '24 03:10 dicook

@dicook, the installed package is about 5.1MB even after reducing the number of rows.
I suggest that we subset the data to just the OECD countries: https://www.oecd.org/en/about/members-partners.html.

Procedures: https://github.com/kevinwang09/learningtower/blob/master/inst/sampling_student_and_school.R
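The subsetting step could look roughly like the sketch below. The `country` column name, the toy data, and the `oecd_codes` vector are assumptions for illustration only; the vector is deliberately incomplete and should be replaced by the full membership list from the OECD page linked above.

```r
# Illustrative, deliberately incomplete vector of OECD country codes.
oecd_codes <- c("AUS", "CAN", "JPN")

# Hypothetical toy data with a mix of OECD and non-OECD countries.
toy <- data.frame(
  country = c("AUS", "BRA", "CAN", "IDN", "JPN", "AUS"),
  math = c(510, 390, 520, 380, 530, 505),
  stringsAsFactors = FALSE
)

# Keep only the rows whose country code is in the OECD vector.
oecd_only <- toy[toy$country %in% oecd_codes, , drop = FALSE]
print(oecd_only)
```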


kevinwang09 avatar Dec 02 '24 05:12 kevinwang09

That’s a reasonable approach


cheers, Di


Dianne Cook @.***

dicook avatar Dec 02 '24 05:12 dicook

The use of factor columns was the main cause of the large package size. After re-curation:

  • year and school_id are now integer and character columns, respectively.
  • The in-package data is limited to OECD countries. The full data remains intact.
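The thread does not say why the factor columns inflated the size. One possible mechanism, offered here only as an assumption, is that a subsetted factor keeps all of its original levels, so a small sample can still carry every ID from the full data. The sketch below uses a hypothetical pool of school IDs to compare serialized sizes.

```r
set.seed(1)

# Hypothetical pool of school IDs stored as a factor.
all_ids <- sprintf("%07d", 1:50000)
school_id <- factor(all_ids, levels = all_ids)

# Subsetting a factor keeps every level, including those no longer used.
subset_factor <- school_id[sample.int(length(school_id), 200)]
subset_char <- as.character(subset_factor)

# Size in bytes of the uncompressed serialized object.
serialized_size <- function(x) length(serialize(x, connection = NULL))

sizes <- c(
  factor = serialized_size(subset_factor),
  factor_dropped = serialized_size(droplevels(subset_factor)),
  character = serialized_size(subset_char)
)
print(sizes)
```

If this is the cause, either converting to character or calling `droplevels()` after subsetting removes the unused levels from the saved object.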

kevinwang09 avatar Dec 15 '24 21:12 kevinwang09