
update catalog thumbnails to be in line with guidelines

Open vincerubinetti opened this issue 6 years ago • 9 comments

Update catalog thumbnails to be in line with the new (soon-to-be) thumbnail guidelines in the manubot/catalog README.

vincerubinetti avatar Sep 10 '19 22:09 vincerubinetti

Perhaps it makes sense to wait until we have thumbnail functionality added to manubot process, which hopefully we can do in the next few weeks. That way we can just add thumbnails to the source repo and enable detection by the catalog.

dhimmel avatar Sep 11 '19 17:09 dhimmel

I think I'm still going to do it now, because most of the work is just cropping and scaling them correctly, which I think we'll have to do anyway (at least for this first batch).

vincerubinetti avatar Sep 11 '19 17:09 vincerubinetti

Also I think maybe let's move the thumbnail images to the catalog repo. What about this:

We prefer that users keep their thumbnail in their manuscript's repo, since that's really where it belongs, together with its content. But we also have an images folder in the catalog repo for hosting thumbnails in cases where we can't contact the author to ask them to put it in their repo. I see this as better and more logical than hosting the images in random GitHub comments.

It would also let contributors make a single PR to the catalog that both updates the catalog listing (providing the URLs) and uploads the thumbnail, if for some reason we decide we don't want the thumbnail in the manuscript's repo (though it still seems like we want that, because of favicons).

vincerubinetti avatar Sep 11 '19 21:09 vincerubinetti

Also I think maybe let's move the thumbnail images to the catalog repo.

That does seem to make more sense than https://github.com/manubot/manubot.org. The ability to upload an image as part of the PR would be nice for cases where the thumbnail is not part of the manuscript repo or other repo of the authors.

My only worry is bloating the catalog repo. How big is each PNG? We could use Git LFS to store them, but that can make it slightly harder for some contributors without Git LFS installed locally.

dhimmel avatar Sep 12 '19 15:09 dhimmel

A PNG's size can vary a lot, depending on the content of the image and the compression settings: maybe from 10 KB to 200 KB. But with maximum compression (which takes longer to encode but is still lossless) it should be at the lower end of that range.
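PNG stores pixel data with DEFLATE (the algorithm implemented by zlib), so the compression level trades encoding effort for file size without discarding any data. A minimal sketch using Python's standard `zlib` module, with synthetic repetitive bytes standing in for real image rows (an assumption, not an actual thumbnail), shows that both the fastest and the strongest level round-trip exactly:

```python
import zlib

def compare_levels(data: bytes) -> dict:
    """Compress `data` at DEFLATE level 1 (fastest) and level 9 (best
    compression), check that both are lossless, and return their sizes."""
    sizes = {}
    for level in (1, 9):
        compressed = zlib.compress(data, level)
        # Decompressing must reproduce the input byte for byte.
        assert zlib.decompress(compressed) == data
        sizes[level] = len(compressed)
    return sizes

# Synthetic stand-in for image data: a repeating 0..255 gradient.
sample = bytes(range(256)) * 800
print(compare_levels(sample))
```

Real PNG encoders also apply per-scanline filters before DEFLATE, so this only illustrates the lossless size/time trade-off, not exact PNG file sizes.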

I wouldn't worry about it for now, since this is a last resort for when we can't have the thumb in the manuscript repo.

Also, Greene Lab people have these multi-gigabyte datasets hosted on github with no problem.

vincerubinetti avatar Sep 12 '19 15:09 vincerubinetti

It seems like several of the images removed in https://github.com/manubot/manubot.org/pull/27 are around 500 KB. I'd guess a complex image with lots of pixel variation could take around 1 MB.

Also, Greene Lab people have these multi-gigabyte datasets hosted on github with no problem.

If their files are >100 MB, then they are using Git LFS, which does not bloat the git repository at all because the repository only stores a small pointer file containing the file's hash and size.
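What Git actually commits for an LFS-tracked file is a short text pointer defined by the Git LFS pointer specification: a `version` line, the file's SHA-256 `oid`, and its `size` in bytes, while the real content lives on the LFS server. A minimal sketch that builds such a pointer with Python's `hashlib` for a hypothetical 500 KB PNG-like payload (the payload is made up for illustration):

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"

def lfs_pointer(content: bytes) -> str:
    """Build the text pointer that Git stores in place of an LFS-tracked file."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"

# Hypothetical thumbnail: PNG signature followed by 500,000 zero bytes.
thumbnail = b"\x89PNG\r\n\x1a\n" + bytes(500_000)
pointer = lfs_pointer(thumbnail)
print(pointer)
print(len(pointer.encode()), "bytes committed instead of", len(thumbnail))
```

In a real repository this is handled by the git-lfs client (for example, `git lfs track "*.png"` adds a `*.png filter=lfs diff=lfs merge=lfs -text` line to `.gitattributes`), so contributors never write pointers by hand.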

I am curious whether, if we use Git LFS, people will still be able to upload PNGs through GitHub's web interface and have things work.

dhimmel avatar Sep 12 '19 15:09 dhimmel

I guess I see four options:

  1. store PNGs in manubot/catalog:master (master branch)
  2. store PNGs in manubot/catalog:master using Git LFS
  3. store PNGs in manubot/catalog:thumbnails
  4. store PNGs in manubot/catalog:thumbnails using Git LFS

The benefit of a different branch is that we don't clutter / bloat master and that we don't have to deal with local paths for thumbnails in the catalog... everything remains as URLs. The downside of a different branch is that a single PR cannot update catalog.yml and add a thumbnail.
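With a separate `thumbnails` branch (options 3 and 4), catalog entries stay plain URLs pointing at GitHub's raw-content host. A minimal sketch, assuming the images sit at the root of that branch and using a hypothetical filename:

```python
from urllib.parse import quote

def thumbnail_url(filename: str, branch: str = "thumbnails",
                  repo: str = "manubot/catalog") -> str:
    """Return a raw.githubusercontent.com URL for a file at the root of `branch`.
    The branch name and root-level layout are assumptions for illustration."""
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{quote(filename)}"

# Hypothetical thumbnail filename.
print(thumbnail_url("example-manuscript.png"))
```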

We can discuss a bit more in person what is best.

dhimmel avatar Sep 12 '19 17:09 dhimmel

Can you define what "bloat" means and what its downsides are? It sounds like we're talking about doing something for the benefit of the computers instead of for the benefit of the users.

If it's an actual GitHub storage limit, that makes sense to me. But I don't think a bunch of PNGs, which are really common files, will cause any problems on GitHub, which frequently hosts repositories with loads of images in them (like websites).

vincerubinetti avatar Sep 12 '19 17:09 vincerubinetti

It seems like several of the images removed in #27 are around 500 KB. I'd guess a complex image with lots of pixel variation could take around 1 MB.

I believe I had exported those with the lowest compression settings, so it should be possible to reduce their size. But even so, I agree that 1 MB is a fair worst-case upper limit to assume if we're trying to calculate storage limits.
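One concrete sense of "bloat": Git keeps every committed version of a binary in history, so replacing a thumbnail adds its full size again rather than overwriting it, and every clone downloads all of those versions. Already-compressed PNGs also gain little from Git's own compression. A back-of-the-envelope sketch, assuming the 1 MB worst case per version and a rough 1 GB budget (GitHub's documentation recommends keeping repositories ideally under about 1 GB), with hypothetical thumbnail and revision counts:

```python
MB = 1_000_000  # decimal megabytes, good enough for a rough estimate

def history_size_bytes(thumbnails: int, versions_each: int,
                       worst_case_bytes: int = 1 * MB) -> int:
    """Worst-case bytes added to Git history when each thumbnail is committed
    `versions_each` times (every replaced version stays in history)."""
    return thumbnails * versions_each * worst_case_bytes

def fraction_of_budget(size_bytes: int, budget_bytes: int = 1_000 * MB) -> float:
    """Fraction of an assumed ~1 GB repository budget that `size_bytes` uses."""
    return size_bytes / budget_bytes

# Hypothetical catalog: 50 thumbnails, each committed once and re-exported twice.
size = history_size_bytes(thumbnails=50, versions_each=3)
print(f"{size / MB:.0f} MB, {fraction_of_budget(size):.0%} of a 1 GB budget")
```

Under these made-up numbers the history grows by roughly 150 MB, which is why the per-file worst case matters more than the size of any single PNG.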

vincerubinetti avatar Sep 12 '19 17:09 vincerubinetti