Linas Valiukas
To be more specific, we normalize URLs with various "cruft" (tracking parameters, etc.) into their canonical form here: https://github.com/berkmancenter/mediacloud/blob/master/apps/common/src/python/mediawords/util/url/__init__.py#L158-L246. For some examples, see the unit test: https://github.com/berkmancenter/mediacloud/blob/master/apps/common/tests/python/mediawords/util/test_url.py#L117-L201
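For anyone who hasn't opened the linked code, here is a rough sketch of the general idea (stripping tracking parameters and tidying the rest of the URL). It is not the Media Cloud implementation: the function name `strip_tracking_params()` and the parameter lists are made up for illustration, and the real module handles many more cases.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical "cruft" lists for illustration only; the linked module
# has its own, much longer set of rules.
TRACKING_PARAMS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking_params(url: str) -> str:
    """Return `url` without tracking parameters or fragment (a sketch)."""
    parts = urlsplit(url)

    # Keep only the query parameters that don't look like tracking cruft.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]

    # Sort so that the same parameters in a different order compare equal.
    kept.sort()

    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # note: this also lowercases any userinfo
        parts.path or "/",     # treat an empty path as "/"
        urlencode(kept),
        "",                    # drop the fragment
    ))
```

Usage would look like `strip_tracking_params("http://Example.com/story?utm_source=x&b=2&a=1#top")`, which drops `utm_source` and the `#top` fragment and sorts the remaining parameters.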
Some wishlist items of mine:

* Merge with [`normalize_youtube_url()`](https://github.com/berkmancenter/mediacloud/blob/master/apps/common/src/python/mediawords/util/url/__init__.py#L319-L346)
* Users (if any) will probably want to use their own user agent (web client) and logging for the module, so...
https://quay.io/plans/ also worth looking at, although they do rate limiting and I'm not quite sure what that limit is.
> Make sure pull rate-limiting only applies to users in our org (i.e. public, non-MC users don't count towards the limit)

It's more like "make sure that we get our...
No, why pass those credentials? One is supposed to "docker login" using their own Docker Hub user on their own laptop to be able to push something. If someone quits...
Started test full backup to S3.
Initial full PgBackRest backup completed in 4238 minutes (~71 hours) and used up ~14 TB of space (compressed with lz4), but unfortunately it turned out that we won't be able...
WAL-G took 10837 minutes (181 hours, or almost 8 days) to complete, which is disappointing :( The backup consists of 28.2k (15.9 TB) files, and in those 8 days PostgreSQL...
A few incremental backups managed to finish too. Some stats:

* full backup:
    * `aws s3 ls --summarize --human-readable --recursive s3://mediacloud-postgresql-wal-backups/postgresql-server/basebackups_005/base_000000010004D03400000047/`
    * completed in 10,838 minutes (or 180 hours, or...
Started initial backup to B2.