C. Titus Brown
@bluegenes what other databases are we looking at providing? AllTheBacteria has version numbers. ICTV...?
database search and retrieval ideas:
* we could suggest that people construct a local manifest of their various databases combined, and then just use that; appropriate sketches would be automatically...
> As a future note, this plugin could also allow us to download and use database diffs for folks to prevent redownloading whole databases for minor updates. 🤩
some conclusions reached here: https://github.com/sourmash-bio/sourmash/issues/3764
@dependabot rebase
updated with `aws_region`, which now appears to be necessary:

```
import polars as pl

sra_prj = "PRJEB74559"
sra_metadata = pl.scan_parquet(
    "s3://sra-pub-metadata-us-east-1/sra/metadata/",
    storage_options={"skip_signature": "true", "aws_region": "us-east-1"},
).select(["acc", "bioproject"])

# Filter the...
```
a specific thought: using parquet files as standalone manifests pointing at zip files would be great, if we can somehow avoid loading the manifests of the zip files. manifest loading...
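a rough sketch of the standalone-manifest idea, to make it concrete. this is not sourmash code: the zip names, the `container` and `internal_location` columns, and the `select`/`load` helpers are all made up for illustration, and a CSV string stands in for the parquet file so the example only needs the standard library. the point is that filtering happens on the standalone manifest alone, and each zip is only opened to read the members the manifest already points at.

```python
import csv
import io
import zipfile


def make_zip(members):
    """Build an in-memory zip from {member_name: bytes} (stand-in for a database file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


# Hypothetical databases, keyed by the name the manifest uses to refer to them.
databases = {
    "db_a.zip": make_zip({
        "signatures/a1.sig": b"sketch a1",
        "signatures/a2.sig": b"sketch a2",
    }),
    "db_b.zip": make_zip({"signatures/b1.sig": b"sketch b1"}),
}

# Standalone manifest: one row per sketch, recording which zip holds it and where.
# (Assumed layout; a parquet file with the same columns would play this role.)
manifest_csv = """\
name,ksize,container,internal_location
genome_a1,31,db_a.zip,signatures/a1.sig
genome_a2,21,db_a.zip,signatures/a2.sig
genome_b1,31,db_b.zip,signatures/b1.sig
"""


def select(manifest_text, **filters):
    """Return manifest rows whose columns match every filter; no zip is touched."""
    rows = csv.DictReader(io.StringIO(manifest_text))
    return [r for r in rows if all(r[k] == str(v) for k, v in filters.items())]


def load(rows, databases):
    """Read only the selected members, going straight to their recorded locations."""
    out = {}
    for row in rows:
        with zipfile.ZipFile(databases[row["container"]]) as zf:
            out[row["name"]] = zf.read(row["internal_location"])
    return out


picked = select(manifest_csv, ksize=31)
loaded = load(picked, databases)
```

here `picked` holds only the two k=31 rows, and `load` reads those two members directly without consulting any manifest stored inside the zips.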
ref https://github.com/sourmash-bio/sourmash/issues/3819 about storing signatures directly in parquet
@dependabot rebase
and someone just ran head-first into this 😠.