`load_signatures_from_json` still uses `k*3` for protein loading

Open • bluegenes opened this issue 7 months ago • 1 comment

Over in https://github.com/sourmash-bio/sourmash/pull/1277#issuecomment-763050970, most of the sourmash command line and Python API was standardized to use k instead of k*3 when working with protein signatures, even though we still use and store k*3 internally.

There are a couple of exceptions: load_one_signature_from_json and load_signatures_from_json still expect k*3.
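To make the convention concrete, here is a minimal sketch of the two directions of the conversion. The helpers `internal_ksize` and `user_ksize` are hypothetical (not part of sourmash's API), and treating `protein`, `dayhoff`, and `hp` as the moltypes stored as k*3 is an assumption.

```python
# Hypothetical helpers, not sourmash functions: users pass k, while
# protein-style signatures store k*3 internally.

# Assumption: these are the moltypes whose stored ksize is k*3.
PROTEIN_STYLE = {"protein", "dayhoff", "hp"}


def internal_ksize(k: int, moltype: str) -> int:
    """Convert a user-facing k to the ksize stored in the signature."""
    return k * 3 if moltype in PROTEIN_STYLE else k


def user_ksize(stored_k: int, moltype: str) -> int:
    """Convert a stored ksize back to the user-facing k."""
    if moltype in PROTEIN_STYLE:
        if stored_k % 3:
            raise ValueError(f"stored ksize {stored_k} is not a multiple of 3")
        return stored_k // 3
    return stored_k


print(internal_ksize(10, "protein"))  # 30: what load_*_from_json expects
print(user_ksize(30, "protein"))      # 10: what minhash.ksize reports
print(internal_ksize(31, "DNA"))      # 31: DNA k is unchanged
```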

load_one_signature_from_json

Load with k*3

s1 = sourmash.signature.load_one_signature_from_json("GCA_000961135.2.protein.sig.gz", ksize=30)
s1

SourmashSignature('GCA_000961135.2 Candidatus Aramenus sulfurataquae isolate AZ1-454', 5db46432)

s1.minhash.ksize

10

Load with k

sourmash.signature.load_one_signature_from_json("GCA_000961135.2.protein.sig.gz", ksize=10)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ntpierce/.conda/envs/directsketch/lib/python3.12/site-packages/sourmash/signature.py", line 483, in load_one_signature_from_json
    raise ValueError("no signatures to load")
ValueError: no signatures to load

load_signatures_from_json

Load with k*3

sigs = sourmash.signature.load_signatures_from_json('GCA_000961135.2.protein.sig.gz', ksize=30)
for sig in sigs:
    print(sig.name)

GCA_000961135.2 Candidatus Aramenus sulfurataquae isolate AZ1-454

Load with k

sigs = sourmash.signature.load_signatures_from_json('GCA_000961135.2.protein.sig.gz', ksize=10)
for sig in sigs:
    print(sig.name)

(nothing is printed; no signatures are loaded)
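The same mismatch can be reproduced without sourmash by filtering signature records on their stored `ksize`. This is an illustrative sketch only: the trimmed JSON, the second (DNA) record, and `names_matching` are hypothetical, and the field names (`name`, `signatures`, `ksize`, `molecule`) are assumed to follow sourmash's JSON layout.

```python
import json

# Trimmed, hypothetical signature JSON. Real files carry more fields
# (mins, md5sum, seed, ...); the DNA record is made up for contrast.
SIG_JSON = """
[{"name": "GCA_000961135.2 Candidatus Aramenus sulfurataquae isolate AZ1-454",
  "signatures": [{"ksize": 30, "molecule": "protein", "mins": [1, 2, 3]}]},
 {"name": "example DNA sig (hypothetical)",
  "signatures": [{"ksize": 31, "molecule": "dna", "mins": [4, 5]}]}]
"""

# Assumption: these molecule types are stored as k*3.
PROTEIN_STYLE = {"protein", "dayhoff", "hp"}


def names_matching(json_text, ksize, user_facing=True):
    """Return the names of sketches whose k-mer size matches `ksize`.

    user_facing=False compares against the stored value directly,
    mimicking the k*3 behavior reported above. user_facing=True
    translates ksize to the stored value per record first.
    """
    names = []
    for record in json.loads(json_text):
        for sketch in record["signatures"]:
            wanted = ksize
            if user_facing and sketch["molecule"] in PROTEIN_STYLE:
                wanted = ksize * 3
            if sketch["ksize"] == wanted:
                names.append(record["name"])
    return names


print(names_matching(SIG_JSON, 10, user_facing=False))  # []
print(names_matching(SIG_JSON, 30, user_facing=False))  # the protein sig
print(names_matching(SIG_JSON, 10))                     # the protein sig
print(names_matching(SIG_JSON, 31))                     # the DNA sig
```

Translating per record, based on the `molecule` field, lets a single call accept k for both DNA and protein-style sketches.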

Fixing this might require a lot of test changes, so it may not be worth doing, especially as we move more and more to .zip. But I wanted to document it, at least.

bluegenes, Apr 18 '25 19:04

There should be relatively few places in the codebase where we use load_signatures_from_json directly. The API I would suggest using is load_file_as_signatures, which handles many more signature formats and works with plugins.

load_signatures_from_json is a thin wrapper around Rust code, which is why it still uses k*3.

ctb, Apr 19 '25 12:04