`load_signatures_from_json` still uses `k*3` for protein loading
In https://github.com/sourmash-bio/sourmash/pull/1277#issuecomment-763050970, most of the sourmash command line and Python API was standardized on k rather than k*3 for interacting with protein signatures, even though k*3 is still used and stored internally.
A couple of exceptions: `load_one_signature_from_json` and `load_signatures_from_json`, which still expect k*3.
load_one_signature_from_json
Load with k*3
s1 = sourmash.signature.load_one_signature_from_json("GCA_000961135.2.protein.sig.gz", ksize=30)
s1
SourmashSignature('GCA_000961135.2 Candidatus Aramenus sulfurataquae isolate AZ1-454', 5db46432)
s1.minhash.ksize
10
Load with k
load_one_signature_from_json("GCA_000961135.2.protein.sig.gz", ksize=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ntpierce/.conda/envs/directsketch/lib/python3.12/site-packages/sourmash/signature.py", line 483, in load_one_signature_from_json
    raise ValueError("no signatures to load")
ValueError: no signatures to load
load_signatures_from_json
Load with k*3
sigs = sourmash.signature.load_signatures_from_json('GCA_000961135.2.protein.sig.gz', ksize=30)
for sig in sigs:
    print(sig.name)
GCA_000961135.2 Candidatus Aramenus sulfurataquae isolate AZ1-454
Load with k
sigs = sourmash.signature.load_signatures_from_json('GCA_000961135.2.protein.sig.gz', ksize=10)
for sig in sigs:
    print(sig.name)
(no signature to print)
Fixing this might involve a lot of test changes, so it may not be worth changing, especially as we move more and more to .zip. But I wanted to document it at least.
There should be relatively few places in the codebase where we use load_signatures_from_json directly - the API I would suggest using is load_file_as_signatures, which handles many more signature formats and works with plugins.
load_signatures_from_json is a thin wrapper around Rust code, which is why it still uses k*3.
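For code that does need to call the JSON loader directly, a small shim can hide the k*3 detail. This is only a sketch: `json_loader_ksize` and `load_protein_sigs_from_json` are hypothetical names, not part of sourmash, and the k*3 rule is assumed from the behavior reported in this issue rather than from any documented contract.

```python
def json_loader_ksize(ksize: int, is_protein: bool = True) -> int:
    """Return the ksize value to hand to the JSON loaders.

    Assumption (based on the behavior reported in this issue): the JSON
    loaders match protein signatures against the internal value, k*3.
    Non-protein ksizes are passed through unchanged.
    """
    return ksize * 3 if is_protein else ksize


def load_protein_sigs_from_json(path: str, ksize: int) -> list:
    """Hypothetical wrapper: take protein k, pass k*3 to the JSON loader."""
    # Imported lazily so json_loader_ksize stays usable without sourmash.
    import sourmash

    return list(
        sourmash.signature.load_signatures_from_json(
            path, ksize=json_loader_ksize(ksize)
        )
    )
```

With this, `load_protein_sigs_from_json("GCA_000961135.2.protein.sig.gz", ksize=10)` would pass `ksize=30` to the loader, matching the call that succeeded in the report above.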