python-codext
Python codecs extension featuring CLI tools for encoding/decoding anything
CodExt
Encode/decode anything.
CodExt is a (Python2-3 compatible) library that extends the native `codecs` library (namely for adding new custom encodings and character mappings) and provides 120+ new codecs, hence its name combining CODecs EXTension. It also features a guess mode for decoding multiple layers of encoding and CLI tools for convenience.
$ pip install codext
Want to contribute a new codec ? | Want to contribute a new macro ? |
---|---|
Check the documentation first, then PR your new codec | PR your updated version of macros.json |
:mag: Demonstrations
:computer: Usage (main CLI tool)
$ codext -i test.txt encode dna-1
GTGAGCGGGTATGTGA
$ echo -en "test" | codext encode morse
- . ... -
$ echo -en "test" | codext encode braille
⠞⠑⠎⠞
$ echo -en "test" | codext encode base100
👫👜👪👫
Chaining codecs
$ echo -en "Test string" | codext encode reverse
gnirts tseT
$ echo -en "Test string" | codext encode reverse morse
--. -. .. .-. - ... / - ... . -
$ echo -en "Test string" | codext encode reverse morse dna-2
AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC
$ echo -en "Test string" | codext encode reverse morse dna-2 octal
101107124103101107124103101107124107101107101101101107124103101107124107101107101101101107124107101107124107101107101101101107124107101107124103101107124107101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124124101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124107101107101101101107124103
$ echo -en "AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC" | codext -d dna-2 morse reverse
test string
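Chaining amounts to feeding the output of one codec into the next, and decoding applies the inverse steps in reverse order. The following is a minimal stdlib-only sketch of that idea for the `reverse morse` chain shown above; the helper names (`chain`, `morse_encode`, `morse_decode`) are hypothetical and this is not codext's implementation.

```python
from functools import reduce

# International Morse code for letters only (enough for this illustration).
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}
REVERSE_MORSE = {code: letter for letter, code in MORSE.items()}


def reverse(text):
    """Reverse the characters of a string (its own inverse)."""
    return text[::-1]


def morse_encode(text):
    """Encode letters to Morse; letters are space-separated, words use ' / '."""
    words = text.lower().split(" ")
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)


def morse_decode(code):
    """Decode Morse back to lowercase text (case is lost by the encoding)."""
    words = code.split(" / ")
    return " ".join(
        "".join(REVERSE_MORSE[symbol] for symbol in word.split(" "))
        for word in words
    )


def chain(data, *steps):
    """Apply each step to the result of the previous one, left to right."""
    return reduce(lambda value, step: step(value), steps, data)


encoded = chain("Test string", reverse, morse_encode)
decoded = chain(encoded, morse_decode, reverse)
print(encoded)  # --. -. .. .-. - ... / - ... . -
print(decoded)  # test string
```

Because Morse has no notion of case, the round trip returns lowercase text, which matches the decoded output shown above.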
Using macros
$ codext add-macro my-encoding-chain gzip base63 lzma base64
$ codext list macros
example-macro, my-encoding-chain
$ echo -en "Test string" | codext encode my-encoding-chain
CQQFAF0AAIAAABuTgySPa7WaZC5Sunt6FS0ko71BdrYE8zHqg91qaqadZIR2LafUzpeYDBalvE///ug4AA==
$ codext remove-macro my-encoding-chain
$ codext list macros
example-macro
:computer: Usage (base CLI tool)
$ echo "Test string !" | base122
*.7!ft9�-f9Â
$ echo "Test string !" | base91
"ONK;WDZM%Z%xE7L
$ echo "Test string !" | base91 | base85
B2P|BJ6A+nO(j|-cttl%
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr
QVx5tvgjvCAkXaMSuKoQmCnjeCV1YyyR3WErUUErFf
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | base58-flickr -d | base36 -d | base85 -d | base91 -d
Test string !
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -m 3
Test string !
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -f Test
Test string !
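The last two commands recover the plaintext without naming the layers. As a rough illustration of how multi-layer guessing can work, the sketch below runs a breadth-first search over a few stdlib base decoders and stops once a known pattern shows up, in the spirit of `-f Test`. It is an assumption-laden toy: the decoder set, the `guess_decode` name and the stop criterion are hypothetical, and the actual semantics of `unbase -m` and `-f` are not described here.

```python
import base64
from collections import deque

# Candidate decoders, limited to what the standard library offers.
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": lambda raw: base64.b64decode(raw, validate=True),
    "base85": base64.b85decode,
}


def guess_decode(data, found, max_depth=5):
    """Breadth-first search over decoder chains until `found` appears.

    Returns (decoded_bytes, list_of_decoder_names), or (None, []) on failure.
    """
    queue = deque([(data, [])])
    while queue:
        current, path = queue.popleft()
        if found in current:
            return current, path
        if len(path) >= max_depth:
            continue
        for name, decoder in DECODERS.items():
            try:
                decoded = decoder(current)
            except (ValueError, TypeError):
                continue  # input is not valid for this decoder
            if decoded:
                queue.append((decoded, path + [name]))
    return None, []


# Three layers: base64, then base32, then base16.
layered = base64.b16encode(base64.b32encode(base64.b64encode(b"Test string !")))
plain, layers = guess_decode(layered, b"Test")
print(plain, layers)
```

A stop pattern keeps the search small; without one, a guesser needs some scoring heuristic to decide when an output looks like plaintext.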
:computer: Usage (Python)
Getting the list of available codecs:
>>> import codext
>>> codext.list()
['ascii85', 'base85', 'base100', 'base122', ..., 'tomtom', 'dna', 'html', 'markdown', 'url', 'resistor', 'sms', 'whitespace', 'whitespace-after-before']
>>> codext.encode("this is a test", "base58-bitcoin")
'jo91waLQA1NNeBmZKUF'
>>> codext.encode("this is a test", "base58-ripple")
'jo9rA2LQwr44eBmZK7E'
>>> codext.encode("this is a test", "base58-url")
'JN91Wzkpa1nnDbLyjtf'
>>> import codecs
>>> codecs.encode("this is a test", "base100")
'👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫'
>>> codecs.decode("👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫", "base100")
'this is a test'
>>> for i in range(8):
...     print(codext.encode("this is a test", "dna-%d" % (i + 1)))
GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA
CTCACGGACGGCCTATAGAACGGCCTATAGAACGACAGAACTCACGCCCTATCTCA
ACAGATTGATTAACGCGTGGATTAACGCGTGGATGAGTGGACAGATAAACGCACAG
AGACATTCATTAAGCGCTCCATTAAGCGCTCCATCACTCCAGACATAAAGCGAGAC
TCTGTAAGTAATTCGCGAGGTAATTCGCGAGGTAGTGAGGTCTGTATTTCGCTCTG
TGTCTAACTAATTGCGCACCTAATTGCGCACCTACTCACCTGTCTATTTGCGTGTC
GAGTGCCTGCCGGATATCTTGCCGGATATCTTGCTGTCTTGAGTGCGGGATAGAGT
CACTCGGTCGGCCATATGTTCGGCCATATGTTCGTCTGTTCACTCGCCCATACACT
>>> codext.decode("GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA", "dna-1")
'this is a test'
>>> codecs.encode("this is a test", "morse")
'- .... .. ... / .. ... / .- / - . ... -'
>>> codecs.decode("- .... .. ... / .. ... / .- / - . ... -", "morse")
'this is a test'
>>> with open("morse.txt", 'w', encoding="morse") as f:
...     f.write("this is a test")
14
>>> with open("morse.txt", encoding="morse") as f:
...     f.read()
'this is a test'
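Using a codec name with the native `codecs` functions or with `open(..., encoding=...)` suggests the codec is resolved through the standard codec registry. The sketch below shows how a custom text codec can be registered with the stdlib alone; the codec name `demo_reverse` and the helper functions are hypothetical, and this is not codext's own registration code.

```python
import codecs


def _reverse_encode(text, errors="strict"):
    # Codec functions return (output, number_of_input_items_consumed).
    return text[::-1], len(text)


def _reverse_decode(text, errors="strict"):
    return text[::-1], len(text)


def _search(name):
    """Search function called by the codecs registry for unknown names."""
    if name == "demo_reverse":
        return codecs.CodecInfo(
            encode=_reverse_encode,
            decode=_reverse_decode,
            name="demo_reverse",
        )
    return None  # let other search functions handle the name


codecs.register(_search)

print(codecs.encode("Test string", "demo_reverse"))  # gnirts tseT
print(codecs.decode("gnirts tseT", "demo_reverse"))  # Test string
```

`codecs.encode` and `codecs.decode` accept such string-to-string codecs, whereas `str.encode` insists on a bytes result.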
>>> codext.decode("""
=
X
:
x
n
r
y
Y
y
p
a
`
n
|
a
o
h
`
g
o
z """, "whitespace-after+before")
'CSC{not_so_invisible}'
>>> print(codext.encode("An example test string", "baudot-tape"))
***.**
. *
***.*
* .
.*
* .*
. *
** .*
***.**
** .**
.*
* .
* *. *
.*
* *.
* *. *
* .
* *.
* *. *
***.
*.*
***.*
* .*
:page_with_curl: List of codecs
BaseXX
- [X] `base1`: useless, but for the sake of completeness
- [X] `base2`: simple conversion to binary (with a variant with a reversed alphabet)
- [X] `base3`: conversion to ternary (with a variant with a reversed alphabet)
- [X] `base4`: conversion to quaternary (with a variant with a reversed alphabet)
- [X] `base8`: simple conversion to octal (with a variant with a reversed alphabet)
- [X] `base10`: simple conversion to decimal
- [X] `base11`: conversion to digits with an "a"
- [X] `base16`: simple conversion to hexadecimal (with a variant holding an alphabet with digits and letters inverted)
- [X] `base26`: conversion to alphabet letters
- [X] `base32`: classical conversion according to RFC4648 with all its variants (zbase32, extended hexadecimal, geohash, Crockford)
- [X] `base36`: Base36 conversion to letters and digits (with a variant inverting both groups)
- [X] `base45`: Base45 DRAFT algorithm (with a variant inverting letters and digits)
- [X] `base58`: multiple versions of Base58 (bitcoin, flickr, ripple)
- [X] `base62`: Base62 conversion to lower- and uppercase letters and digits (with a variant with letters and digits inverted)
- [X] `base63`: similar to `base62` with the "`_`" added
- [X] `base64`: classical conversion according to RFC4648 with its variant URL (or file) (it also holds a variant with letters and digits inverted)
- [X] `base67`: custom conversion using some more special characters (also with a variant with letters and digits inverted)
- [X] `base85`: all variants of Base85 (Ascii85, z85, Adobe, (x)btoa, RFC1924, XML)
- [X] `base91`: Base91 custom conversion
- [X] `base100` (or emoji): Base100 custom conversion
- [X] `base122`: Base122 custom conversion
- [X] `base-genericN`: see base encodings ; supports any possible base
This category also contains `ascii85`, `adobe`, `[x]btoa`, `zeromq` with the `base85` codec.
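Many of these codecs differ only in their base and alphabet. Comparing the `base58-bitcoin` and `base58-ripple` outputs in the Python usage section character by character shows the same digit positions written with two different alphabets. The sketch below illustrates that idea as a generic big-integer conversion; the function names are hypothetical, the alphabets are the commonly published Bitcoin and Ripple ones, and codext's own implementation may differ in details such as padding.

```python
BITCOIN = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
RIPPLE = "rpshnaf39wBUDNEGHJKLM4PQRST7VWXYZ2bcdeCg65jkm8oFqi1tuvAxyz"


def base_encode(data, alphabet):
    """Encode bytes as digits of base len(alphabet), most significant first."""
    base = len(alphabet)
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Leading zero bytes carry no numeric value; keep them as "zero" digits.
    padding = len(data) - len(data.lstrip(b"\x00"))
    return alphabet[0] * padding + "".join(reversed(digits))


def base_decode(text, alphabet):
    """Inverse of base_encode for the same alphabet."""
    base = len(alphabet)
    index = {char: position for position, char in enumerate(alphabet)}
    number = 0
    for char in text:
        number = number * base + index[char]
    padding = len(text) - len(text.lstrip(alphabet[0]))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * padding + body


message = b"this is a test"
as_bitcoin = base_encode(message, BITCOIN)
as_ripple = base_encode(message, RIPPLE)
print(as_bitcoin)
print(as_ripple)
print(base_decode(as_bitcoin, BITCOIN))
```

Passing any other alphabet of unique characters yields a different base, which is the general idea behind a generic base-N codec.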
Binary
- [X] `baudot`: supports CCITT-1, CCITT-2, EU/FR, ITA1, ITA2, MTK-2 (Python3 only), UK, ...
- [X] `baudot-spaced`: variant of `baudot` ; groups of 5 bits are whitespace-separated
- [X] `baudot-tape`: variant of `baudot` ; outputs a string that looks like a perforated tape
- [X] `bcd`: Binary Coded Decimal, encodes characters from their (zero-left-padded) ordinals
- [X] `bcd-extended0`: variant of `bcd` ; encodes characters from their (zero-left-padded) ordinals using prefix bits `0000`
- [X] `bcd-extended1`: variant of `bcd` ; encodes characters from their (zero-left-padded) ordinals using prefix bits `1111`
- [X] `excess3`: uses Excess-3 (aka Stibitz code) binary encoding to convert characters from their ordinals
- [X] `gray`: aka reflected binary code
- [X] `manchester`: XORs each bit of the input with `01`
- [X] `manchester-inverted`: variant of `manchester` ; XORs each bit of the input with `10`
- [X] `rotateN`: rotates characters by the specified number of bits (N belongs to [1, 7] ; Python 3 only)
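Two of the descriptions above reduce to one-line bit operations. The sketch below works on plain integers and bit strings to show the reflected binary (Gray) code and the "XOR each bit with `01`" Manchester scheme; how codext maps characters to bits and formats the output is not covered here, and the function names are hypothetical.

```python
def gray_encode(number):
    """Reflected binary code: adjacent integers differ in exactly one bit."""
    return number ^ (number >> 1)


def gray_decode(gray):
    """Undo gray_encode by XOR-folding all right shifts of the value."""
    number = 0
    while gray:
        number ^= gray
        gray >>= 1
    return number


def manchester_encode(bits, pattern="01"):
    """Replace each input bit b with the pair (b XOR p0, b XOR p1)."""
    return "".join(
        "".join(str(int(bit) ^ int(p)) for p in pattern) for bit in bits
    )


def manchester_decode(symbols, pattern="01"):
    """Recover each bit from the first symbol of its pair."""
    return "".join(
        str(int(symbols[i]) ^ int(pattern[0]))
        for i in range(0, len(symbols), 2)
    )


print(gray_encode(5))                   # 7
print(manchester_encode("10"))          # 1001
print(manchester_encode("10", "10"))    # 0110 (inverted variant)
```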
Common
- [X] `a1z26`: keeps words whitespace-separated and uses a custom character separator
- [X] `cases`: set of case-related encodings (including camel-, kebab-, lower-, pascal-, upper-, snake- and swap-case, slugify, capitalize, title)
- [X] `dummy`: set of simple encodings (including integer, replace, reverse, word-reverse, substitute and strip-spaces)
- [X] `octal`: dummy octal conversion (converts to 3-digit groups)
- [X] `octal-spaced`: variant of `octal` ; dummy octal conversion, handling whitespace separators
- [X] `ordinal`: dummy character ordinals conversion (converts to 3-digit groups)
- [X] `ordinal-spaced`: variant of `ordinal` ; dummy character ordinals conversion, handling whitespace separators
Compression
- [X] `gzip`: standard Gzip compression/decompression
- [X] `lz77`: compresses the given data with the algorithm of Lempel and Ziv of 1977
- [X] `lz78`: compresses the given data with the algorithm of Lempel and Ziv of 1978
- [X] `pkzip_deflate`: standard Zip-deflate compression/decompression
- [X] `pkzip_bzip2`: standard BZip2 compression/decompression
- [X] `pkzip_lzma`: standard LZMA compression/decompression
:warning: Compression functions are of course definitely NOT encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
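The standard library already follows the same pattern for a few compressors, which may help to picture what "leveraging the `.encode(...)` API" means. This sketch uses only stdlib features (the built-in `zlib` bytes-to-bytes codec and the `gzip` module); it does not use codext and says nothing about how codext wraps its own compression codecs.

```python
import codecs
import gzip

data = b"Test string " * 20

# Compression exposed through the codecs API (stdlib "zlib" codec).
packed = codecs.encode(data, "zlib")
unpacked = codecs.decode(packed, "zlib")

# The same idea with the dedicated gzip module.
gzipped = gzip.compress(data)
restored = gzip.decompress(gzipped)

print(len(data), len(packed), len(gzipped))
```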
Cryptography
- [X] `affine`: aka Affine Cipher
- [X] `atbash`: aka Atbash Cipher
- [X] `bacon`: aka Baconian Cipher
- [X] `barbie-N`: aka Barbie Typewriter (N belongs to [1, 4])
- [X] `citrix`: aka Citrix CTX1 password encoding
- [X] `railfence`: aka Rail Fence Cipher
- [X] `rotN`: aka Caesar cipher (N belongs to [1,25])
- [X] `scytaleN`: encrypts using the number of letters on the rod (N belongs to [1,[)
- [X] `shiftN`: shift ordinals (N belongs to [1,255])
- [X] `xorN`: XOR with a single byte (N belongs to [1,255])
:warning: Crypto functions are of course definitely NOT encoding functions ; they are implemented for leveraging the `.encode(...)` API from `codecs`.
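For orientation, here is a stdlib-only sketch of three of the simplest ciphers in this list: a Caesar rotation, a single-byte XOR and Atbash. The function names are hypothetical and edge cases (non-ASCII letters, separators, key parsing from the codec name) are ignored, so this is an illustration of the techniques and not codext's code. The stdlib `rot_13` codec serves as a cross-check for the rotation.

```python
import codecs
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase


def rot(text, n):
    """Caesar cipher: rotate ASCII letters by n positions, keep the rest."""
    n %= 26
    table = str.maketrans(
        LOWER + UPPER,
        LOWER[n:] + LOWER[:n] + UPPER[n:] + UPPER[:n],
    )
    return text.translate(table)


def xor(data, key):
    """XOR every byte with a single-byte key (applying it twice undoes it)."""
    return bytes(byte ^ key for byte in data)


def atbash(text):
    """Atbash: map a<->z, b<->y, ... for both cases."""
    table = str.maketrans(LOWER + UPPER, LOWER[::-1] + UPPER[::-1])
    return text.translate(table)


print(rot("Test string", 13))                   # Grfg fgevat
print(codecs.encode("Test string", "rot_13"))   # same result via the stdlib codec
print(xor(xor(b"Test string", 42), 42))         # b'Test string'
print(atbash("abc"))                            # zyx
```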
Hashing
- [X] `blake`: includes BLAKE2b and BLAKE2s (Python 3 only ; relies on `hashlib`)
- [X] `checksums`: includes Adler32 and CRC32 (relies on `zlib`)
- [X] `crypt`: Unix's crypt hash for passwords (Python 3 and Unix only ; relies on `crypt`)
- [X] `md`: aka Message Digest ; includes MD4 and MD5 (relies on `hashlib`)
- [X] `sha`: aka Secure Hash Algorithms ; includes SHA1, 224, 256, 384, 512 (Python2/3) but also SHA3-224, -256, -384 and -512 (Python 3 only ; relies on `hashlib`)
- [X] `shake`: aka SHAKE hashing (Python 3 only ; relies on `hashlib`)
:warning: Hash functions are of course definitely NOT encoding functions ; they are implemented for convenience with the `.encode(...)` API from `codecs` and are useful for chaining codecs.
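Since these codecs are described as relying on `hashlib` and `zlib`, the underlying stdlib calls look roughly as follows. The wrapper names `hash_hex` and `checksum_hex` are hypothetical, and the hexadecimal output format is an assumption made for this sketch, not a statement about what codext returns.

```python
import hashlib
import zlib


def hash_hex(data, algorithm="sha256"):
    """Hex digest of `data` for any fixed-length algorithm hashlib knows."""
    return hashlib.new(algorithm, data).hexdigest()


def checksum_hex(data, kind="crc32"):
    """CRC32 or Adler32 checksum formatted as 8 hex digits."""
    function = {"crc32": zlib.crc32, "adler32": zlib.adler32}[kind]
    return format(function(data) & 0xFFFFFFFF, "08x")


print(hash_hex(b"this is a test"))
print(hash_hex(b"this is a test", "blake2s"))
print(checksum_hex(b"this is a test", "crc32"))
print(checksum_hex(b"this is a test", "adler32"))
```

Unlike the other categories, these are one-way: there is no meaningful decode step.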
Languages
- [X] `braille`: well-known braille language (Python 3 only)
- [X] `ipsum`: aka lorem ipsum
- [X] `galactic`: aka galactic alphabet or Minecraft enchantment language (Python 3 only)
- [X] `leetspeak`: based on minimalistic elite speaking rules
- [X] `morse`: uses whitespace as a separator
- [X] `navajo`: only handles letters (not full words from the Navajo dictionary)
- [X] `radio`: aka NATO or radio phonetic alphabet
- [X] `southpark`: converts letters to Kenny's language from South Park (whitespace is also handled)
- [X] `southpark-icase`: case-insensitive variant of `southpark`
- [X] `tap`: converts text to tap/knock code, commonly used by prisoners
- [X] `tomtom`: similar to `morse`, using slashes and backslashes
Others
- [X] `dna`: implements the 8 rules of DNA sequences (N belongs to [1,8])
- [X] `letter-indices`: encodes consonants and/or vowels with their corresponding indices
- [X] `markdown`: unidirectional encoding from Markdown to HTML
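The `dna-1` outputs shown in the usage sections are consistent with a fixed mapping of bit pairs to nucleotides (00 to A, 01 to G, 10 to C, 11 to T). The sketch below implements that single rule, inferred from those outputs; the function names are hypothetical and the other seven rules would only swap the mapping table.

```python
# Rule 1 as inferred from the sample outputs: bit pair -> nucleotide.
RULE_1 = {"00": "A", "01": "G", "10": "C", "11": "T"}


def dna_encode(text, rule=RULE_1):
    """Write each byte as 8 bits, then map every bit pair to a nucleotide."""
    bits = "".join(format(byte, "08b") for byte in text.encode("latin-1"))
    return "".join(rule[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_decode(sequence, rule=RULE_1):
    """Map nucleotides back to bit pairs and regroup them into bytes."""
    inverse = {nucleotide: pair for pair, nucleotide in rule.items()}
    bits = "".join(inverse[nucleotide] for nucleotide in sequence)
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode("latin-1")


print(dna_encode("test"))              # GTGAGCGGGTATGTGA
print(dna_decode("GTGAGCGGGTATGTGA"))  # test
```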
Steganography
- [X] `hexagram`: uses Base64 and encodes the result to a charset of I Ching hexagrams (as implemented here)
- [X] `klopf`: aka Klopf code ; Polybius square with trivial alphabetical distribution
- [X] `resistor`: aka resistor color codes
- [X] `rick`: aka Rick cipher (in reference to Rick Astley's song "Never gonna give you up")
- [X] `sms`: also called T9 code ; uses "`-`" as a separator for encoding, "`-`" or "`_`" or whitespace for decoding
- [X] `whitespace`: replaces bits with whitespaces and tabs
- [X] `whitespace_after_before`: variant of `whitespace` ; encodes characters as new characters with whitespaces before and after according to an equation described in the codec name (e.g. "`whitespace+2*after-3*before`")
Web
- [X] `html`: implements entities according to this reference
- [X] `url`: aka URL encoding
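For comparison, the standard library offers equivalent transformations through `urllib.parse` and `html`. The sketch below only shows those stdlib calls; whether codext's `url` and `html` codecs escape exactly the same set of characters is not stated here and should not be assumed.

```python
import html
from urllib.parse import quote, unquote

# URL (percent) encoding: unreserved characters stay, the rest becomes %XX.
url_encoded = quote("Test string !")
print(url_encoded)           # Test%20string%20%21
print(unquote(url_encoded))  # Test string !

# HTML entities: escape markup-significant characters and resolve entities.
html_encoded = html.escape("<b>Test & string</b>")
print(html_encoded)                 # &lt;b&gt;Test &amp; string&lt;/b&gt;
print(html.unescape(html_encoded))  # <b>Test & string</b>
```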