Adversarial-text defense
TR39 confusable folding, bidi / zero-width / control stripping, and zalgo capping — the Unicode attack surface most pipelines never check.
A compiled-Rust toolkit that canonicalizes and neutralizes adversarial Unicode — homoglyph spoofing, bidi / Trojan-Source, zalgo, and invisible characters — before it reaches your classifiers, indexes, logs, and identifiers.
pip install disarm
cargo add disarm
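To make "bidi / zero-width / control stripping" and "zalgo capping" concrete, here is a minimal plain-Python sketch of both ideas. It is not disarm's API: the names strip_invisibles, cap_combining and neutralize, and the default cap of 2 marks, are hypothetical choices for illustration. It is also deliberately blunt. Category Cf includes ZWJ/ZWNJ, which emoji sequences and scripts such as Persian rely on, so a production tool needs context rules this sketch omits. It also uses exactly the per-character Python loop that a compiled engine avoids.

```python
import unicodedata

KEEP_CONTROLS = {"\t", "\n", "\r"}  # whitespace controls most pipelines still need

def strip_invisibles(text: str) -> str:
    """Drop format characters (Cf: bidi overrides/isolates, zero-width space,
    joiners, soft hyphen, ...) and control characters (Cc) except tab/LF/CR."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cf"
        and (unicodedata.category(ch) != "Cc" or ch in KEEP_CONTROLS)
    )

def cap_combining(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive nonspacing/enclosing marks (Mn/Me)."""
    out = []
    run = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # excess stacked mark: the "zalgo" pattern being capped
        else:
            run = 0  # a base character resets the mark count
        out.append(ch)
    return "".join(out)

def neutralize(text: str) -> str:
    # Strip invisibles first so a hidden character cannot split a mark run.
    return cap_combining(strip_invisibles(text))

# U+202E RIGHT-TO-LEFT OVERRIDE, U+200B ZERO WIDTH SPACE, then 12 stacked acute accents
cleaned = neutralize("pay\u202epal\u200b" + "e" + "\u0301" * 12)
```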
A Rust engine with compile-time perfect-hash lookup tables and a single Python-to-Rust boundary crossing per call: no regex, no per-character Python loops.
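The "build the table once, apply it in one call" idea can be loosely illustrated in pure Python with str.translate, which walks the string in C. This sketch also shows confusable folding in the spirit of a TR39 skeleton, but with two substitutions: it uses NFKC where TR39 specifies NFD, and a six-entry hand-picked table where the real confusables.txt data has thousands of mappings. A Python dict stands in for disarm's compile-time perfect hash. The names skeleton and is_spoof_of are hypothetical.

```python
import unicodedata

# Hand-picked sample of mappings from Unicode's confusables.txt (illustrative only).
# Built once at import time and reused for every call.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
})

def skeleton(text: str) -> str:
    # NFKC folds compatibility forms such as fullwidth letters; then a single
    # str.translate call applies the whole confusable table.
    return unicodedata.normalize("NFKC", text).translate(CONFUSABLES)

def is_spoof_of(candidate: str, trusted: str) -> bool:
    # Different code points, same skeleton: a likely homoglyph spoof.
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)
```

Comparing skeletons rather than raw strings is what lets a Cyrillic-spelled "раураl" collide with the Latin "paypal" it imitates.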
unsafe_code = "forbid" is set across the entire codebase. Memory safety isn't traded for speed: the compiler enforces it on every build.
disarm normalizes input. It is not an output sanitizer: encode at your sink. The threat model says exactly what is and isn't covered.
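Why "encode at your sink" matters even after normalization: NFKC maps U+FF1C FULLWIDTH LESS-THAN SIGN to an ASCII "<", so normalizing can create characters that are significant to the output context. The sketch below uses only the standard library. normalize_input is a hypothetical stand-in for an input-normalization step, not disarm's API, and html.escape does the sink-specific encoding.

```python
import html
import unicodedata

def normalize_input(text: str) -> str:
    # Hypothetical stand-in for an input normalizer (NFKC only).
    return unicodedata.normalize("NFKC", text)

def render_comment(text: str) -> str:
    cleaned = normalize_input(text)
    # Normalizing is not encoding: escape for the HTML sink at the point of output.
    return "<p>" + html.escape(cleaned) + "</p>"
```

The same rule holds for other sinks: parameterized queries for SQL, argument lists rather than string concatenation for shells.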
Transliteration, slugification, filename safety, and Unicode normalization across 80+ language profiles and many scripts.
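For contrast, here is the common stdlib-only baseline for slugification: decompose, drop non-ASCII, collapse separators. It is a hypothetical helper (naive_slug), not disarm's API. It shows the gap that per-language transliteration profiles are meant to close. Accented Latin survives, but an all-Cyrillic title collapses to an empty slug.

```python
import re
import unicodedata

def naive_slug(text: str, sep: str = "-") -> str:
    # NFKD splits accented letters into base letter + combining mark...
    decomposed = unicodedata.normalize("NFKD", text)
    # ...and the ASCII encode drops the marks, along with every non-Latin letter.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of anything outside [a-z0-9] into a single separator.
    return re.sub(r"[^a-z0-9]+", sep, ascii_only.lower()).strip(sep)
```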
Compatibility aliases for Unidecode, python-slugify, and pathvalidate make migration a one-line change.
“A defense-in-depth layer, not a complete control.” disarm reduces a specific, enumerated attack surface — and documents the rest.