Anonymity is a statistical illusion.
Strip the name; the signature remains. Attribution Range measures stylometric fingerprints: function-word frequencies and habitual constructions that cluster anonymous artifacts back to a common origin. It is validated on the canonical ground-truth case, the disputed Federalist Papers, then applied to the adversary problem.
The name is optional. The fingerprint is not.
Every author carries habitual micro-decisions below the level of conscious control: which function words they reach for, how often they write "upon" versus "on" or "whilst" versus "while", whether "consequently" appears at three times the base rate or near zero. These are not deliberate stylistic choices. They are statistical regularities: largely stable across topics, hard to suppress on purpose, and measurable. The signature survives the anonymizing pass.
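A minimal sketch of the measurement this paragraph describes: how often a few marker words occur per 1,000 running words. The marker list (the words named above), the regex tokenizer, and the per-1,000 normalization are illustrative assumptions, not Attribution Range's actual feature pipeline.

```python
import re
from collections import Counter

# Marker words named in the paragraph above; an illustrative subset, not the full feature set.
MARKERS = ("upon", "on", "whilst", "while", "consequently")

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens from a deliberately simple regex (an assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def rates_per_thousand(text: str, markers=MARKERS) -> dict[str, float]:
    """Occurrences of each marker word per 1,000 word tokens."""
    tokens = tokenize(text)
    if not tokens:
        return {w: 0.0 for w in markers}
    counts = Counter(tokens)  # exact token matches, so "on" is not counted inside "union"
    return {w: 1000.0 * counts[w] / len(tokens) for w in markers}

if __name__ == "__main__":
    sample = "Upon reflection, the people rely upon the states, whilst the union acts on them."
    for word, rate in rates_per_thousand(sample).items():
        print(f"{word:>13}: {rate:6.1f} per 1,000 words")
```

Rates rather than raw counts are what make documents of different lengths comparable. On a sentence this short the numbers are noisy, which is why attribution works from whole papers rather than single sentences.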
The operational corollary: the same technique that resolves disputed historical authorship is how you cluster anonymous fraud artifacts to a single actor across campaigns. A phishing kit, a scam script, an extortion note, an impersonation page. Train on the known corpus; classify the unknown. The fingerprint closes the attribution gap before any other intelligence is available.
114,276 words
39,226 words
Ground truth unknown at test time
Count the invisible words. Compute the distance. Name the author.
Attribution Range implements Burrows' Delta, a standard computational authorship metric introduced in 2002. The features are the relative frequencies of 30 function words from the Mosteller-Wallace marker set: words every author uses regardless of topic, at habitual rates that differ from author to author. Each word's frequency is converted to a z-score against that word's mean and standard deviation across the training corpus, so rare and common words contribute on the same scale. Delta between a document and an author is the mean absolute difference between their z-scores across all features. The nearest author, the one with the lowest Delta, is the attributed author. Validation runs leave-one-out cross-validation on the 65 undisputed papers before any disputed paper is touched.
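The computation described above can be sketched as follows. This is one common formulation of Burrows' Delta, assuming a simple regex tokenizer, population standard deviation, and author profiles formed by averaging each author's per-document z-scores. The function names (`fit_delta`, `attribute`) and the four-word feature list are hypothetical stand-ins for the tool's internals and its 30-word marker set.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def relative_freqs(text: str, words: list[str]) -> list[float]:
    """Relative frequency of each feature word among all word tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / n for w in words]

def zscore(vec, mu, sigma):
    """Standardize a frequency vector against corpus-wide means and standard deviations."""
    return [(x - m) / s for x, m, s in zip(vec, mu, sigma)]

def fit_delta(training: dict[str, list[str]], words: list[str]):
    """Return (mu, sigma, profiles) fitted on every training document."""
    vectors = {a: [relative_freqs(t, words) for t in texts] for a, texts in training.items()}
    columns = list(zip(*[v for vs in vectors.values() for v in vs]))
    mu = [mean(col) for col in columns]
    sigma = [pstdev(col) or 1.0 for col in columns]  # zero-variance features divided by 1, not 0
    profiles = {
        a: [mean(col) for col in zip(*(zscore(v, mu, sigma) for v in vs))]
        for a, vs in vectors.items()
    }
    return mu, sigma, profiles

def delta(z_a, z_b):
    """Burrows' Delta: mean absolute difference between two z-score vectors."""
    return mean(abs(x - y) for x, y in zip(z_a, z_b))

def attribute(text, model, words):
    """Return the author with the lowest Delta, plus every author's Delta score."""
    mu, sigma, profiles = model
    z = zscore(relative_freqs(text, words), mu, sigma)
    scores = {a: delta(z, p) for a, p in profiles.items()}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    words = ["upon", "on", "whilst", "while"]  # illustrative subset, not the 30-word set
    training = {
        "A": ["upon the matter upon the whole whilst we wait", "we rely upon it whilst it lasts"],
        "B": ["on the matter on the whole while we wait", "we rely on it while it lasts"],
    }
    model = fit_delta(training, words)
    author, scores = attribute("they stood upon the hill whilst it rained", model, words)
    print(author, scores)
```

In the toy example the test sentence leans on "upon" and "whilst", so its lowest Delta is to author A. Averaging z-scores per author is equivalent to z-scoring the author's mean frequencies, because the z-transform is linear.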
Held-out accuracy: 100%. Disputed papers: 11 of 12 to Madison.
Before touching the disputed papers, the classifier is validated on the 65 undisputed papers via leave-one-out cross-validation: train on every undisputed paper except one, classify the held-out paper, repeat. The result is the accuracy score below. Only then are the 12 disputed papers run through the trained classifier.
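A sketch of that leave-one-out loop, under stated assumptions: a hypothetical four-word feature list stands in for the 30 marker words, and the helper names are illustrative. One design choice matters here: the corpus means, standard deviations, and author profiles are refitted on each training fold, so the held-out paper never influences the statistics it is scored against.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical four-word stand-in for the tool's 30 Mosteller-Wallace marker words.
WORDS = ("upon", "on", "whilst", "while")

def freqs(text):
    """Relative frequency of each feature word among the text's word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts, n = Counter(tokens), max(len(tokens), 1)
    return [counts[w] / n for w in WORDS]

def nearest_author(train, test_vec):
    """Burrows' Delta attribution; train is a list of (author, frequency_vector)."""
    columns = list(zip(*(vec for _, vec in train)))
    mu = [mean(c) for c in columns]
    sd = [pstdev(c) or 1.0 for c in columns]  # zero-variance features divided by 1

    def z(vec):
        return [(x - m) / s for x, m, s in zip(vec, mu, sd)]

    profiles = {}
    for author in sorted({a for a, _ in train}):  # sorted for deterministic tie-breaking
        zs = [z(vec) for a, vec in train if a == author]
        profiles[author] = [mean(c) for c in zip(*zs)]
    zt = z(test_vec)
    return min(profiles, key=lambda a: mean(abs(x - y) for x, y in zip(zt, profiles[a])))

def leave_one_out(labelled):
    """labelled is a list of (author, text); returns (correct, total)."""
    vecs = [(author, freqs(text)) for author, text in labelled]
    correct = 0
    for i, (author, vec) in enumerate(vecs):
        train = vecs[:i] + vecs[i + 1:]  # every paper except the held-out one
        if nearest_author(train, vec) == author:
            correct += 1
    return correct, len(vecs)

if __name__ == "__main__":
    corpus = [
        ("A", "upon the matter upon the whole whilst we wait"),
        ("A", "we rely upon it whilst it lasts"),
        ("A", "they stood upon the hill whilst it rained"),
        ("B", "on the matter on the whole while we wait"),
        ("B", "we rely on it while it lasts"),
        ("B", "they stood on the hill while it rained"),
    ]
    correct, total = leave_one_out(corpus)
    print(f"LOO accuracy: {correct}/{total}")
```

Only after a fold-by-fold check like this would the disputed papers be scored, with the model refitted once on all undisputed papers.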
Validated on ground truth. Now point it at the anonymous adversary.
The Federalist Papers case is canonical because the answer is established independently of this classifier: authorship of the undisputed papers is documented, and historians and linguists have converged on Madison for the disputed papers through textual, historical, and contextual evidence accumulated over two centuries. It is the ideal calibration target. A classifier that reaches the established result on a 240-year-old pseudonymous corpus has a track record worth carrying to a threat actor's anonymous phishing kit authored last week.
1 / 12: Hamilton (3.0% margin)
11 of 12 align with Mosteller & Wallace (1964), who assigned all 12 to Madison
LOO accuracy: 100% (65/65)
The technique that resolves history resolves the adversary.
Fraud actors carry the same authorial habits from campaign to campaign. Phishing kits written by the same operator show measurable stylometric overlap: the same function-word profile, the same sentence constructions, the same rhythmic choices below the level of deliberate control. A classifier trained on a known corpus of attributed artifacts can assign a new anonymous sample to an existing actor or, when no known actor is close enough, open a new actor with no prior samples. That second outcome needs a Delta threshold, because nearest-author Delta on its own always returns some known author.
The operational workflow is a straight translation from the academic case: collect a corpus of known-origin artifacts, establish the fingerprint per author cluster, run Delta against each new unknown. The attribution decision follows from the measurement. The analyst's job is not to guess; it is to run the instrument and read the output.
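A sketch of the decision step in that workflow, assuming each artifact has already been reduced to a z-score vector against the known corpus, as in Burrows' Delta. A sample joins the nearest known actor if its Delta falls at or under a threshold; otherwise it opens a new actor that later samples can join. This is a simple sequential leader-clustering rule chosen for illustration, not necessarily what Attribution Range does; `assign_samples`, the actor labels, and the threshold value are hypothetical, and the threshold would need calibrating on labelled artifacts.

```python
from statistics import mean

def delta(z_a, z_b):
    """Burrows' Delta: mean absolute difference between two z-score vectors."""
    return mean(abs(x - y) for x, y in zip(z_a, z_b))

def assign_samples(samples, profiles, threshold):
    """Attribute each z-scored sample to a known actor or open a new one.

    samples: list of (sample_id, z_vector), processed in order.
    profiles: dict actor -> z_vector; updated in place with any new actors.
    threshold: hypothetical maximum Delta for a match (needs calibration).
    """
    assignments = {}
    new_count = 0
    for sample_id, z in samples:
        if profiles:
            nearest = min(profiles, key=lambda actor: delta(z, profiles[actor]))
            if delta(z, profiles[nearest]) <= threshold:
                assignments[sample_id] = nearest
                continue
        # Too far from every known fingerprint: seed a new actor from this sample.
        new_count += 1
        label = f"new-actor-{new_count}"
        profiles[label] = list(z)
        assignments[sample_id] = label
    return assignments

if __name__ == "__main__":
    known = {"actor-A": [1.0, -1.0, 1.0], "actor-B": [-1.0, 1.0, -1.0]}
    unknown = [
        ("kit-1", [0.9, -1.1, 0.8]),
        ("kit-2", [0.0, 0.0, 3.0]),
        ("kit-3", [0.1, 0.1, 2.9]),
        ("kit-4", [-1.2, 0.9, -1.0]),
    ]
    print(assign_samples(unknown, known, threshold=0.5))
```

Because each new actor is seeded by its first sample, results depend on processing order. A batch method such as agglomerative clustering over pairwise Delta avoids that, at the cost of more computation.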
Method · Burrows, J., "Delta": A Measure of Stylistic Difference and a Guide to Likely Authorship (2002) · Literary and Linguistic Computing 17(3): 267-287
Corpus · Project Gutenberg EBook #18: The Federalist Papers (public domain)
Attribution Range demonstrates that stylometric fingerprinting works on a pseudonymous real-world corpus with ground-truth validation. The same instrument that resolves a 240-year-old disputed-authorship case is the one we point at the attribution gap in a modern fraud campaign. We run the measurement, score the fingerprint, and hand back the attribution.