honest framework

honest-test

The vocabulary declaration is the test specification. Write the declaration; get the tests.

classify() is pure, so every test reduces to one form: assert classify(input) == expected.

def test_format_value():
    assert format_value("USD", "standard") == "$1,000.00"
    assert format_value("EUR", "standard") == "€1.000,00"
    assert format_value("USD", "compact")  == "$1K"
    # 3 of 150 currencies × 4 styles = 600 combinations.
    # You wrote 3.
    # The other 597 are untested. Bugs live there.
    # And no adversarial inputs.

from honest_test import run

# You wrote the vocabulary. honest-test reads it.
results = run(format_pipeline)

# format_name(5) × currency_code(150) × style_name(4)
# = 3,000 test cases. All run. No sampling.
# + 847 adversarial near-miss inputs. All rejected.
# Purity, mutation, idempotency verified automatically.
# You wrote: 0 test cases.

RSpec.describe FormatService do
  it "formats USD in standard style" do
    expect(subject.format("USD", "standard")).to eq("$1,000.00")
  end
  it "formats EUR in standard style" do
    expect(subject.format("EUR", "standard")).to eq("€1.000,00")
  end
  # 2 of 600 combinations tested manually.
  # Adversarial inputs: none.
  # Purity: assumed.
end

require "honest_test"

# You wrote the vocabulary. honest-test reads it.
results = HonestTest.run(FormatPipeline)

# currency_code(150) × style_name(4) × format_name(5)
# = 3,000 test cases. All run.
# + adversarial inputs. All rejected.
# Mutation detection. Purity. Idempotency.
# You wrote: 0 test cases.

func TestFormatValue(t *testing.T) {
    cases := []struct{ currency, style, want string }{
        {"USD", "standard", "$1,000.00"},
        {"EUR", "standard", "€1.000,00"},
        {"USD", "compact",  "$1K"},
    }
    for _, c := range cases {
        got := FormatValue(c.currency, c.style)
        assert.Equal(t, c.want, got)
    }
    // 3 of 600 combinations. Adversarial: none.
}

import "github.com/honest-framework/honest-test"

// You wrote the vocabulary. honest-test reads it.
results := honesttest.Run(FormatPipeline)

// CurrencyCode(150) × StyleName(4) × FormatName(5)
// = 3,000 test cases. All run.
// + adversarial inputs, mutation detection, purity.
// You wrote: 0 test cases.

class FormatTest extends TestCase {
    public function testFormatUSD(): void {
        $this->assertEquals('$1,000.00', format_value('USD', 'standard'));
    }
    public function testFormatEUR(): void {
        $this->assertEquals('€1.000,00', format_value('EUR', 'standard'));
    }
    // 2 of 600 combinations. Adversarial: none.
    // You manually wrote every case.
}

use HonestTest\Runner;

// You wrote the vocabulary. honest-test reads it.
$results = Runner::run(FormatPipeline::class);

// currency_code(150) × style_name(4) × format_name(5)
// = 3,000 test cases. All run.
// + adversarial inputs, mutation detection, purity.
// You wrote: 0 test cases.

# Write test cases manually.
# Sample from the input space.
# Hope you picked the right ones.
# Adversarial inputs: whatever you thought of.
# Purity and mutation: assumed or tested separately.
# The vocabulary exists in the production code;
# the tests duplicate it in a different language.

# The vocabulary IS the test specification.
# honest-test reads the declaration.
# Enumerates the full cartesian product of all Set members.
# Generates edit-distance-1 adversarial neighbors.
# Verifies purity, mutation, idempotency automatically.
# You write: 0 test cases for bounded types.

The concept

You have a function that accepts a sort parameter. The valid values are "name", "date", and "status". You write a test for "name" and a test for "date". You ship. Six months later someone passes "status" and hits a code path that was never tested.

This is not a gap in your test discipline. It is a gap in how tests are written. You have to know which values exist, remember to cover all of them, and keep that list in sync with the code that enforces it. Three separate places, maintained by hand.

honest-test reads the vocabulary you already declared and generates the tests automatically.

How it works

Given a vocabulary:

vocab = vocabulary({
    "sort":   {"name", "date", "status"},
    "order":  {"asc", "desc"},
    "filter": {"active", "archived"},
})

honest-test produces every valid combination: 3 × 2 × 2 = 12 test cases. Every one runs. If you add "created" to the sort set, the product becomes 4 × 2 × 2 = 16, so 4 new cases appear automatically. If you remove a value, those cases disappear. You do not update the tests. There are no tests to update.
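The counting above can be reproduced with the standard library alone. This is a minimal sketch, not honest-test's implementation: it assumes the vocabulary is a plain dict of sets (the `vocabulary(...)` wrapper is left out), and `all_cases` is a hypothetical helper name.

```python
from itertools import product

# The vocabulary from the example above, as a plain dict of sets (assumption).
vocab = {
    "sort": {"name", "date", "status"},
    "order": {"asc", "desc"},
    "filter": {"active", "archived"},
}

def all_cases(vocab):
    """Return every valid combination as a dict, one dict per test case."""
    names = sorted(vocab)
    member_lists = [sorted(vocab[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*member_lists)]

cases = all_cases(vocab)
print(len(cases))  # 3 × 2 × 2 = 12

# Adding one member to "sort" grows the product; no test is edited by hand.
grown = {**vocab, "sort": vocab["sort"] | {"created"}}
print(len(all_cases(grown)))  # 4 × 2 × 2 = 16
```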

Adversarial inputs

For every valid value in a set, honest-test also generates near-misses: "Name" (wrong case), "nam" (deletion), "naame" (insertion), " name" (whitespace). Every one must be rejected. If any are accepted, the vocabulary has a gap — either a case-sensitivity bug or an overlapping recognizer.

Honesty tests

Beyond input coverage, honest-test verifies that your functions behave as declared:

Purity. Call the same function twice with the same input. If the results differ, the function has a hidden dependency — a global, a timestamp, something it should not be reading. Boundary links are exempt.

Mutation. Copy the input before calling a function. Compare it after. If it changed, the function modified its input. This is always a violation in honest code.

Chain contracts. For every adjacent pair of links in a chain, honest-test generates the full set of valid outputs from the first link and passes each one to the second. A server fault in the second link means the first is producing output the second cannot handle. The interface is broken. A client fault is a correct rejection, not a contract failure.

These checks run automatically. The developer writes nothing for them.

State machines

If your code declares a state machine, honest-test verifies every transition: does (state, event) produce the declared next state? It also tests every undeclared combination: does it correctly reject? Adversarial near-miss state names are tested for correct rejection.

The state vocabulary and transition table are the complete specification. honest-test reads them.

The abstract principle

If you can enumerate every valid input, you can run every valid input.

This single observation separates honest-test from every other testing approach in common use.

The problem with sampling

Property-based testing (QuickCheck, Hypothesis, fast-check) generates random inputs from a type's value space and checks that properties hold. It is probabilistic: run it a thousand times and you might miss a bug; run it a thousand and one times and you might find it. The coverage is a function of how many samples you generate and whether those samples happen to include the bug-triggering input.

Unit tests are manual sampling: the developer picks specific inputs they think are important. The coverage is a function of what the developer thought of at the time they wrote the tests.

Both approaches sample from a potentially infinite space. The sampling rate determines the probability of finding bugs. You can never achieve total coverage.

The honest-test guarantee

honest-type vocabularies are finite Sets. A Set of five currency codes has five members. Its cartesian product with a Set of four format names has exactly 20 combinations. Every combination can run. Every combination must pass. This is not probabilistic. It is total.

The formal property is exhaustive enumeration: every member of the input space is exercised. For a bounded input space, this is achievable. For an unbounded input space, it is not. honest-test achieves it for the bounded parts of the vocabulary and applies best-effort boundary testing for the unbounded parts (predicates).

This is a categorically stronger guarantee than property-based testing. The difference is not degree (more tests versus fewer tests). The difference is kind (total coverage versus probabilistic coverage). You cannot achieve total coverage by running more random samples. You can only achieve it by changing the structure of the type declarations.

Honesty tests as program verification

Purity, mutation detection, and idempotency are program properties, not business properties. They do not depend on the domain. A pure function is pure regardless of whether it handles currencies or user accounts.

honest-test verifies these properties at runtime by treating the function as a black box:

- Purity: same input → same output. Call twice, compare.
- Mutation: input unchanged after call. Snapshot before, compare after.
- Idempotency: same result on second call. Run chain twice, compare.

These are not unit tests for specific behaviors. They are structural verification that the function conforms to the architectural contract. A function that fails purity is not just buggy; it is architecturally dishonest. honest-test reports this as an honesty violation, not a test failure.

The compile-time argument, resolved

The argument for static typing is: catch errors at compile time, not at runtime. This is correct. But it requires an assumption: that the type checker can analyze every code path, and that the types are expressive enough to capture every constraint.

honest-test eliminates the need for this assumption by running every code path. Not statically analyzing it. Actually running it, with every valid input, at test time. The guarantees are not weaker than a static type checker's. For bounded types they are stronger: a static type checker tells you the type is correct; honest-test tells you every instance of the type passes every constraint, actually, on this machine, right now.

The compile-time argument assumed that running the code was too expensive. For a vocabulary with 3,000 combinations that completes in milliseconds, it is not.

Full specification

Predicate Classification

Before generating test cases, honest-test classifies each predicate by AST analysis:

Class	Detection	Strategy
Set	`isinstance(recognizer, set)`	Enumerate all members
Numeric	Contains `int(s)`, numeric comparison	Fibonacci sequence
Length-bounded	Contains `len(s) ==` or `len(s) <`	Enumerate valid lengths
Regex	Contains `re.match`, `re.fullmatch`	Generate from pattern
Character-class	Contains `s.isdigit()`, `s.isalpha()`	Enumerate character classes
External	Calls function not in codebase	Programmer-supplied via `honest-test.yaml`
Composite	Calls function defined in codebase	Recurse into callee AST

External predicates that cannot be analyzed emit a warning and skip generation unless test values are supplied in honest-test.yaml.
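A rough sketch of how the detection column could be implemented with Python's `ast` module. It is an illustration under assumptions, not the tool's actual classifier: `classify_predicate` is a hypothetical name, it takes the predicate's source text rather than a live function, it matches on call names only, and the precedence between classes is a guess. The External and Composite classes need knowledge of the codebase and are left out.

```python
import ast

def classify_predicate(recognizer):
    """Classify a recognizer: a set, or the source text of a predicate over `s`."""
    if isinstance(recognizer, set):
        return "set"
    calls = set()
    for node in ast.walk(ast.parse(recognizer)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                calls.add(func.id)        # plain calls such as int(s), len(s)
            elif isinstance(func, ast.Attribute):
                calls.add(func.attr)      # method calls such as re.match(...), s.isdigit()
    # Precedence below is an assumption made for this sketch.
    if calls & {"match", "fullmatch"}:
        return "regex"
    if "int" in calls:
        return "numeric"
    if "len" in calls:
        return "length-bounded"
    if calls & {"isdigit", "isalpha"}:
        return "character-class"
    return "unclassified"

print(classify_predicate({"asc", "desc"}))
print(classify_predicate("lambda s: 0 < int(s) < 100"))
```

Matching on names alone is deliberately crude: any method called `match` counts as a regex here, which a real classifier would need to refine.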

Set Enumeration Algorithm

FUNCTION enumerate_sets(vocabulary):
    set_types ← { name: list(members)
                  FOR (name, recog) IN vocabulary
                  IF recog IS a Set }
    RETURN cartesian_product(set_types.values())

For a vocabulary with format_name(5) × currency_code(150) × style_name(4): 5 × 150 × 4 = 3,000 test cases. All run. No sampling. A Maybe slot adds one extra case, Nothing, to its dimension.
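The pseudocode translates almost line for line into Python. The sketch below assumes a vocabulary is a dict mapping slot names to recognizers, where a recognizer is either a set or a callable predicate; the member names for `format_name` and the `customer_id` predicate are made up for illustration.

```python
from itertools import product

def enumerate_sets(vocabulary):
    """Yield one dict per combination of Set-typed slots; predicate slots are skipped."""
    set_types = {
        name: sorted(recog)
        for name, recog in vocabulary.items()
        if isinstance(recog, (set, frozenset))
    }
    names = list(set_types)
    for combo in product(*set_types.values()):
        yield dict(zip(names, combo))

vocabulary = {
    "format_name": {"table", "csv", "json", "xml", "text"},  # hypothetical members
    "style_name": {"standard", "compact"},
    "customer_id": lambda s: s.startswith("CUST-"),           # predicate, not enumerated
}

cases = list(enumerate_sets(vocabulary))
print(len(cases))  # 5 × 2 = 10
```

Sorting the members is a choice made here so that the case order is stable between runs; sets themselves carry no order.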

Adversarial Input Generation

For every Set member, honest-test generates edit-distance-1 neighbors:

FUNCTION adversarial_neighbors(value):
    results ← []
    # Deletions: remove each character
    # Insertions: insert each alphanumeric at each position
    # Substitutions: replace each character with each alphanumeric
    # Case variations: lower, upper, title
    # Whitespace variations: leading space, trailing space, internal space
    RETURN deduplicate(results) - {value}

Every adversarial neighbor must produce a rejection. Any that are accepted expose vocabulary overlap or case-sensitivity bugs.
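One way to fill in the pseudocode above, as a sketch rather than the shipped generator. It assumes "alphanumeric" means ASCII letters and digits and returns a set instead of a deduplicated list.

```python
import string

ALPHANUMERIC = string.ascii_letters + string.digits  # assumed alphabet

def adversarial_neighbors(value):
    """Return near-miss strings for `value`, never including `value` itself."""
    results = set()
    for i in range(len(value)):
        results.add(value[:i] + value[i + 1:])            # deletion
        for ch in ALPHANUMERIC:
            results.add(value[:i] + ch + value[i + 1:])   # substitution
    for i in range(len(value) + 1):
        for ch in ALPHANUMERIC:
            results.add(value[:i] + ch + value[i:])       # insertion
        results.add(value[:i] + " " + value[i:])          # leading, internal, trailing space
    results.update({value.lower(), value.upper(), value.title()})  # case variations
    results.discard(value)
    return results

sort_values = {"name", "date", "status"}
neighbors = adversarial_neighbors("name")
# A strict membership check rejects every neighbor.
accepted = [n for n in neighbors if n in sort_values]
print(len(accepted))
```

If `accepted` were ever non-empty, two vocabulary members would sit within one edit of each other, which is the overlap the text describes.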

Purity Verification Algorithm

FUNCTION verify_purity(link, test_manifest):
    result_1 ← link(deep_copy(test_manifest))
    result_2 ← link(deep_copy(test_manifest))
    IF result_1 ≠ result_2:
        EMIT failure("non_deterministic", link.name)

Non-boundary links only. Boundary links are exempt.
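A direct Python rendering of the check, assuming a link is a callable that takes a manifest dict and returns a result dict. Returning a failure record instead of emitting it is a simplification, and both example links are hypothetical.

```python
import copy
import itertools

def verify_purity(link, test_manifest):
    """Call the link twice on independent copies; report if the results differ."""
    result_1 = link(copy.deepcopy(test_manifest))
    result_2 = link(copy.deepcopy(test_manifest))
    if result_1 != result_2:
        return {"failure": "non_deterministic", "link": link.__name__}
    return None

def pure_link(manifest):
    # Output depends only on the input.
    return {"ok": {**manifest, "label": manifest["currency"].lower()}}

_counter = itertools.count()

def impure_link(manifest):
    # Reads hidden state: the result changes on every call.
    return {"ok": {**manifest, "seq": next(_counter)}}

print(verify_purity(pure_link, {"currency": "USD"}))
print(verify_purity(impure_link, {"currency": "USD"}))
```

Two calls can only catch nondeterminism that shows up between those two calls; a dependency that changes rarely (a date, a config file) would pass this check.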

Mutation Detection Algorithm

FUNCTION detect_mutation(link, test_manifest):
    snapshot_before ← deep_copy(test_manifest)
    result          ← link(test_manifest)
    IF snapshot_before ≠ test_manifest:
        diff ← diff(snapshot_before, test_manifest)
        EMIT failure("manifest_mutated", link.name, diff)

Any mutation of the input manifest is an honesty violation. There are no exceptions.
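The same algorithm in Python, as a sketch under the assumption that manifests are dicts. The diff here is just the sorted list of top-level keys whose values changed; the real diff format is not specified in this document.

```python
import copy

_MISSING = object()  # sentinel so added or removed keys count as changes

def detect_mutation(link, test_manifest):
    """Snapshot the manifest, call the link, and report any change to the input."""
    snapshot_before = copy.deepcopy(test_manifest)
    link(test_manifest)
    if snapshot_before != test_manifest:
        changed = sorted(
            key
            for key in set(snapshot_before) | set(test_manifest)
            if snapshot_before.get(key, _MISSING) != test_manifest.get(key, _MISSING)
        )
        return {"failure": "manifest_mutated", "link": link.__name__, "diff": changed}
    return None

def honest_link(manifest):
    # Builds a new dict; the input is left alone.
    return {"ok": {**manifest, "style": "compact"}}

def mutating_link(manifest):
    # Writes into the caller's dict.
    manifest["style"] = "compact"
    return {"ok": manifest}

print(detect_mutation(honest_link, {"currency": "USD"}))
print(detect_mutation(mutating_link, {"currency": "USD"}))
```

The deep copy matters: a shallow snapshot would share nested lists and dicts with the input and miss mutations inside them.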

Chain Contract Testing

FUNCTION test_chain_contracts(chain, vocabulary, binding):
    FOR EACH adjacent pair (link_n, link_n1) IN chain:
        FOR EACH test_manifest IN enumerate_test_cases(link_n.accepts):
            result ← link_n(test_manifest)
            IF "ok" IN result:
                result2 ← link_n1(result["ok"])
                IF "err" IN result2 AND result2["err"].category = "server":
                    EMIT failure("chain_contract", link_n.name, link_n1.name)

Client faults from link N+1 are not contract failures — they indicate the upstream link produced a value the downstream link correctly rejected. Only server faults indicate a broken interface.
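The client/server distinction is easier to see in running code. In this sketch, results are dicts with either an "ok" or an "err" key as in the pseudocode; `check_chain_contracts`, `cases_for`, and the three example links are hypothetical names, and `cases_for` stands in for enumerating the test cases a link accepts.

```python
def check_chain_contracts(chain, cases_for):
    """Feed each link's ok-output to the next link; record server faults only."""
    failures = []
    for link_n, link_n1 in zip(chain, chain[1:]):
        for manifest in cases_for(link_n):
            result = link_n(dict(manifest))
            if "ok" not in result:
                continue
            result2 = link_n1(result["ok"])
            err = result2.get("err")
            if err is not None and err.get("category") == "server":
                failures.append(("chain_contract", link_n.__name__, link_n1.__name__))
    return failures

def parse(manifest):
    return {"ok": {"amount": int(manifest["raw"])}}

def format_ok(manifest):
    if manifest["amount"] < 0:
        return {"err": {"category": "client", "code": "negative"}}  # correct rejection
    return {"ok": {"text": str(manifest["amount"])}}

def format_broken(manifest):
    if "value" not in manifest:  # expects a key the upstream link never produces
        return {"err": {"category": "server", "code": "missing_value"}}
    return {"ok": {"text": str(manifest["value"])}}

def cases_for(link):
    return [{"raw": "1"}, {"raw": "-2"}]

print(check_chain_contracts([parse, format_ok], cases_for))
print(check_chain_contracts([parse, format_broken], cases_for))
```

The first chain reports nothing even though one input is rejected, because that rejection is a client fault. The second chain reports the pair for every input, because the two links disagree about the shape of the manifest.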

State Machine Testing

# Valid transitions
FOR EACH (state, event) → next_state IN machine.transitions:
    result ← transition(machine, state, event)
    ASSERT result = ok({ state: next_state })

# Invalid transitions (all undeclared combinations)
FOR EACH state IN machine.states:
    FOR EACH event IN machine.events:
        IF (state, event) NOT IN machine.transitions:
            result ← transition(machine, state, event)
            ASSERT result = err({ code: "no_transition" })

# Adversarial inputs
FOR EACH state IN machine.states:
    FOR EACH adversarial IN adversarial_neighbors(state):
        result ← transition(machine, adversarial, first_valid_event)
        ASSERT result = err({ code: "invalid_state" })
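The three loops can be exercised against a toy machine. Everything below is illustrative: the machine, its `transition` function, and the dict-based result shape are assumptions, and the adversarial step uses four hand-picked variations per state instead of the full neighbor set.

```python
from itertools import product

machine = {
    "states": {"draft", "published", "archived"},
    "events": {"publish", "archive"},
    "transitions": {
        ("draft", "publish"): "published",
        ("published", "archive"): "archived",
    },
}

def transition(machine, state, event):
    """Toy transition function returning ok/err dicts."""
    if state not in machine["states"]:
        return {"err": {"code": "invalid_state"}}
    next_state = machine["transitions"].get((state, event))
    if next_state is None:
        return {"err": {"code": "no_transition"}}
    return {"ok": {"state": next_state}}

def check_machine(machine):
    counts = {"valid": 0, "invalid": 0, "adversarial": 0}
    # Valid transitions: every declared (state, event) yields its next state.
    for (state, event), next_state in machine["transitions"].items():
        assert transition(machine, state, event) == {"ok": {"state": next_state}}
        counts["valid"] += 1
    # Invalid transitions: every undeclared combination is rejected.
    for state, event in product(machine["states"], machine["events"]):
        if (state, event) not in machine["transitions"]:
            assert transition(machine, state, event) == {"err": {"code": "no_transition"}}
            counts["invalid"] += 1
    # Adversarial state names: a few near-misses per state, all rejected.
    first_valid_event = sorted(machine["events"])[0]
    for state in machine["states"]:
        for bad in {state.upper(), state.title(), " " + state, state[:-1]}:
            assert transition(machine, bad, first_valid_event) == {"err": {"code": "invalid_state"}}
            counts["adversarial"] += 1
    return counts

counts = check_machine(machine)
print(counts)
```

With 3 states and 2 events there are 6 combinations, 2 of them declared, so the invalid loop covers the remaining 4.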

honest-test.yaml

Programmer-supplied test values for external predicates:

predicates:
  customer_id:
    valid:   ["CUST-00001", "CUST-99999"]
    invalid: ["CUST-0", "cust-00001"]
    strategy: supplied_only

  order_amount:
    strategy: fibonacci
    limit: 1_000_000_000
    negative: false

coverage:
  minimum_vocabulary: 100   # fail if any Set member untested
  minimum_chain: 80         # warn if fault paths < 80%
  minimum_honesty: 100      # fail if any link fails honesty test
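The `fibonacci` strategy is not defined in this section. One plausible reading, shown here purely as an assumption, is that numeric test values are drawn from the Fibonacci sequence up to `limit`, mirrored below zero when `negative` is true. `fibonacci_values` is a hypothetical helper.

```python
def fibonacci_values(limit, negative=False):
    """Return 0 plus the distinct Fibonacci numbers up to `limit`, optionally mirrored."""
    values = [0]
    a, b = 1, 2
    while a <= limit:
        values.append(a)
        a, b = b, a + b
    if negative:
        values = [-v for v in reversed(values) if v] + values
    return values

print(fibonacci_values(20))
print(fibonacci_values(20, negative=True))

# Settings from the order_amount entry above.
order_amounts = fibonacci_values(1_000_000_000, negative=False)
print(len(order_amounts) < 50)
```

Under this reading the appeal is that the values are dense near zero, where off-by-one bugs cluster, and sparse at large magnitudes, so a limit of one billion still yields only a few dozen cases.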

Coverage Model

Dimension	Metric
Vocabulary	`members_exercised / total_members × 100`
Chain	`fault_paths_exercised / total_fault_paths × 100`
Honesty	`honest_links / total_links × 100`
State machine	`transitions_exercised / total_transitions × 100`

Exhaustive Set enumeration drives vocabulary coverage to 100% automatically. Boundary links are reported separately, not as coverage failures.
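The metrics are simple ratios, and comparing them against the thresholds from honest-test.yaml is a few lines. The sketch below uses invented counts and hypothetical function names; how the real tool separates a warning from a failure is not modeled.

```python
def coverage_percent(exercised, total):
    """exercised / total × 100; an empty dimension counts as fully covered."""
    return 100.0 if total == 0 else exercised / total * 100

def evaluate(counts, thresholds):
    """Compute each dimension's percentage and whether it meets its minimum."""
    report = {}
    for dimension, (exercised, total) in counts.items():
        percent = coverage_percent(exercised, total)
        report[dimension] = {
            "percent": percent,
            "meets_minimum": percent >= thresholds.get(dimension, 0),
        }
    return report

# Invented numbers, for illustration only.
counts = {"vocabulary": (3000, 3000), "chain": (18, 24), "honesty": (7, 7)}
thresholds = {"vocabulary": 100, "chain": 80, "honesty": 100}

report = evaluate(counts, thresholds)
for dimension, entry in report.items():
    print(dimension, entry)
```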

Conformance Requirements

Requirement	Test
Set enumeration produces full cartesian product	Count test cases
Adversarial neighbors are all rejected	Run neighbors, check results
Purity test calls function twice, compares	Verify double-call behavior
Mutation detection deep-copies manifest before call	Verify snapshot independence
Chain contract testing only fails on server faults	Client fault must not fail contract
State machine tests all valid transitions	Verify count
State machine tests all invalid combinations	Verify count
`honest-test.yaml` values used when strategy is `supplied_only`	Test with external predicate
Reset fixture resets `_state` between tests	Verify flag isolation
Coverage report written to `coverage.json`	Check file after run

Reference

honest-test (PyPI): Python implementation
honest-test-architecture.md: Full spec (exhaustive enumeration, honesty tests, BDD)