
VOI Schema — Verified Outcome Instance

A Verified Outcome Instance (VOI) is the atomic, immutable unit of evidence in Bukti. Every capability claim in the system is backed by one or more VOIs. This page documents the schema, the evidence-type enumeration, and the immutability rules.


What a VOI represents

A VOI is not a "verified" claim in the cryptographic sense — it is a structured record of a single piece of evidence asserting that an entity has demonstrated a capability. The name reflects the aspiration (outcome-backed verification), not the guarantee. The strength of the evidence is scored by the substantive grading system (see scoring-formula.md).


Schema

from datetime import datetime

from pydantic import BaseModel, Field

class VOI(BaseModel):
    id:                  str         # Unique VOI identifier (UUID)
    entity_id:           str         # Entity this evidence belongs to
    capability_id:       str         # Ontology node reference
    evidence_type:       EvidenceType  # Enum defined under "Evidence type enumeration"
    evidence_text:       str         # Extracted evidence snippet
    evidence_uri:        str | None  # Source URL
    source_platform:     str         # "github", "credly", "resume", "web", ...
    extraction_confidence: float = Field(ge=0, le=1)  # LLM parse quality (0–1)
    observed_at:         datetime    # When capability was demonstrated (valid time)
    recorded_at:         datetime    # When Bukti recorded this VOI (transaction time)
    raw_signal_hash:     str | None  # Deduplication hash
    supersedes_id:       str | None  # ID of VOI this corrects/supersedes

Evidence type enumeration

The EvidenceType enum defines the evidence types the system recognizes. Each type has a default weight in the scoring formula (see evidence-weights.md).

from enum import StrEnum  # Python 3.11+

class EvidenceType(StrEnum):
    behavioral_artifact    = "behavioral_artifact"     # commits, code artifacts
    task_outcome           = "task_outcome"            # deployed projects, measurable outputs
    peer_attestation       = "peer_attestation"        # third-party endorsement
    contribution_artifact  = "contribution_artifact"   # open-source contributions
    publication_artifact   = "publication_artifact"    # papers, articles
    credential_badge       = "credential_badge"        # Credly, Open Badges, certificates
    indirect_attestation   = "indirect_attestation"    # mentioned by others, not endorsed
    self_reported          = "self_reported"           # resume self-claims
    self_authored          = "self_authored"           # self-authored content about capability

Bi-temporal semantics

VOIs carry two timestamps:

  • observed_at (valid time): when the underlying capability was demonstrated. For a GitHub commit, this is the commit date. For a credential, this is the issue date or the event date. This is the timestamp used in decay calculations.
  • recorded_at (transaction time): when Bukti recorded the VOI in the system. This is always set to the current time at ingestion and cannot be modified.

Bi-temporal storage means Bukti can reconstruct "what did the system know about entity X on date D, for evidence that was valid at that date?" — which matters for audit trails and for detecting retroactive evidence insertion.
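
The "as of date D" question can be sketched as a filter over both timestamps. The example is an assumption-laden illustration: the reduced VOI dataclass, the known_as_of function, and the sample records are hypothetical and not part of Bukti's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VOI:
    id: str
    entity_id: str
    observed_at: datetime  # valid time
    recorded_at: datetime  # transaction time


def known_as_of(vois, entity_id, as_of):
    """VOIs for entity_id that were recorded by as_of and already valid then."""
    return [
        v for v in vois
        if v.entity_id == entity_id
        and v.recorded_at <= as_of  # the system already knew about it
        and v.observed_at <= as_of  # the capability had been demonstrated
    ]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


vois = [
    VOI("a", "e1", observed_at=utc(2023, 1, 10), recorded_at=utc(2023, 2, 1)),
    # Observed early but recorded much later: a retroactive insertion.
    VOI("b", "e1", observed_at=utc(2023, 1, 15), recorded_at=utc(2024, 6, 1)),
]

print([v.id for v in known_as_of(vois, "e1", utc(2023, 12, 31))])  # ['a']
print([v.id for v in known_as_of(vois, "e1", utc(2024, 12, 31))])  # ['a', 'b']
```

VOI "b" claims an early observed_at but only appears in the later snapshot. A large gap between the two timestamps is the kind of signal an audit can use to flag backdated evidence.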


Immutability rule

VOIs are never modified. If a VOI is found to be incorrect (wrong date, wrong capability mapping, wrong extraction), the correction procedure is:

  1. Create a new VOI with the correct data.
  2. Set supersedes_id on the new VOI to point at the original VOI's id.
  3. The original VOI is preserved in the database with all original values.

The scoring system uses only the head of each supersession chain, that is, the one VOI that no later VOI supersedes. Superseded VOIs are retained for audit purposes.


Extraction confidence

extraction_confidence is the LLM's self-reported quality estimate for the parse operation that produced this VOI. It is a float in [0, 1]. In the scoring formula, it acts as the valence s_i: the degree to which this VOI is "success-like" versus "failure-like." A VOI with extraction_confidence = 0.9 contributes more positive pseudo-count than one with extraction_confidence = 0.5.
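
To make the pseudo-count intuition concrete, here is a sketch that assumes a Beta-style update in which a VOI's weight is split between a success count and a failure count in proportion to its valence. That split, the pseudo_counts function, and the unit default weight are assumptions for illustration only; the actual formula is defined in scoring-formula.md and may differ.

```python
def pseudo_counts(valence: float, weight: float = 1.0) -> tuple[float, float]:
    """Split one VOI's weight into (success, failure) pseudo-counts.

    Assumption: valence s_i adds weight * s_i to the success count and
    weight * (1 - s_i) to the failure count. This is not taken from Bukti.
    """
    if not 0.0 <= valence <= 1.0:
        raise ValueError("valence must be in [0, 1]")
    return weight * valence, weight * (1.0 - valence)


strong = pseudo_counts(0.9)  # extraction_confidence = 0.9
weak = pseudo_counts(0.5)    # extraction_confidence = 0.5
print(strong[0] > weak[0])   # True
```

Under this assumed split, the 0.9 VOI adds more to the success side and less to the failure side than the 0.5 VOI, which matches the behavior described above.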

Caveat: extraction_confidence is currently an uncalibrated LLM self-report. It has not been validated against held-out human-labeled VOIs. Once calibration data exists, it will be calibrated against that data or replaced with a binary gate ("parses cleanly" / "does not"). See calibration-status.md.


Related pages