Process Factory: A Self-Auditing DAG Execution Engine for Recursive Quality Improvement

Technical Architecture, Operator Experience, and Second-Order Effects

Jason MacDonald | April 2026 | Version 1.0

Section 1

Abstract

Process Factory is a config-driven, block-based DAG execution engine designed to run multi-step pipelines with dependency resolution, human gates, threshold checks, and LLM-powered block execution. Each pipeline is defined as a JSON template of typed blocks. The engine evaluates readiness by inspecting block states and dependency graphs — a stateless evaluation that requires no persistent memory beyond the run's current state stored in a single Supabase JSONB column.

After every block execution, the engine runs a 5-dimension structural audit (D1–D5) that scores completeness, specificity, downstream readiness, human auditability, and second-order effects. This audit is a pure function: no LLM calls, no network requests, sub-100ms execution time. It operates on heuristics — artifact counts against golden examples, generic phrase detection across 16 patterns, JSON structural validation, placeholder scanning, and upstream artifact consumption tracing.

The key architectural decision: the audit loop is infrastructure, not process. It runs automatically, stores results in block state, generates prioritized fix suggestions, computes quality deltas across runs, and flags score inflation. Quality improvement happens as a side effect of execution — every block run makes the next one measurably better or surfaces exactly why it did not.

Section 2

System Architecture

Six components. Stateless scheduler. Atomic state persistence.

Pipeline Flow:
Templates (JSON) → DAG Scheduler → Block Execution → Auto-Audit → State Persistence → Human Gates → Operator Workbench → Quality Dashboard
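A minimal sketch of the stateless readiness evaluation, assuming a `blocks` map keyed by block id with `status` and `depends_on` fields (the field names are illustrative, not the engine's actual schema):

```python
def ready_blocks(run_state: dict) -> list[str]:
    """Stateless readiness check: a block is ready when it is pending and
    every dependency has completed. Operates only on the run's current
    state document (e.g. the single JSONB column), no other memory."""
    blocks = run_state["blocks"]
    return [
        block_id
        for block_id, b in blocks.items()
        if b["status"] == "pending"
        and all(blocks[dep]["status"] == "complete" for dep in b.get("depends_on", []))
    ]
```

Because the check is a pure function of the state document, the scheduler can be re-run from scratch at any point without replaying history.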
Section 3

The 5-Dimension Quality Model (D1–D5)

Each dimension checks a distinct failure mode. Weights: D1 (25%), D2 (25%), D3 (20%), D4 (10%), D5 (20%).
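The weighting can be sketched end-to-end. The per-dimension heuristics below are illustrative stand-ins for the real checks (the engine uses 16 generic-phrase patterns, golden-example counts, and upstream tracing); only the weights come from the text:

```python
import json
import re

# Weights from the text: D1 25%, D2 25%, D3 20%, D4 10%, D5 20%.
WEIGHTS = {"D1": 0.25, "D2": 0.25, "D3": 0.20, "D4": 0.10, "D5": 0.20}

PLACEHOLDER = re.compile(r"\bTBD\b|\bTODO\b|\{\{.*?\}\}", re.IGNORECASE)

def score_dimensions(artifact_text: str, artifact_count: int,
                     golden_count: int, consumed_upstream: bool) -> dict:
    """Illustrative heuristic scorers, one per dimension, all pure."""
    d1 = min(10.0, 10.0 * artifact_count / max(golden_count, 1))  # completeness vs golden example
    generic_hits = len(re.findall(r"\bbest practices\b|\bleverage\b", artifact_text, re.I))
    d2 = max(0.0, 10.0 - 2.0 * generic_hits)                      # specificity
    try:
        json.loads(artifact_text)                                 # downstream readiness
        d3 = 10.0
    except ValueError:
        d3 = 4.0
    d4 = 0.0 if PLACEHOLDER.search(artifact_text) else 10.0       # human auditability
    d5 = 10.0 if consumed_upstream else 6.0                       # second-order effects
    return {"D1": d1, "D2": d2, "D3": d3, "D4": d4, "D5": d5}

def composite(scores: dict) -> float:
    """Weighted composite on the 0-10 scale."""
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 1)
```

No LLM calls and no I/O, which is what keeps the audit deterministic and sub-100ms.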

Section 4

The Self-Improvement Mechanism

The core innovation: every execution produces audit data, every audit produces fix suggestions, every fix is measurable, and every measurement feeds the next execution.

Recursive Improvement Cycle:

Block executes
→ Auto-audit scores D1–D5 (pure heuristics, no LLM, <100ms)
→ Fix suggestions generated (context-aware, priority-ranked)
→ Scores stored in block state alongside execution result
→ Delta computed vs. previous execution (per-dimension + composite)
→ If composite < threshold: flag for operator with specific fix actions
→ If self-score diverges from auto-audit by >2 points: calibration warning
→ Operator reviews, resolves gaps, re-executes block
→ Next execution: delta shows improvement or regression with exact numbers

What makes it recursive: Auto-audit runs after every execution. Fix suggestions reference specific artifacts and JSON paths, not generic advice. Delta tracking means improvement is measurable to one decimal place. The calibration_warning flag fires when the LLM's self-assessment diverges from structural reality.
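The delta and calibration steps can be sketched as follows (the 2-point divergence threshold is from the text; the function and field names are assumptions):

```python
def compare(previous: dict, current: dict,
            self_score: float, auto_composite: float) -> dict:
    """Per-dimension deltas plus the calibration_warning flag.

    `previous` and `current` are D-dimension score maps from two executions
    of the same block; `self_score` is the LLM's self-assessment and
    `auto_composite` the structural audit's composite."""
    deltas = {d: round(current[d] - previous[d], 1) for d in current}
    return {
        "deltas": deltas,
        # Simple mean for the sketch; the engine would use the D1-D5 weights.
        "composite_delta": round(sum(deltas.values()) / len(deltas), 1),
        # Fires when self-assessment diverges from structural reality by >2 points.
        "calibration_warning": abs(self_score - auto_composite) > 2.0,
    }
```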

Section 5

Operator Experience

3-panel workbench. Every decision point surfaces the data needed to decide.

Pipeline View: 23 blocks in 10 phases, color-coded by status. SummaryBar shows status counts + run-level quality composite + weakest dimension + calibration warnings. Auto-advance runs non-gate blocks automatically.
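Auto-advance can be sketched as a loop that keeps executing ready non-gate blocks until only human gates (or nothing) remain; the field names are assumptions:

```python
def auto_advance(run_state: dict, execute) -> dict:
    """Execute every ready non-gate block, then re-evaluate readiness.

    `execute` is a callback taking a block id; block execution and
    auto-audit would happen inside it."""
    blocks = run_state["blocks"]
    while True:
        ready = [
            bid for bid, b in blocks.items()
            if b["status"] == "pending"
            and not b.get("gate")  # human gates wait for the operator
            and all(blocks[d]["status"] == "complete" for d in b.get("depends_on", []))
        ]
        if not ready:
            return run_state
        for bid in ready:
            execute(bid)
            blocks[bid]["status"] = "complete"
```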

Panel | Contents
Left: Guide | Block summary, instructions, quality checklist
Center: Inputs | Form fields, file upload, upstream artifacts
Center: Outputs | Artifacts, audit badge (A:7.2), D1–D5 panel with progress bars, issues, suggestions, deltas
Center: Logic | Model, timeout, threshold, gate, retry config
Right: Chat | Block-scoped advisory + factory-level META chat
Right: Gaps | Artifact-aware gap reconciliation with LLM evaluation
Right: QA | Quality gate: approve/reject with feedback

Post-Run: Reflection prompt with 4 questions + mic input (Web Speech API). Stored as Process Memory.

Section 6

The Meta-Audit Template

The factory audits itself as a factory run.

meta-audit-factory.json defines a 5-block pipeline:

Block | Function
run-aggregate | Compute per-block and run-level D1–D5 composites. Threshold: 5.0 minimum.
specificity-deep-check | Expert name density, generic phrase ratio, content depth analysis.
cross-block-consistency | Verify artifact references resolve. Detect orphaned dependencies.
delta-report | Compare scores against previous run. Per-dimension delta tables.
audit-scorecard-publisher | Compile into published HTML scorecard. Human gate before publish.

The meta-audit run's own 5 blocks are auto-audited by the same D1–D5 system. The audit of the audit is recursive.

meta-audit-factory.json (pattern: dag-batch)
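A hedged sketch of what such a template might look like. The block ids, the 5.0 threshold, the human gate, and the dag-batch pattern name are from the text; the field names and dependency edges are assumptions for illustration:

```json
{
  "pattern": "dag-batch",
  "blocks": [
    { "id": "run-aggregate", "threshold": 5.0, "depends_on": [] },
    { "id": "specificity-deep-check", "depends_on": ["run-aggregate"] },
    { "id": "cross-block-consistency", "depends_on": ["run-aggregate"] },
    { "id": "delta-report", "depends_on": ["specificity-deep-check", "cross-block-consistency"] },
    { "id": "audit-scorecard-publisher", "gate": "human", "depends_on": ["delta-report"] }
  ]
}
```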
Section 7

Evidence: What We Measured

Real data from the Samuel Ngu expert clone pipeline — 23 blocks, 10 phases.

Audit v1 (baseline)

Composite Score: 6.1 / 10
Samuel-specific: 60%
Fix Cycle: frameworks.json prompt templates raised from 17% to 100%. Resources rewritten. L2 extractors wired to proto-* artifacts.

Audit v2 (post-fix)

Composite Score: 8.5 / 10
Samuel-specific: 80%
Delta: +2.4

Session-Audit (Engine Wiring)

Pass | Fields | Wired | Orphans
Initial | 13 | 12 | 1
Interval 1 | 28 | 25 | 3 → fixed
Interval 2 | 28 | 28 | 0

L0 Research: $0.40–$1.30/expert. After proto-* wiring, investment compounds into L2 extraction. D5 drops 4 points if L2 blocks don't reference upstream keys.
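The D5 penalty described here can be sketched as follows (the 4-point drop is from the text; the field names and helper are illustrative):

```python
def d5_upstream_score(block_config: dict, upstream_keys: set) -> float:
    """Upstream artifact consumption tracing for an L2-style block.

    Drops 4 points when the block references none of the proto-* keys
    produced upstream."""
    refs = set(block_config.get("input_artifacts", []))
    proto_keys = {k for k in upstream_keys if k.startswith("proto-")}
    if not proto_keys:
        return 10.0  # nothing upstream to consume
    return 10.0 if refs & proto_keys else 6.0
```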

Section 8

Second-Order Effects

Immediate effect → compounding second-order effect.

Section 9

References