MasteryMade · Foundation PRD

PRD 4: Expert Extraction Pipeline (8-Module Automated Skill)

PRD 4 of 12 · Depends on: PRD 1 + 3 · Owner: Lane A + D
Parent: Master Registry v1.0 — Section 06a

4.1 Purpose

Transform the expert extraction methodology — which exists as Jason's mental model and scattered session transcripts — into an automated, repeatable skill that agent swarms can execute. Module 1 rubric is always first and gates everything downstream. Three-pass validation catches errors before they compound.

4.2 Pipeline Architecture

TRIGGER: Entity reaches pipeline_stage='researched' AND has ingested content

         ┌──────────────┐
         │ Module 1:    │ ◄── ALWAYS FIRST. Creates rubric.
         │ Thinking     │     Rubric fails validation → STOP.
         │ Structures   │     No downstream modules run.
         └──────┬───────┘
                │ rubric validated ✓
         ┌──────▼───────┐
         │ Modules 2-8  │ ◄── Sequential. Each validates
         │ (sequential) │     against Module 1 rubric.
         └──────┬───────┘
                │ all complete
         ┌──────▼───────┐
         │ Module 9:    │ ◄── Retrieval patterns.
         │ Retrieval    │     When to surface what.
         └──────┬───────┘
                │
         ┌──────▼───────┐
         │ Three-pass   │ ◄── Forward, backward, ground truth.
         │ Validation   │
         └──────┬───────┘
                │ all passes ✓
         ┌──────▼───────┐
         │ Store to     │ ◄── expert_chunks in Supabase
         │ Supabase     │     with embeddings, by module
         └──────────────┘
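The gating logic in the diagram can be sketched in Python. This is a minimal illustration, not the production orchestrator: `extract` and `validate_rubric` are hypothetical callables standing in for the module extractors and the Module 1 validation gate.

```python
MODULES = list(range(1, 10))  # Module 1 rubric, Modules 2-8 extraction, Module 9 retrieval

def run_pipeline(extract, validate_rubric):
    """extract(module) -> output dict; validate_rubric(output) -> bool."""
    rubric = extract(1)
    if not validate_rubric(rubric):
        # Rubric failed validation: STOP, no downstream modules run.
        return {"status": "failed", "completed": [1]}
    outputs = {1: rubric}
    for m in MODULES[1:]:  # Modules 2-9, sequential
        outputs[m] = extract(m)
    return {"status": "extracted", "completed": sorted(outputs)}
```

Three-pass validation and the Supabase write would follow only when status is "extracted".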

4.3 Supabase Schema

Table: expert_extractions

CREATE TABLE expert_extractions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  entity_id UUID NOT NULL REFERENCES entities(id),
  gate INT NOT NULL CHECK (gate IN (2, 3)),
  module INT NOT NULL CHECK (module BETWEEN 1 AND 9),
  module_name TEXT NOT NULL,
  version INT NOT NULL DEFAULT 1,

  extracted_content JSONB NOT NULL,   -- structured per module
  raw_source_ids UUID[],              -- which content records used
  confidence FLOAT,                   -- self-assessed (0-1)

  validation_status TEXT NOT NULL DEFAULT 'pending' CHECK (
    validation_status IN ('pending','forward_pass','backward_pass',
    'ground_truth_pass','validated','failed')
  ),
  validation_notes TEXT,

  gate2_extraction_id UUID,           -- Gate 3: link to Gate 2 version
  gate2_accuracy JSONB,               -- {matched:[],corrected:[],missed:[]}

  extracted_at TIMESTAMPTZ DEFAULT now(),
  validated_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_ext_entity ON expert_extractions(entity_id);
CREATE INDEX idx_ext_module ON expert_extractions(module);
CREATE INDEX idx_ext_gate ON expert_extractions(gate);
CREATE INDEX idx_ext_status ON expert_extractions(validation_status);

Table: expert_chunks (RAG retrieval)

CREATE TABLE expert_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  extraction_id UUID NOT NULL REFERENCES expert_extractions(id),
  entity_id UUID NOT NULL REFERENCES entities(id),
  module INT NOT NULL,
  chunk_text TEXT NOT NULL,
  chunk_type TEXT NOT NULL,  -- 'framework','voice_pattern','cta_template','case_study'
  embedding vector(1536),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_echunks_entity ON expert_chunks(entity_id);
CREATE INDEX idx_echunks_module ON expert_chunks(module);
CREATE INDEX idx_echunks_type ON expert_chunks(chunk_type);
CREATE INDEX idx_echunks_embedding ON expert_chunks
  USING ivfflat (embedding vector_cosine_ops);
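The ivfflat index above ranks chunks by cosine distance. A pure-Python sketch of what that ranking computes (in production this is one SQL query ordering by the pgvector cosine operator; the function names here are illustrative only):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query_emb, chunks, k=3):
    """chunks: [{'chunk_text': ..., 'embedding': [...]}] -> k most similar."""
    ranked = sorted(chunks, key=lambda c: cosine_sim(query_emb, c["embedding"]),
                    reverse=True)
    return ranked[:k]
```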

4.4 Module Specs — Output Structures

Module 1: Thinking Structures (THE RUBRIC)

Input: All ingested content for entity. Prioritize long-form (webinars, interviews, coaching) over social posts.

{
  "rubric_name": "EXPERT_RUBRIC_NAME",
  "mental_models": ["model1", "model2"],
  "named_frameworks": [
    { "name": "", "components": [], "purpose": "" }
  ],
  "decision_logic": [
    { "if": "condition", "then": "action", "because": "reasoning" }
  ],
  "priority_hierarchy": ["first", "second", "third"],
  "inviolable_principles": ["principle1", "principle2"],
  "unique_terminology": { "term": "definition" }
}

Validation gate: before Module 2 proceeds, the rubric must contain at least 3 named frameworks, at least 5 if/then decision patterns, and a clear priority ordering, and Jason must review the rubric output. Module 1 is too important to auto-approve; this is the only human gate in the pipeline.
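The automated half of this gate is a simple threshold check. A sketch (thresholds from the spec; the human review step is not modeled, and the function name is hypothetical):

```python
def rubric_passes_gate(rubric: dict) -> bool:
    """Machine-checkable Module 1 thresholds; Jason's review still follows."""
    return (
        len(rubric.get("named_frameworks", [])) >= 3      # at least 3 named frameworks
        and len(rubric.get("decision_logic", [])) >= 5    # at least 5 if/then patterns
        and len(rubric.get("priority_hierarchy", [])) > 0 # clear priority ordering
    )
```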

Module 2: Voice and Style

{
  "sentence_patterns": [],
  "signature_phrases": [],
  "tone_descriptors": ["direct","warm","challenging"],
  "emphasis_techniques": ["repetition","contrast","questions"],
  "language_avoids": [],
  "formality_level": "conversational|professional|academic",
  "humor_style": "dry|storytelling|none",
  "teaching_voice_vs_selling_voice": { "teaching":{}, "selling":{} }
}

Rubric check: "Does this voice pattern match the thinking rubric? Would someone who thinks like Module 1 describes naturally speak this way?"

Module 3: CTA Psychology

{
  "primary_motivation_triggers": [],
  "invitation_patterns": [],
  "urgency_creation": [],
  "objection_handling": [],
  "soft_vs_hard_cta_ratio": 0.7,
  "example_ctas": [{ "context":"", "cta_text":"", "motivation_lever":"" }]
}

Module 4: Embedded IP

{
  "proprietary_frameworks": [
    { "name":"", "components":[], "how_it_works":"",
      "when_to_use":"", "source_material":"" }
  ],
  "original_models": [],
  "unique_terminology": { "term": { "definition":"", "usage_context":"" } },
  "ip_that_must_not_be_altered": []
}

Module 5: Modularization

{
  "teaching_progressions": [
    { "name":"", "steps":[], "prerequisites":[], "builds_toward":"" }
  ],
  "prerequisite_chains": { "concept_a": ["requires_b","requires_c"] },
  "scaffold_order": ["first","second","third"],
  "beginner_vs_advanced": { "beginner":[], "advanced":[] }
}
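A scaffold_order consistent with prerequisite_chains can be derived by topological sort. A sketch using Kahn's algorithm, assuming the chains are acyclic (the function name is illustrative, not part of the spec):

```python
from collections import deque

def scaffold_order(chains: dict) -> list:
    """chains: {'concept_a': ['requires_b', ...]} -> prerequisites-first order."""
    nodes = set(chains) | {p for reqs in chains.values() for p in reqs}
    indegree = {n: 0 for n in nodes}
    for concept, reqs in chains.items():
        indegree[concept] += len(reqs)  # concept depends on each prerequisite
    dependents = {n: [] for n in nodes}
    for concept, reqs in chains.items():
        for r in reqs:
            dependents[r].append(concept)
    order = []
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in dependents[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order
```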

Module 6: Meta-Structures

{
  "program_architectures": [
    { "name":"", "structure":"linear|modular|spiral",
      "phases":[], "engagement_arc":"", "milestones":[] }
  ],
  "coaching_flow": "",
  "content_delivery_preferences": ""
}

Module 7: Pattern Recognition

{
  "diagnostic_patterns": [
    { "signal":"", "diagnosis":"", "prescription":"", "source_example":"" }
  ],
  "triage_framework": "",
  "red_flags": [],
  "green_flags": []
}

Module 8: Prompt Templates (Ground Truth)

{
  "scenario_applications": [
    { "scenario":"", "expert_response":"",
      "frameworks_applied":[], "voice_markers":[], "source":"" }
  ]
}

These become test cases for clone validation — they feed expert-clone-scorer.

Module 9: Retrieval Patterns

{
  "routing_rules": [
    { "trigger":"user asks about X",
      "retrieve":"Module Y, framework Z",
      "context_required":"", "priority":"primary|secondary" }
  ],
  "context_windows": {
    "new_user": ["what to surface first"],
    "returning_user": ["surface based on history"],
    "specific_problem": ["identify and route to framework"]
  },
  "never_combine": ["A should not appear with B because..."],
  "always_combine": ["C is always better with D"]
}
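A minimal sketch of how routing_rules could be applied at runtime. Assumptions: triggers are stored as matchable keywords (production would likely use embedding similarity rather than substring match), and the function name is hypothetical:

```python
def route(message: str, routing_rules: list) -> list:
    """Return retrieval targets for every rule whose trigger matches the message."""
    hits = [r for r in routing_rules if r["trigger"].lower() in message.lower()]
    # Surface primary rules before secondary ones (stable sort keeps rule order otherwise).
    hits.sort(key=lambda r: 0 if r.get("priority") == "primary" else 1)
    return [r["retrieve"] for r in hits]
```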

4.5 Three-Pass Validation Protocol

Forward pass

"Does each module logically lead to the next?"

  Module 1 rubric → Module 2 voice should reflect thinking patterns.
  Module 2 voice → Module 3 CTAs should use voice patterns.
  Module 4 IP → Module 5 scaffolding should cover all framework components.
  Module 7 diagnostics → should reference Module 1 thinking.
  Module 8 examples → should demonstrate Modules 1-7 in action.

Prompt: "Review modules [N] and [N+1]. Does the output of [N] naturally lead to and support [N+1]? Identify gaps, contradictions, or missing connections."
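Filling the [N]/[N+1] template for each adjacent module pair can be sketched as (function and constant names are illustrative):

```python
FORWARD_PROMPT = (
    "Review modules {n} and {n1}. Does the output of {n} naturally lead to "
    "and support {n1}? Identify gaps, contradictions, or missing connections."
)

def forward_pass_prompts(max_module: int = 8) -> list:
    """One review prompt per adjacent pair: (1,2), (2,3), ... (7,8)."""
    return [FORWARD_PROMPT.format(n=n, n1=n + 1) for n in range(1, max_module)]
```

The non-adjacent checks listed above (e.g. Module 7 → Module 1) would need their own pair list rather than this adjacent-only sweep.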

Backward pass

"Does the deployment artifact trace back to source material?"

  For each claim in each module: can we point to specific ingested content that supports it?
  For each framework in Module 4: is there transcript evidence?
  For each voice pattern in Module 2: can we find 3+ examples?

Prompt: "For each item in this module's extraction, find the specific source content that supports it. If you cannot find evidence, flag as 'unsupported'."

Ground truth pass

"Does the clone's output match how the expert would actually respond?"

  Use Module 8 scenario_applications as test cases.
  Feed each scenario to the clone using the extractions as context.
  Compare the clone's response to the expert's actual response.
  Score: voice match (Module 2), framework usage (Module 4), diagnostic accuracy (Module 7).

Integration: Calls expert-clone-scorer with test cases from expert-test-extractor.
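expert-clone-scorer is assumed to return per-dimension scores in [0, 1]; a sketch of aggregating them into one ground-truth score (the equal weights are illustrative, not specified anywhere in this PRD):

```python
def aggregate_ground_truth(scores: dict, weights=None) -> float:
    """Weighted mean of per-dimension scores; missing dimensions count as 0."""
    weights = weights or {"voice_match": 1.0,          # Module 2
                          "framework_usage": 1.0,      # Module 4
                          "diagnostic_accuracy": 1.0}  # Module 7
    total = sum(weights.values())
    return sum(scores.get(k, 0.0) * w for k, w in weights.items()) / total
```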

4.6 Gate 2 → Gate 3 Comparison

When expert transitions from prospect (Gate 2) to signed (Gate 3):

  1. Run full extraction on Gate 3 content (private docs included)
  2. Per module, compare Gate 2 vs Gate 3 extraction:
     Matched (correctly inferred from public), Corrected (partially right,
     needed adjustment), Missed (entirely absent from public)
  3. Store comparison in gate2_accuracy JSONB
  4. Feed corrections back to improve public-only extraction patterns

Learning loop: over time, this builds a model of "what we can reliably extract from public content alone" vs "what requires private access." Each subsequent Gate 2 demo gets more accurate.
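Step 2's classification into the gate2_accuracy shape can be sketched as a key/value diff. This approximates "corrected" as same key with a different value; real partial-match detection would need semantic comparison (the function name is hypothetical):

```python
def gate2_accuracy(gate2: dict, gate3: dict) -> dict:
    """Classify Gate 2 extraction items against the Gate 3 (full-access) version."""
    matched = [k for k in gate2 if k in gate3 and gate2[k] == gate3[k]]
    corrected = [k for k in gate2 if k in gate3 and gate2[k] != gate3[k]]
    missed = [k for k in gate3 if k not in gate2]  # absent from public-only pass
    return {"matched": sorted(matched), "corrected": sorted(corrected),
            "missed": sorted(missed)}
```

The returned dict matches the gate2_accuracy JSONB column shape ({matched:[],corrected:[],missed:[]}).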

4.7 Integration with Existing Skills

Skill                   Integration
expert-research         Runs BEFORE pipeline. Populates entity, discovers sources, triggers ingest. Output feeds Module 1 as context.
expert-doc-processor    PII scrubbing absorbed into ingest pipeline. Single-doc processing retained as utility called by Module 1-8 extractors.
expert-test-extractor   Runs AFTER Module 8. Takes scenario_applications, generates structured test cases (Q&A pairs with scoring criteria).
expert-clone-scorer     Runs DURING ground truth pass. Compares clone output to Module 8 ground truth. Returns per-dimension score.
expert-os-deployment    Runs AFTER validation complete. Takes validated extraction package → value ladder, validation page, onboarding, betaap.io deploy.

4.8 Extraction History Reconciliation

Matt (SOUL/FLOW)
  Issue:  Complete from Jun 2024. Location unclear — Supabase, GDrive, or scattered across sessions.
  Action: Search GDrive. If found, import to expert_extractions table. If not, flag for re-extraction.

Bridger (SCALE/POWER)
  Issue:  In Supabase + GDrive. On hold (JV didn't finalize).
  Action: Locate existing chunks. Import to new schema. Tag as Gate 2 (no private docs received).

Brad (TIGER QUEST)
  Issue:  Rubric complete. Full extraction Nov 2025 — likely in Claude conversation text only.
  Action: Search Claude chat history. Export to GDrive. Import to schema. Tag as Gate 2.

Samuel (Align360)
  Issue:  In progress. Gate 2 public extraction underway. betaap.io v0 finalizing.
  Action: Active test case. Run automated pipeline on public content. Compare against manual extraction so far.

4.9 Meta-Index Builder

Problem: Extraction outputs can be 400KB+ per expert. Loading all into context window is wasteful.

Solution: Meta-index that points to sections rather than loading them:

{
  "expert": "Samuel / Align360",
  "modules_available": [1,2,3,4,5],
  "quick_reference": {
    "rubric_name": "ALIGN360_METHOD",
    "core_frameworks": ["Framework A","Framework B","Framework C"],
    "voice_summary": "Warm, direct, faith-integrated, story-driven",
    "primary_audience": "Christian professionals seeking alignment"
  },
  "retrieval_pointers": {
    "alignment_questions": "Load Module 4 → Framework B",
    "diagnostic_needed": "Load Module 7 → diagnostic_patterns",
    "content_generation": "Load Module 2 (voice) + Module 3 (CTA)"
  }
}

Clone loads meta-index first (~500 tokens). Full module content only when a retrieval pointer activates. Lazy-load pattern from :2hat applied to expert knowledge.
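The lazy-load pattern described above can be sketched as follows. `load_module` is a stand-in for the Supabase fetch and `answer_context` is a hypothetical helper, not a spec'd API:

```python
def answer_context(meta_index: dict, pointer_key: str, load_module):
    """Start from the small meta-index; fetch a full module only if its pointer fires."""
    ctx = {"quick_reference": meta_index["quick_reference"]}  # always in context (~500 tokens)
    pointer = meta_index["retrieval_pointers"].get(pointer_key)
    if pointer:
        ctx["loaded"] = load_module(pointer)  # on-demand fetch of the 400KB+ payload slice
    return ctx
```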

4.10 Acceptance Criteria

MASTERYMADE — PRD 4 of 12 — plan.jasondmacdonald.com

Dominia Facta. Build what compounds.