Research Methodology
The 411bz Research Methodology defines how the Structural Authority Score (SAS) engine is validated, calibrated, and governed. Every scoring decision is tested against a CMS-diverse Probe Observatory, verified through SHA-256 determinism checks, and subjected to adversarial stress testing before deployment. Scoring weights are locked during 60-day calibration windows, and extraction patches are governed by a documented policy.
Probe Observatory
The Probe Observatory is a curated dataset of 156+ websites used to validate SAS scoring accuracy, stability, and fairness across the full diversity of real-world web architectures. It is not a convenience sample. Every probe is selected to represent a distinct structural archetype.
8 CMS Types
WordPress, Shopify, Wix, Squarespace, Webflow, Edge SSR (Hono/Astro), Hugo, Custom HTML
Diversity Requirements
Probes span multiple verticals (professional services, e-commerce, SaaS, healthcare, legal, education, media), structural complexity levels (minimal single-page sites to enterprise multi-domain architectures), geographic regions, and language configurations. No single CMS type represents more than 25% of the observatory.
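The 25% cap on any single CMS type can be checked mechanically. The following is a minimal sketch, assuming each probe carries a CMS label; the function name and input shape are illustrative and not part of the 411bz tooling.

```python
from collections import Counter

MAX_CMS_SHARE = 0.25  # from the diversity requirement: no CMS above 25%


def overrepresented_cms(cms_labels, max_share=MAX_CMS_SHARE):
    """Return {cms: share} for every CMS whose share exceeds max_share.

    cms_labels is a list with one CMS name per probe (hypothetical input shape).
    """
    if not cms_labels:
        raise ValueError("the observatory must contain at least one probe")
    total = len(cms_labels)
    counts = Counter(cms_labels)
    return {
        cms: count / total
        for cms, count in counts.items()
        if count / total > max_share
    }
```

An empty result means the observatory satisfies the cap; any entry names a CMS type that needs fewer probes or a larger observatory.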
Adversarial Probes
Twenty adversarial probes are designed specifically to stress the scoring engine. They cover five patterns: inflated schema (valid but excessive JSON-LD), FAQ spam (high-volume, low-quality Q&A pairs), headless rendering (JavaScript-only content delivery), floor compression (sites engineered to appear minimal), and enterprise complexity (multi-brand, multi-domain architectures). These probes verify that the engine does not reward structural gaming.
Calibration Process — 5 Phases
Phase 1: Determinism Testing
Every probe site is scanned a minimum of 3 times. The scoring output must produce identical SHA-256 hashes across all runs. Stable JSON serialization with sorted keys eliminates object-order nondeterminism. Any hash mismatch triggers an immediate investigation.
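The determinism check described above can be sketched in a few lines. This is an illustration only: the shape of the scoring output and the function names are assumptions, and the real engine may canonicalize its output differently.

```python
import hashlib
import json

MIN_RUNS = 3  # from the methodology: each probe is scanned at least 3 times


def score_fingerprint(score_output: dict) -> str:
    """SHA-256 of a stable JSON serialization with sorted keys."""
    canonical = json.dumps(
        score_output,
        sort_keys=True,         # removes object-order nondeterminism
        separators=(",", ":"),  # fixes whitespace so it cannot vary
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(runs: list[dict], min_runs: int = MIN_RUNS) -> bool:
    """True when every run of the same probe hashes identically."""
    if len(runs) < min_runs:
        raise ValueError(f"need at least {min_runs} runs, got {len(runs)}")
    return len({score_fingerprint(run) for run in runs}) == 1
```

Sorting keys handles dictionary ordering, but it does not by itself guard against other sources of drift such as unrounded floating-point values or timestamps embedded in the output; a real pipeline would need to normalize or exclude those fields before hashing.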
Phase 2: Distribution Shape Analysis
SAS score distribution across the full probe set is monitored for compression (scores clustering too tightly), skew (disproportionate weight to one end), and bimodal artifacts (unexpected clustering around two distinct scores). A healthy distribution reflects genuine structural diversity without scoring-induced distortion.
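One way to put numbers on compression, skew, and bimodality is to compute the standard deviation, the skewness, and Sarle's bimodality coefficient for the probe scores. The sketch below uses population moments and is an assumption about how such monitoring could be done, not a description of the 411bz implementation; the source does not state any thresholds.

```python
from statistics import fmean, pstdev


def shape_summary(scores):
    """Summarize the shape of a score distribution using population moments.

    stdev      -> small values suggest compression
    skewness   -> far from 0 suggests weight toward one end
    bimodality -> Sarle's coefficient, (skewness**2 + 1) / kurtosis;
                  values above about 5/9 (the uniform-distribution value)
                  are a common heuristic hint of bimodality
    """
    mean = fmean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; shape is undefined")
    z = [(s - mean) / sd for s in scores]
    skewness = fmean([v ** 3 for v in z])
    kurtosis = fmean([v ** 4 for v in z])  # non-excess kurtosis
    return {
        "stdev": sd,
        "skewness": skewness,
        "bimodality": (skewness ** 2 + 1) / kurtosis,
    }
```

These statistics are screening aids; a flagged value would still need a look at the histogram to tell a genuine two-population observatory from a scoring artifact.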
Phase 3: Dimension Correlation Studies
No single dimension should explain more than 35% of total SAS variance. Correlation studies verify that each dimension remains independently informative. If two dimensions become highly correlated, they measure largely the same signal, and the model effectively counts that signal twice.
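If SAS is treated as the sum of per-dimension weighted contributions (an assumption; the source does not state the aggregation formula), each dimension's share of total variance is the covariance of its contribution with the total score, divided by the variance of the total. The shares sum to 1, which makes the 35% limit easy to test. Dimension names below are hypothetical.

```python
from statistics import fmean

MAX_VARIANCE_SHARE = 0.35  # from the methodology


def _pcov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


def variance_shares(contributions):
    """Map each dimension to Cov(contribution, total) / Var(total).

    contributions: {dimension_name: [weighted contribution per probe]}
    Assumes the total score is the plain sum of the contributions.
    """
    totals = [sum(values) for values in zip(*contributions.values())]
    var_total = _pcov(totals, totals)
    if var_total == 0:
        raise ValueError("total score has zero variance across probes")
    return {
        name: _pcov(values, totals) / var_total
        for name, values in contributions.items()
    }


def dominant_dimensions(contributions, limit=MAX_VARIANCE_SHARE):
    """Names of dimensions whose variance share exceeds the limit."""
    shares = variance_shares(contributions)
    return sorted(name for name, share in shares.items() if share > limit)
```

A share can be negative when a dimension moves against the total; that is itself worth investigating, since it means the dimension is offsetting the others rather than adding information.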
Phase 4: Weight Sensitivity Simulation
Simulated weight perturbations confirm ranking stability. A small change to an individual dimension weight should not produce a disproportionately large change in the relative ranking of probes. Sensitivity testing ensures the model is robust to minor calibration adjustments and does not exhibit cliff-edge behavior.
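A simple deterministic version of this simulation nudges each weight up and down by a small fraction, renormalizes, and records the largest change in any probe's rank. The sketch below assumes a linear weighted-sum score and a 5% perturbation; both are illustrative choices, as are the dimension names.

```python
def _ranks(probes, weights):
    """Rank position of each probe (0 = best) under a weighted-sum score."""
    scores = [sum(weights[d] * probe[d] for d in weights) for probe in probes]
    order = sorted(range(len(probes)), key=lambda i: (-scores[i], i))
    positions = [0] * len(probes)
    for position, probe_index in enumerate(order):
        positions[probe_index] = position
    return positions


def max_rank_shift(probes, weights, delta=0.05):
    """Largest rank change seen when any single weight moves by +/- delta.

    probes:  list of {dimension_name: dimension_score}
    weights: {dimension_name: weight}; renormalized after each perturbation
    """
    baseline = _ranks(probes, weights)
    worst = 0
    for name in weights:
        for sign in (1, -1):
            perturbed = dict(weights)
            perturbed[name] *= 1 + sign * delta
            total = sum(perturbed.values())
            perturbed = {d: w / total for d, w in perturbed.items()}
            shifted = _ranks(probes, perturbed)
            worst = max(
                worst,
                max(abs(a - b) for a, b in zip(baseline, shifted)),
            )
    return worst
```

A result of 0 means no perturbation reordered any probes. Shifts of one position among near-tied probes are expected; large jumps are the cliff-edge behavior the phase is meant to catch.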
Phase 5: Adversarial Testing
The 20 adversarial probes are scored and evaluated against expected behavioral constraints. Schema inflation must not produce outsized SAS gains. FAQ spam must not dominate scoring. No adversarial probe may score in the top 10% of the observatory without genuine structural merit.
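The top-10% constraint can be screened automatically, although deciding whether a flagged probe has genuine structural merit remains a human judgment. A minimal sketch, assuming scores are keyed by a probe identifier (the identifiers and function name are hypothetical):

```python
def adversarial_in_top_decile(scores, adversarial_ids):
    """Return adversarial probes that land in the top 10% by score.

    scores:          {probe_id: sas_score} for the whole observatory
    adversarial_ids: set of probe ids designated as adversarial
    """
    if not scores:
        return []
    # Integer ceiling of n / 10 avoids floating-point rounding surprises.
    top_count = (len(scores) + 9) // 10
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return [pid for pid in ranked[:top_count] if pid in adversarial_ids]
```

Any probe returned here would go to manual review rather than being treated as an automatic failure.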
60-Day Calibration Windows
SAS scoring weights are locked during 60-day calibration windows. This ensures scoring stability for all entities being measured and prevents reactive adjustments that could undermine trust in the measurement system.
- During the window: Only extraction bug fixes are permitted. No weight changes, no threshold adjustments, no new dimension introductions.
- After the window: Calibration adjustments are made based on accumulated statistical evidence from distribution analysis, adversarial testing results, and dimension correlation studies.
- Re-baseline requirement: Every calibration change triggers a full re-baseline of the entire probe dataset, with a documented comparison of the pre-change and post-change distributions.
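The window rules above amount to a small gate on proposed changes. The sketch below is illustrative: the change-type labels are hypothetical, and the re-baseline requirement that follows any calibration change is not modeled.

```python
from datetime import date, timedelta

WINDOW_DAYS = 60  # from the methodology

# Hypothetical change-type label; the only kind permitted inside a window.
EXTRACTION_FIX = "extraction_fix"


def change_allowed(change_type: str, window_start: date, today: date) -> bool:
    """Apply the lock: inside the window only extraction bug fixes pass."""
    window_end = window_start + timedelta(days=WINDOW_DAYS)
    inside_window = window_start <= today < window_end
    if inside_window:
        return change_type == EXTRACTION_FIX
    # Outside the window, calibration changes may proceed (evidence review
    # and the full re-baseline are separate steps not covered here).
    return True
```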
Extraction Patch Policy
When a scoring anomaly is identified, an extraction patch may be deployed to correct the extraction logic. Patches are not applied in reaction to a single observation; all four of the following conditions must be satisfied first:
- Reproducibility: The issue must be reproducible across 3 or more probe sites. Single-site anomalies do not warrant extraction patches.
- HTML-Verifiable Signal: The signal causing the anomaly must be directly verifiable in the HTML source.
- Post-Patch Distribution Stability: After applying the patch to the full probe dataset, the overall SAS distribution must remain stable.
- Determinism Maintained: SHA-256 hash verification must pass on all probe sites after the patch is applied.
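The four conditions can be expressed as a checklist that reports which ones are still unmet. This is a sketch with hypothetical field names; how distribution stability is measured is left to the caller, since the policy does not define a tolerance.

```python
from dataclasses import dataclass

MIN_AFFECTED_SITES = 3  # from the policy: reproducible across 3 or more probes


@dataclass(frozen=True)
class PatchEvidence:
    affected_probe_sites: int      # sites where the issue reproduces
    html_verifiable: bool          # signal is visible in the HTML source
    distribution_stable: bool      # post-patch SAS distribution is stable
    hashes_match_all_probes: bool  # SHA-256 verification passes everywhere


def unmet_conditions(evidence: PatchEvidence) -> list[str]:
    """List the policy conditions the evidence does not yet satisfy."""
    unmet = []
    if evidence.affected_probe_sites < MIN_AFFECTED_SITES:
        unmet.append("reproducibility")
    if not evidence.html_verifiable:
        unmet.append("html_verifiable_signal")
    if not evidence.distribution_stable:
        unmet.append("post_patch_distribution_stability")
    if not evidence.hashes_match_all_probes:
        unmet.append("determinism_maintained")
    return unmet
```

A patch is eligible for deployment only when the returned list is empty.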
CMS Bias Detection
The scoring engine must not systematically favor or penalize any CMS platform. CMS bias detection is a continuous validation process applied during every calibration cycle.
- Per-CMS distribution analysis: SAS distributions are computed separately for each CMS type. Statistically significant differences in mean or variance between CMS groups trigger investigation.
- Extraction parity testing: Identical structural signals implemented across different CMS platforms must produce equivalent SAS scores within tolerance.
- Platform-specific artifacts: CMS-generated boilerplate HTML, framework-injected metadata, and platform-specific DOM patterns are identified and excluded from scoring where they do not represent deliberate structural decisions.
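For the per-CMS distribution analysis, one conventional starting point is a one-way ANOVA F statistic on the group means. The source does not name a statistical test, so this choice is an assumption; the sketch computes only the statistic, and a p-value would come from an F distribution table or a statistics library.

```python
from statistics import fmean


def anova_f_statistic(groups):
    """One-way ANOVA F statistic for SAS scores grouped by CMS type.

    groups: {cms_name: [sas scores for probes on that CMS]}
    Returns (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n = sum(len(scores) for scores in groups.values())
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more probes than groups")
    grand_mean = fmean([s for scores in groups.values() for s in scores])
    ss_between = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    ss_within = sum(
        (s - fmean(scores)) ** 2
        for scores in groups.values()
        for s in scores
    )
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

ANOVA compares means only and assumes roughly equal variances across groups; checking differences in variance, as the methodology also requires, would need a separate test.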
Frequently Asked Questions
What is the Probe Observatory?
A curated dataset of 156+ websites across 8 CMS types used to validate SAS scoring robustness. Probes span multiple verticals, structural complexity levels, and geographic regions to ensure accurate, unbiased scoring.
How does 411bz ensure SAS is deterministic?
Determinism is verified through SHA-256 hash comparison. Every probe site is scanned a minimum of 3 times, and the scoring output must produce identical hashes across all runs. Stable JSON serialization with sorted keys eliminates object-order nondeterminism.
What conditions must be met before an extraction patch is deployed?
Four conditions: the issue must be reproducible across 3+ probe sites, the signal must be verifiable in HTML source, post-patch distribution must remain stable, and determinism must be maintained with SHA-256 verification passing on all probes.
Test the Engine
Run a free Structural Authority Score scan to see the methodology in action.