Protecting the engine, not the page
Public content is crawlable and branded. Production systems are invisible and separated.
No visible connection between public pages and the infrastructure that produces them.
No generator tags, no internal tool names, no shared tracking IDs linking pages as one operation.
Never publish methodology, SOPs, prompts, or internal names. The moat is the system, not the content.
Anything public is scrapable. Protect what competitors cannot replicate: the engine, the data freshness, the relationships.
Two layers with opposite rules.
PUBLIC (visibility) → crawlable, branded, AI-citation-friendly
PROPRIETARY (production) → invisible, separate infra, never referenced
CDN edge proxy + WHOIS privacy + infrastructure separation.
Request → CDN Edge → Response (origin IP never exposed)
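A minimal sketch of checking a response for origin leaks, assuming the CDN is Cloudflare (the cf-ray evidence in the audit points that way). LEAKY_HEADERS is an assumed starting list, not an exhaustive one, and the IPv4 pattern can also flag version strings that happen to have four dotted parts.

```python
import re
from typing import Mapping

# Assumed starting list of headers that tend to reveal the origin stack.
LEAKY_HEADERS = {"x-powered-by", "x-aspnet-version", "x-generator", "x-backend-server", "x-origin"}
# Dotted-quad IPv4 address anywhere in a header value.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def audit_headers(headers: Mapping[str, str]) -> list[str]:
    """Return human-readable findings; an empty list means nothing obvious leaked."""
    findings = []
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in sorted(lowered.keys() & LEAKY_HEADERS):
        findings.append(f"{name} exposed: {lowered[name]}")
    for name, value in sorted(lowered.items()):
        if IPV4.search(value):
            findings.append(f"possible IP address in {name}: {value}")
    # cf-ray is added by Cloudflare's edge; without it the response may have bypassed the CDN.
    if "cf-ray" not in lowered:
        findings.append("no cf-ray header: response may not have come through the edge")
    return findings
```

Against a live page, `audit_headers(urllib.request.urlopen("https://example.com/").headers)` works, since the response headers object exposes `items()`; example.com stands in for the real domain.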
Eliminate any signal that clusters pages as one automated operation.
No generator meta → No tool names → No shared tracking → Template variation
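A sketch of the first two checks in that chain, run against a page's HTML. TOOL_NAMES holds hypothetical placeholders; substitute the real internal names, and keep that list itself out of anything public.

```python
from html.parser import HTMLParser

# Hypothetical internal tool names; replace with the real deny-list.
TOOL_NAMES = ("internal-cms", "pagebuilder")

class _MetaScanner(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag == "meta":
            a = {k: (v or "") for k, v in attrs}
            if a.get("name", "").lower() == "generator":
                self.generators.append(a.get("content", ""))

def provenance_signals(html: str) -> list[str]:
    """List generator meta values and any deny-listed tool names found in the page."""
    scanner = _MetaScanner()
    scanner.feed(html)
    signals = [f"generator meta: {g}" for g in scanner.generators]
    lowered = html.lower()
    signals += [f"tool name: {t}" for t in TOOL_NAMES if t in lowered]
    return signals
```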
The methodology is the moat. Pages are disposable surface area.
Internal (full) → Genericizer → Public (masked) → Verify clean
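One way the Internal → Genericizer → Public → Verify clean flow could look, as a sketch. GENERIC_TERMS is a hypothetical mapping: the internal names are ones flagged in the audit, the public replacements are invented. This is not the code of ip_genericizer.py, whose internals are not described here.

```python
import re

# Hypothetical mapping from internal names to generic public labels.
GENERIC_TERMS = {
    "Primary node": "lead assistant",
    "Secondary node": "support assistant",
    "Content engine": "publishing workflow",
    "Knowledge filter": "review step",
}

def _pattern(term: str) -> re.Pattern:
    # Whole-phrase, case-insensitive match, so "content engineering" is left alone.
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)

def genericize(text: str) -> str:
    # Replace longer terms first so overlapping phrases resolve predictably.
    for term in sorted(GENERIC_TERMS, key=len, reverse=True):
        text = _pattern(term).sub(GENERIC_TERMS[term], text)
    return text

def _squash(s: str) -> str:
    # Lowercase and keep only letters/digits, so "Primary-Node" and "primarynode" collide.
    return re.sub(r"[^a-z0-9]", "", s.lower())

def leaked_terms(text: str) -> list[str]:
    """Verify step: deliberately looser than the masking pattern, so spelling
    variants the substitution missed (hyphens, no spaces) still get flagged."""
    flat = _squash(text)
    return [t for t in GENERIC_TERMS if _squash(t) in flat]
```

The verify step is intentionally looser than the substitution, so it can raise false positives ("content engineering" squashes to a string containing "contentengine"); its output works better as a review queue than as an automatic failure.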
Accept that public content can be copied. Build moats that cannot.
Content (copyable) vs Engine + Data + Relationships (non-replicable)
Point-in-time audit of what our public pages currently expose, conducted 2026-06-07.
| Item | Status | Evidence | Fix |
|---|---|---|---|
| Generator / AI-provenance meta tags | Clean | Zero generator or AI-provenance meta tags found across sampled pages | None needed |
| X-Powered-By header | Clean | Response headers show only generic CDN server identifier. No X-Powered-By exposed | None needed |
| Internal term leakage (proprietary names) | Exposed | /software-library/ exposes "Primary node Status", "Core system Ledger", "Knowledge filter", "Advisory panel", "Build questionnaire". /agent-roster/ exposes "Primary node" and "Secondary node" system names throughout. /ai-organization/ exposes "Content engine". | Add noindex + remove from sitemap; long-term: genericize or restrict access |
| Shared GA4 tracking ID (G-GZFTH9TSJW) | Risk | Same GA4 property on 197 pages (content + internal ops). Enables clustering all pages as one operation. | Remove GA4 from internal/ops pages; consider separate property for content vs ops |
| Origin / IP exposure | Clean | All responses proxied through CDN edge. No origin IP in headers. cf-ray confirms edge delivery. | None needed |
| WHOIS privacy | Unverified | The whois command was unavailable in the audit environment; check manually in the registrar panel | Verify that privacy guard is active at the domain registrar |
| Sitemap: internal pages indexed | Exposed | sitemap.xml lists 128 URLs including: /agent-roster/, /ai-organization/, /software-library/, /core system-flow/, /build-brief/, /canopy-task-board/, /status/, /pipeline/, /content-approval/ | Remove all internal/ops pages from sitemap.xml; add noindex meta to ops pages |
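| robots.txt proprietary exposure | Clean | robots.txt allows all crawlers but does not list specific internal paths. No proprietary URLs exposed via Disallow. | None needed. Do not add Disallow rules for /agent-roster/, /core system-*, /status/, etc.: robots.txt is public, so the rules would advertise those paths, and a crawler blocked from a page never sees its noindex tag |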
| llms.txt internal exposure | Clean | llms.txt contains only brand info, specializations, contact, and content topics. No internal system details. | None needed |
| _redirects file (legacy brand reference) | Minor | A legacy redirect path contains a deprecated internal brand name. While functional, the path itself is visible in deploy source. | Remove after sufficient time (301 cached); not urgent |
| Internal ops pages: noindex directive | Missing | /agent-roster/, /software-library/, /core system-flow/ have NO noindex meta tag. Search engines can index them freely. | Add <meta name="robots" content="noindex, nofollow"> to all ops pages |
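A sketch of the noindex fix as a build-time step over ops-page HTML. It assumes pages are static HTML with a `<head>` element and uses regular expressions rather than a full HTML parser, which is adequate for templated pages but not for arbitrary markup. Existing robots meta tags are overwritten, since an ops page that says "index, follow" is exactly the problem.

```python
import re

NOINDEX = '<meta name="robots" content="noindex, nofollow">'
# Any existing <meta name="robots" ...> tag, whatever its attribute order or quoting.
_ROBOTS_META = re.compile(r"""<meta\b[^>]*\bname\s*=\s*["']?robots["']?[^>]*>""", re.IGNORECASE)
# The opening <head> tag; \b keeps it from matching <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)

def ensure_noindex(html: str) -> str:
    """Return html whose robots meta tags all say noindex, nofollow."""
    if _ROBOTS_META.search(html):
        # Overwrite existing robots tags rather than adding a conflicting second one.
        return _ROBOTS_META.sub(NOINDEX, html)
    m = _HEAD_OPEN.search(html)
    if m is None:
        raise ValueError("no <head> element found")
    return html[: m.end()] + NOINDEX + html[m.end():]
```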
Fix #1: Remove /agent-roster/, /software-library/, /core system-flow/, /build-brief/, /ai-organization/, /status/, /pipeline/, /canopy-task-board/, /content-approval/ from sitemap.xml and add a noindex meta tag to each. This is the highest-risk item because the sitemap actively invites search engines to index pages containing proprietary system names.
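The sitemap half of this fix, sketched with the standard library's ElementTree. OPS_PREFIXES copies the nine paths listed above; the percent-decoding step assumes the space in /core system-flow/ appears in the sitemap as %20.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlparse

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Internal/ops paths taken from the audit.
OPS_PREFIXES = ("/agent-roster/", "/software-library/", "/core system-flow/", "/build-brief/",
                "/ai-organization/", "/status/", "/pipeline/", "/canopy-task-board/",
                "/content-approval/")

def strip_ops_urls(sitemap_xml: str) -> tuple[str, list[str]]:
    """Return the filtered sitemap XML and the list of removed <loc> URLs."""
    ET.register_namespace("", NS)  # serialize with a default xmlns instead of ns0:
    root = ET.fromstring(sitemap_xml)
    removed = []
    for url in root.findall(f"{{{NS}}}url"):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        path = unquote(urlparse(loc).path)  # "/core%20system-flow/" -> "/core system-flow/"
        if path.startswith(OPS_PREFIXES):
            root.remove(url)
            removed.append(loc)
    return ET.tostring(root, encoding="unicode"), removed
```

Writing the returned XML back to sitemap.xml and keeping `removed` as a checklist gives a list of pages that still need the noindex tag.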
Fix #2: Remove GA4 (G-GZFTH9TSJW) from all internal/ops pages. This single ID ties 197 pages together as one operation, making it trivial for competitors or platforms to map the full scope of content production. Content pages keep GA4; ops pages lose it.
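A sketch of how the clustering works from the outside, and of a check that the fix landed: group pages by the GA4 measurement IDs found in their HTML, then list ops pages that still carry one. The ID pattern ("G-" plus six or more uppercase letters or digits) is an assumption based on the ID format shown here.

```python
import re
from collections import defaultdict

# Assumed GA4 measurement-ID shape, e.g. "G-" followed by uppercase letters and digits.
GA4_ID = re.compile(r"\bG-[A-Z0-9]{6,}\b")

def cluster_by_ga4(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each GA4 ID to the set of page paths that embed it."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for path, html in pages.items():
        for ga_id in set(GA4_ID.findall(html)):
            clusters[ga_id].add(path)
    return dict(clusters)

def ops_pages_with_tracking(pages: dict[str, str], ops_prefixes: tuple[str, ...]) -> list[str]:
    """Ops pages that still embed any GA4 ID and so remain linked to content pages."""
    return sorted(p for p, html in pages.items()
                  if p.startswith(ops_prefixes) and GA4_ID.search(html))
```

Any ID whose cluster spans both content and ops paths is the signal this fix removes; after the fix, `ops_pages_with_tracking` should return an empty list.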
Fix #3: Pages currently in the sitemap that contain "Primary node", "Secondary node", "Core system", "Advisory panel", "Content engine", "Knowledge filter", or "Build questionnaire" must either be deindexed (fix #1) or have those terms genericized via ip_genericizer.py. Fix #1 is faster; genericization is the long-term solution for any page that should remain public.
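The deindex-or-genericize decision can be sketched as a small triage over sitemap pages. PROPRIETARY_TERMS repeats the seven terms listed above; `keep_public` is a hypothetical input standing for the team's decision about which pages should stay public.

```python
# The proprietary terms named in the audit.
PROPRIETARY_TERMS = ("Primary node", "Secondary node", "Core system", "Advisory panel",
                     "Content engine", "Knowledge filter", "Build questionnaire")

def triage(sitemap_pages: dict[str, str], keep_public: set[str]) -> dict[str, list[str]]:
    """Split pages containing proprietary terms into 'deindex' and 'genericize' buckets."""
    plan: dict[str, list[str]] = {"deindex": [], "genericize": []}
    for path in sorted(sitemap_pages):
        text = sitemap_pages[path].lower()
        if any(term.lower() in text for term in PROPRIETARY_TERMS):
            # Pages meant to stay public get genericized; everything else is deindexed.
            bucket = "genericize" if path in keep_public else "deindex"
            plan[bucket].append(path)
    return plan
```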