Security & Privacy Architecture

Protecting the engine, not the page

Canopy
Understory
Root Level
I. Split Layers

Public content is crawlable and branded. Production systems are invisible and separated.

II. Hide Origin

No visible connection between public pages and the infrastructure that produces them.

III. Strip Fingerprint

No generator tags, no internal tool names, no shared tracking IDs linking pages as one operation.

IV. Protect the Engine

Never publish methodology, SOPs, prompts, or internal names. The moat is the system, not the content.

V. Honest Reality

Anything public is scrapable. Protect what competitors cannot replicate: the engine, the data freshness, the relationships.

I. Split Layers

Two layers with opposite rules.

PUBLIC (visibility) → crawlable, branded, AI-citation-friendly

PROPRIETARY (production) → invisible, separate infra, never referenced

  • Public: neighborhood guides, market stats, content pages — meant to be seen and cited
  • Proprietary: the AI production system, SOPs, prompts, internal org — never exposed
  • No cross-links between the two layers
II. Hide Origin

CDN edge proxy + WHOIS privacy + infrastructure separation.

Request → CDN Edge → Response (origin IP never exposed)

  • Edge CDN proxies all requests — origin server IP hidden
  • WHOIS privacy on all domains — registrant info redacted
  • Production and public run on separated infrastructure
  • No server headers revealing the technology stack
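Whether the origin stays hidden can be spot-checked against captured response headers. Below is a minimal sketch, assuming headers arrive as a plain dict (for example, copied from a browser or `curl -I` capture). `find_ip_leaks` is a hypothetical helper, and it only looks for IPv4 literals; IPv6 is not covered.

```python
import ipaddress
import re

# Candidate dotted quads; each match is then validated by the ipaddress module.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ip_leaks(headers: dict[str, str]) -> dict[str, list[str]]:
    """Map header names to any valid IPv4 addresses found in their values."""
    leaks: dict[str, list[str]] = {}
    for name, value in headers.items():
        for candidate in _IPV4.findall(value):
            try:
                ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looks like an IP but is not one, e.g. "999.1.1.1"
            leaks.setdefault(name, []).append(candidate)
    return leaks
```

An empty result for every sampled response is consistent with "origin IP never exposed"; any hit names the header to strip at the edge.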
III. Strip Fingerprint

Eliminate any signal that clusters pages as one automated operation.

No generator meta → No tool names → No shared tracking → Template variation

  • Zero generator or AI-provenance meta tags
  • No internal tool/framework names in page source
  • Unique or absent tracking IDs per content cluster (avoids scaled-content-abuse signals)
  • Template variation across pages — no single detectable pattern
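The shared-tracking signal is easy to measure: group pages by the analytics ID they embed and see how many pages each ID ties together. A minimal sketch follows, assuming page HTML is available as a path-to-source dict and that GA4 measurement IDs match the usual `G-` plus uppercase letters/digits shape (an assumption, not a documented grammar). `cluster_by_tracking_id`, `shared_ids`, and the example ID are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a GA4 measurement ID, e.g. "G-ABC123DEF4".
_GA4_ID = re.compile(r"\bG-[A-Z0-9]{6,12}\b")


def cluster_by_tracking_id(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each GA4 ID to the sorted paths whose source contains it."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for path, html in pages.items():
        for tracking_id in _GA4_ID.findall(html):
            clusters[tracking_id].add(path)  # set: an ID repeated on one page counts once
    return {tid: sorted(paths) for tid, paths in clusters.items()}


def shared_ids(pages: dict[str, str], threshold: int = 2) -> dict[str, list[str]]:
    """Keep only IDs that link at least `threshold` pages together."""
    return {
        tid: paths
        for tid, paths in cluster_by_tracking_id(pages).items()
        if len(paths) >= threshold
    }
```

Any ID that spans both content pages and ops pages is exactly the clustering signal this principle is meant to remove.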
IV. Protect the Engine

The methodology is the moat. Pages are disposable surface area.

Internal (full) → Genericizer → Public (masked) → Verify clean

  • Dual-version architecture: internal (complete) + public (IP-masked)
  • Automated term genericization before anything is shared externally
  • SOPs, prompts, and production workflows never published
  • Internal page names, agent names, and system identifiers stripped
V. Honest Reality

Accept that public content can be copied. Build moats that cannot.

Content (copyable) vs Engine + Data + Relationships (non-replicable)

  • Content pages are treated as disposable surface — let competitors copy the output
  • The system that produces content faster and fresher is the real advantage
  • 19 years of relationships and local expertise cannot be scraped
  • Data freshness (daily market stats, live CRM signals) decays any copied content
I. Split Layers
  • Public content on edge CDN (static HTML, no server-side rendering)
  • Production system on isolated VPS with SSH-only access
  • No DNS or subdomain linkage between layers
  • Separate credential sets per layer
  • Public pages reference only thepropertyjoes.com branding
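The "no linkage between layers" rule can be checked on built pages by resolving every `href`/`src` and reporting hosts outside an allowlist. Below is a minimal sketch using only the standard library; `foreign_hosts` and `ops.example.net` are hypothetical, and the allowlist holds just the public brand domain, so approved third parties (analytics, maps) would need to be added by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

ALLOWED_HOSTS = frozenset({"thepropertyjoes.com", "www.thepropertyjoes.com"})


class _LinkCollector(HTMLParser):
    """Collects raw href/src values from every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def foreign_hosts(html: str, page_url: str, allowed: frozenset[str] = ALLOWED_HOSTS) -> set[str]:
    """Return hosts referenced by the page that are not on the allowlist."""
    collector = _LinkCollector()
    collector.feed(html)
    hosts: set[str] = set()
    for raw in collector.urls:
        # Relative links resolve to the page's own host; mailto:/tel: have no host.
        host = urlparse(urljoin(page_url, raw)).hostname
        if host and host not in allowed:
            hosts.add(host)
    return hosts
```

An allowlist fails closed: a new production hostname shows up in the report without anyone having to remember to add it to a denylist.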
II. Hide Origin
  • CDN edge proxy handles all HTTP; origin never responds directly
  • Server header: generic CDN identifier only (no technology stack)
  • WHOIS: registrant privacy service enabled
  • No CNAME or A record pointing to production IP
  • Headers stripped: X-Powered-By, Server version, Via
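The header rules above can be run against captured responses. Below is a minimal sketch, assuming headers are supplied as a dict; `header_findings` is a hypothetical helper. It flags `X-Powered-By` and `Via` whenever they are present, and flags `Server` only when it carries a product/version token such as `nginx/1.25.3`. A generic CDN identifier passes.

```python
import re

# Headers that should never reach the client in this architecture.
STRIP_ALWAYS = frozenset({"x-powered-by", "via"})
# A product name followed by a version, e.g. "nginx/1.25.3" or "Apache/2.4".
_VERSIONED = re.compile(r"[A-Za-z][\w.-]*/\d+(?:\.\d+)*")


def header_findings(headers: dict[str, str]) -> list[str]:
    """Return one message per header that reveals stack details."""
    findings: list[str] = []
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if key in STRIP_ALWAYS:
            findings.append(f"{name} present: {value}")
        elif key == "server" and _VERSIONED.search(value):
            findings.append(f"Server reveals a version: {value}")
    return findings
```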
III. Strip Fingerprint
  • Pre-deploy grep: zero matches for generator, x-powered-by, internal terms
  • GA4 cluster risk: single tracking ID across all pages creates a clustering signal
  • Recommendation: segment tracking IDs, or remove GA4 from non-public operational pages
  • Template diversity: at least 3 distinct layout structures in rotation
  • Sitemap: list no internal/operational pages in sitemap.xml
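The pre-deploy grep can be made slightly stricter for the generator check by parsing `<meta>` tags, so attribute order and capitalization do not matter. Below is a minimal sketch covering the generator and `x-powered-by` checks; `fingerprint_findings` is a hypothetical name. An internal-term check would follow the same pattern with a term list.

```python
import re
from html.parser import HTMLParser


class _GeneratorMetaFinder(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not their values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "generator":
            self.generators.append(attr.get("content") or "")


def fingerprint_findings(html: str) -> list[str]:
    """Return problems found in one built page; an empty list means it passes."""
    finder = _GeneratorMetaFinder()
    finder.feed(html)
    problems = [f"generator meta: {g}" for g in finder.generators]
    # Plain-text check over the page source, like the grep it replaces.
    if re.search(r"x-powered-by", html, re.IGNORECASE):
        problems.append("x-powered-by string in page source")
    return problems
```

Running this over every file in the build output and failing the deploy on any non-empty result gives the "zero matches" gate described above.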
IV. Protect the Engine
  • Automated IP genericizer: 50+ proprietary terms mapped to safe equivalents
  • Verification pass: regex scan confirms zero leaked terms before deploy
  • Components developed with collaborators are excluded entirely from public versions
  • Structural IP (hierarchy tiers, subsystem groupings) abstracted, not just renamed
  • Internal pages carry noindex directive to prevent accidental indexing
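The genericize-then-verify flow (Internal → Genericizer → Public → Verify clean) can be sketched as a term map plus a regex pass. This is not `ip_genericizer.py`, whose contents are not shown here. It is a minimal sketch with a hypothetical three-entry `TERM_MAP` and made-up replacements. It handles renaming only; abstracting structural IP needs human judgment.

```python
import re

# Hypothetical subset of the term map; the real genericizer maps 50+ terms.
TERM_MAP = {
    "Primary node": "lead coordinator",
    "Secondary node": "support coordinator",
    "Content engine": "publishing workflow",
}


def _pattern(terms) -> re.Pattern:
    # Longest terms first so overlapping names are replaced as whole phrases.
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile("|".join(r"\b" + re.escape(t) + r"\b" for t in ordered), re.IGNORECASE)


def genericize(text: str, term_map: dict[str, str] = TERM_MAP) -> str:
    lookup = {k.lower(): v for k, v in term_map.items()}
    return _pattern(term_map).sub(lambda m: lookup[m.group(0).lower()], text)


def leaked_terms(text: str, term_map: dict[str, str] = TERM_MAP) -> list[str]:
    """Verification pass: every proprietary term still present, as written."""
    return sorted({m.group(0) for m in _pattern(term_map).finditer(text)})


def publish(internal_text: str) -> str:
    """Produce the public version, refusing to return it if anything leaked."""
    public = genericize(internal_text)
    leaks = leaked_terms(public)
    if leaks:
        raise ValueError(f"leaked proprietary terms: {leaks}")
    return public
```

Keeping verification separate from substitution matters because it also catches terms a replacement string reintroduces, or text added after genericization.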
V. Honest Reality
  • Public content optimized for AI citation (schema, structured data, llms.txt)
  • Speed moat: daily content refresh cycle outpaces any manual competitor
  • Relationship moat: 1,400+ transactions, 100+ referral partners, 19-year track record
  • Data moat: live CRM signals, market feeds, transaction history not available publicly
  • If a page is copied, the copy is immediately stale — freshness wins

Live Audit Findings

Assessment of what our live public pages expose, conducted 2026-06-07.

Generator / AI-provenance meta tags
  • Status: Clean
  • Evidence: Zero generator or AI-provenance meta tags found across sampled pages
  • Fix: None needed

X-Powered-By header
  • Status: Clean
  • Evidence: Response headers show only a generic CDN server identifier; no X-Powered-By exposed
  • Fix: None needed

Internal term leakage (proprietary names)
  • Status: Exposed
  • Evidence: /software-library/ exposes "Primary node Status", "Core system Ledger", "Knowledge filter", "Advisory panel", and "Build questionnaire". /agent-roster/ exposes "Primary node" and "Secondary node" system names throughout. /ai-organization/ exposes "Content engine".
  • Fix: Add noindex and remove from sitemap; long term, genericize or restrict access

Shared GA4 tracking ID (G-GZFTH9TSJW)
  • Status: Risk
  • Evidence: The same GA4 property appears on 197 pages (content and internal ops), which makes it possible to cluster all pages as one operation
  • Fix: Remove GA4 from internal/ops pages; consider a separate property for content vs. ops

Origin / IP exposure
  • Status: Clean
  • Evidence: All responses are proxied through the CDN edge; no origin IP in headers; the cf-ray header confirms edge delivery
  • Fix: None needed

WHOIS privacy
  • Status: Unverified
  • Evidence: The whois command was unavailable in the audit environment
  • Fix: Check manually in the registrar panel that the privacy guard is active on the domain

Sitemap: internal pages submitted for indexing
  • Status: Exposed
  • Evidence: sitemap.xml lists 128 URLs, including /agent-roster/, /ai-organization/, /software-library/, /core system-flow/, /build-brief/, /canopy-task-board/, /status/, /pipeline/, /content-approval/
  • Fix: Remove all internal/ops pages from sitemap.xml; add a noindex meta tag to ops pages

robots.txt proprietary exposure
  • Status: Clean
  • Evidence: robots.txt allows all crawlers and lists no internal paths, so no proprietary URLs are exposed via Disallow
  • Fix: Optional. Disallow rules for /agent-roster/, /core system-*, /status/, etc. are possible, but each rule publishes the path in robots.txt and stops crawlers from reading the page's noindex tag, so noindex alone is the safer choice

llms.txt internal exposure
  • Status: Clean
  • Evidence: llms.txt contains only brand info, specializations, contact details, and content topics; no internal system details
  • Fix: None needed

_redirects file (legacy brand reference)
  • Status: Minor
  • Evidence: A legacy redirect path contains a deprecated internal brand name. The redirect works, but the path itself is visible in the deploy source.
  • Fix: Remove once the 301 has had time to be cached; not urgent

Internal ops pages: noindex directive
  • Status: Missing
  • Evidence: /agent-roster/, /software-library/, and /core system-flow/ have no noindex meta tag, so search engines can index them freely
  • Fix: Add <meta name="robots" content="noindex, nofollow"> to all ops pages
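The "noindex directive missing" finding can be re-checked automatically after each deploy. Below is a minimal sketch, assuming page HTML (and, optionally, the `X-Robots-Tag` response header) is available per path; `has_noindex` and `pages_missing_noindex` are hypothetical helpers. Following Google's documented meaning, `none` is treated as `noindex, nofollow`. Bot-specific header forms such as `googlebot: noindex` are not parsed.

```python
from html.parser import HTMLParser


class _RobotsMetaReader(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the robots meta tag or a plain X-Robots-Tag value blocks indexing."""
    reader = _RobotsMetaReader()
    reader.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    return bool({"noindex", "none"} & (reader.directives | header))


def pages_missing_noindex(pages: dict[str, str]) -> list[str]:
    """Sorted paths of ops pages that still lack a noindex directive."""
    return sorted(path for path, html in pages.items() if not has_noindex(html))
```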

Top 3 Fixes (Ranked by Risk)

#1 Sitemap + Noindex for Internal Pages

Remove /agent-roster/, /software-library/, /core system-flow/, /build-brief/, /ai-organization/, /status/, /pipeline/, /canopy-task-board/, /content-approval/ from sitemap.xml. Add noindex meta to each. This is the highest risk because search engines are actively invited to index pages containing proprietary system names.
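The sitemap half of this fix is a small XML filter. Below is a minimal sketch using the standard library, assuming the sitemap follows the sitemaps.org 0.9 namespace. `filter_sitemap` is hypothetical, and `INTERNAL_PREFIXES` is a subset of the paths listed above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Subset of the internal paths named in Fix #1; extend as needed.
INTERNAL_PREFIXES = (
    "/agent-roster/", "/software-library/", "/ai-organization/", "/build-brief/",
    "/status/", "/pipeline/", "/canopy-task-board/", "/content-approval/",
)


def filter_sitemap(xml_text: str, prefixes: tuple[str, ...] = INTERNAL_PREFIXES) -> tuple[str, list[str]]:
    """Drop <url> entries whose path starts with an internal prefix."""
    ET.register_namespace("", NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(xml_text)
    removed: list[str] = []
    for url in root.findall(f"{{{NS}}}url"):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if urlparse(loc).path.startswith(prefixes):
            root.remove(url)
            removed.append(loc)
    return ET.tostring(root, encoding="unicode"), removed
```

The returned `removed` list doubles as the checklist of pages that still need the noindex meta tag. When writing the result back to disk, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` restores the XML declaration.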

#2 GA4 Tracking Segmentation

Remove GA4 (G-GZFTH9TSJW) from all internal/ops pages. This single ID clusters 197 pages as one operation, making it trivial for competitors or platforms to identify the full scope of content production. Content pages keep GA4; ops pages get it removed.
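If pages are assembled by a build step, this rule can live in one place: the step emits the analytics tag for content pages and nothing for ops pages. Below is a minimal sketch, assuming such a build step exists. `analytics_head` and `OPS_PREFIXES` are hypothetical, and the snippet is the standard gtag.js loader with the measurement ID passed in.

```python
# Hypothetical ops-path prefixes, drawn from the audit's internal pages.
OPS_PREFIXES = (
    "/agent-roster/", "/software-library/", "/ai-organization/",
    "/status/", "/pipeline/", "/canopy-task-board/", "/content-approval/",
)

# Standard gtag.js loader; doubled braces survive str.format as literal braces.
GA4_SNIPPET = """<script async src="https://www.googletagmanager.com/gtag/js?id={mid}"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){{dataLayer.push(arguments);}}
  gtag('js', new Date());
  gtag('config', '{mid}');
</script>"""


def analytics_head(path: str, measurement_id: str) -> str:
    """Return the GA4 snippet for content pages and an empty string for ops pages."""
    if path.startswith(OPS_PREFIXES):
        return ""
    return GA4_SNIPPET.format(mid=measurement_id)
```

Because ops pages never receive any ID, the same function also covers the longer-term option of a separate property: content pages get one ID, and nothing links the two sets.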

#3 Proprietary Term Scrub on Indexed Pages

Pages currently in the sitemap that contain "Primary node", "Secondary node", "Core system", "Advisory panel", "Content engine", "Knowledge filter", or "Build questionnaire" must either be deindexed (Fix #1) or have their terms genericized via ip_genericizer.py. Fix #1 is faster; genericization is the long-term solution for any page that should remain public.
