Security & Privacy Architecture - The Property Joes Group

I. Split Layers

Public content is crawlable and branded. Production systems are invisible and separated.

II. Hide Origin

No visible connection between public pages and the infrastructure that produces them.

III. Strip Fingerprint

No generator tags, no internal tool names, no shared tracking IDs linking pages as one operation.

IV. Protect the Engine

Never publish methodology, SOPs, prompts, or internal names. The moat is the system, not the content.

V. Honest Reality

Anything public is scrapable. Protect what competitors cannot replicate: the engine, the data freshness, the relationships.

I. Split Layers

Two layers with opposite rules.

PUBLIC (visibility) → crawlable, branded, AI-citation-friendly

PROPRIETARY (production) → invisible, separate infra, never referenced

Public: neighborhood guides, market stats, content pages — meant to be seen and cited
Proprietary: the AI production system, SOPs, prompts, internal org — never exposed
No crosslinks between the two layers

II. Hide Origin

CDN edge proxy + WHOIS privacy + infrastructure separation.

Request → CDN Edge → Response (origin IP never exposed)

Edge CDN proxies all requests — origin server IP hidden
WHOIS privacy on all domains — registrant info redacted
Production and public run on separated infrastructure
No server headers revealing technology stack

III. Strip Fingerprint

Eliminate any signal that clusters pages as one automated operation.

No generator meta → No tool names → No shared tracking → Template variation

Zero generator or AI-provenance meta tags
No internal tool/framework names in page source
Unique or absent tracking IDs per content cluster (avoids scaled-content-abuse signals)
Template variation across pages — no single detectable pattern
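
One way to get template variation without hand-assigning layouts is to derive the layout from the page slug. The sketch below is illustrative only: the layout names are hypothetical placeholders, and hashing the slug is an assumed approach, not a description of the production system.

```python
import hashlib
from typing import Sequence

# Hypothetical layout names; the document only calls for several distinct structures.
LAYOUTS = ("layout_a", "layout_b", "layout_c")


def pick_layout(slug: str, layouts: Sequence[str] = LAYOUTS) -> str:
    """Map a page slug to one layout, stably across builds."""
    # SHA-256 is used only as a stable, well-mixed hash, not for security.
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(layouts)
    return layouts[index]
```

Because the choice depends only on the slug, a page keeps the same layout on every rebuild while neighbouring pages spread across the available layouts.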

IV. Protect the Engine

The methodology is the moat. Pages are disposable surface area.

Internal (full) → Genericizer → Public (masked) → Verify clean

Dual-version architecture: internal (complete) + public (IP-masked)
Automated term genericization before any external share
SOPs, prompts, and production workflows never published
Internal page names, agent names, and system identifiers stripped

V. Honest Reality

Accept that public content can be copied. Build moats that cannot.

Content (copyable) vs Engine + Data + Relationships (non-replicable)

Content pages are treated as disposable surface — let competitors copy the output
The system that produces content faster and fresher is the real advantage
19 years of relationships and local expertise cannot be scraped
Data freshness (daily market stats, live CRM signals) decays any copied content

I. Split Layers

Public content on edge CDN (static HTML, no server-side rendering)
Production system on isolated VPS with SSH-only access
No DNS or subdomain linkage between layers
Separate credential sets per layer
Public pages reference only thepropertyjoes.com branding

II. Hide Origin

CDN edge proxy handles all HTTP; origin never responds directly
Server header: generic CDN identifier only (no technology stack)
WHOIS: registrant privacy service enabled
No CNAME or A record pointing to production IP
Headers stripped: X-Powered-By, Server version, Via
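
The header rules above can be checked mechanically once response headers have been captured. This is a minimal sketch that works on a plain dictionary of headers; the function name and the version-number heuristic are assumptions made for illustration.

```python
import re
from typing import List, Mapping

# Headers the spec above says should never reach the client.
FORBIDDEN_HEADERS = {"x-powered-by", "via"}

# Heuristic (an assumption): a dotted number in the Server value suggests a version leak.
VERSION_PATTERN = re.compile(r"\d+\.\d+")


def audit_headers(headers: Mapping[str, str]) -> List[str]:
    """Return a human-readable finding for each header that leaks stack details."""
    findings = []
    for name, value in headers.items():
        lowered = name.lower()
        if lowered in FORBIDDEN_HEADERS:
            findings.append(f"{name} header present: {value}")
        elif lowered == "server" and VERSION_PATTERN.search(value):
            findings.append(f"Server header reveals a version: {value}")
    return findings
```

An empty list means the captured response matches the spec; any entry names the header to strip at the edge.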

III. Strip Fingerprint

Pre-deploy grep: zero matches for generator, x-powered-by, internal terms
GA4 cluster risk: single tracking ID across all pages creates a clustering signal
Recommendation: segment tracking or remove from non-public operational pages
Template diversity: at least 3 distinct layout structures in rotation
Sitemap: exclude internal/operational pages from sitemap.xml
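
The pre-deploy grep could look roughly like the following. It is a sketch under stated assumptions: pages are passed in as a path-to-HTML dictionary, and the term list is whatever the team maintains (the example terms in the usage note are taken from the audit findings in this document).

```python
import re
from typing import Dict, List, Sequence

# Matches <meta name="generator" ...> in either quote style.
GENERATOR_META = re.compile(r"<meta[^>]+name=[\"']generator[\"']", re.IGNORECASE)
POWERED_BY = re.compile(r"x-powered-by", re.IGNORECASE)


def scan_page(html: str, internal_terms: Sequence[str]) -> List[str]:
    """List every fingerprint signal found in one page's source."""
    hits = []
    if GENERATOR_META.search(html):
        hits.append("generator meta tag")
    if POWERED_BY.search(html):
        hits.append("x-powered-by")
    for term in internal_terms:
        # Terms are escaped so they match literally, not as regex syntax.
        if re.search(re.escape(term), html, re.IGNORECASE):
            hits.append(f"internal term: {term}")
    return hits


def scan_site(pages: Dict[str, str], internal_terms: Sequence[str]) -> Dict[str, List[str]]:
    """Return only the pages that have at least one hit."""
    report = {}
    for path, html in pages.items():
        hits = scan_page(html, internal_terms)
        if hits:
            report[path] = hits
    return report
```

A deploy gate would call scan_site with terms such as "Primary node" or "Content engine" and block the release whenever the report is non-empty.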

IV. Protect the Engine

Automated IP genericizer: 50+ proprietary terms mapped to safe equivalents
Verification pass: regex scan confirms zero leaked terms before deploy
Collaborated components excluded entirely from public versions
Structural IP (hierarchy tiers, subsystem groupings) abstracted, not just renamed
Internal pages carry noindex directive to prevent accidental indexing
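
The genericize-then-verify flow can be sketched as below. This is not the contents of ip_genericizer.py, which the document does not show. The term map holds two invented placeholder entries standing in for the real 50+ term mapping, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Invented placeholder entries; the real mapping is internal and not reproduced here.
TERM_MAP = {
    "Project Falcon": "content engine",
    "FalconBot": "primary node",
}


def _build_pattern(term_map: Dict[str, str]) -> "re.Pattern[str]":
    # Longest terms first so a longer term wins over a shorter overlapping one.
    terms = sorted(term_map, key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


def genericize(text: str, term_map: Dict[str, str]) -> str:
    """Replace every proprietary term with its safe equivalent."""
    if not term_map:
        return text
    lookup = {term.lower(): safe for term, safe in term_map.items()}
    pattern = _build_pattern(term_map)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


def leaked_terms(text: str, term_map: Dict[str, str]) -> List[str]:
    """Verification pass: return any proprietary terms still present."""
    if not term_map:
        return []
    pattern = _build_pattern(term_map)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})
```

Running leaked_terms on the genericized output and requiring an empty list gives the "verify clean" step. Renaming alone does not cover the structural IP mentioned above, which still needs manual abstraction.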

V. Honest Reality

Public content optimized for AI citation (schema, structured data, llms.txt)
Speed moat: daily content refresh cycle outpaces any manual competitor
Relationship moat: 1,400+ transactions, 100+ referral partners, 19-year track record
Data moat: live CRM signals, market feeds, transaction history not available publicly
If a page is copied, the copy is immediately stale — freshness wins

Live Audit Findings

Assessment of what our public pages currently expose, conducted 2026-06-07.

Item	Status	Evidence	Fix
Generator / AI-provenance meta tags	Clean	Zero generator or AI-provenance meta tags found across sampled pages	None needed
X-Powered-By header	Clean	Response headers show only generic CDN server identifier. No X-Powered-By exposed	None needed
Internal term leakage (proprietary names)	Exposed	/software-library/ exposes "Primary node Status", "Core system Ledger", "Knowledge filter", "Advisory panel", "Build questionnaire". /agent-roster/ exposes the system names "Primary node" and "Secondary node" throughout. /ai-organization/ exposes "Content engine".	Add noindex + remove from sitemap; long-term: genericize or restrict access
Shared GA4 tracking ID (G-GZFTH9TSJW)	Risk	Same GA4 property on 197 pages (content + internal ops). Enables clustering all pages as one operation.	Remove GA4 from internal/ops pages; consider separate property for content vs ops
Origin / IP exposure	Clean	All responses proxied through CDN edge. No origin IP in headers. cf-ray confirms edge delivery.	None needed
WHOIS privacy	Unverified	whois command unavailable in environment; recommend manual check via registrar panel	Verify privacy guard is active on domain registrar
Sitemap: internal pages listed	Exposed	sitemap.xml lists 128 URLs including: /agent-roster/, /ai-organization/, /software-library/, /core system-flow/, /build-brief/, /canopy-task-board/, /status/, /pipeline/, /content-approval/	Remove all internal/ops pages from sitemap.xml; add noindex meta to ops pages
robots.txt proprietary exposure	Clean	robots.txt allows all crawlers but does not list specific internal paths. No proprietary URLs exposed via Disallow.	None needed. Avoid adding Disallow rules for internal paths: they would publish those paths in robots.txt and stop crawlers from reading the noindex directive on the pages.
llms.txt internal exposure	Clean	llms.txt contains only brand info, specializations, contact, and content topics. No internal system details.	None needed
_redirects file (legacy brand reference)	Minor	A legacy redirect path contains a deprecated internal brand name. The redirect works, but the path itself is visible in deploy source.	Remove once cached 301 responses have had time to expire; not urgent
Internal ops pages: noindex directive	Missing	/agent-roster/, /software-library/, /core system-flow/ have NO noindex meta tag. Search engines can index them freely.	Add <meta name="robots" content="noindex, nofollow"> to all ops pages
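
The noindex finding can be re-checked automatically after the fix ships. The following is a minimal sketch using only the standard library; the class and function names are illustrative, and it inspects the robots meta tag only, not an X-Robots-Tag response header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (bare attributes), so default to "".
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """True if the page source carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running has_noindex over the source of each ops page and failing the build on any False result keeps this row from regressing.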

Top 3 Fixes (Ranked by Risk)

#1 Sitemap + Noindex for Internal Pages

Remove /agent-roster/, /software-library/, /core system-flow/, /build-brief/, /ai-organization/, /status/, /pipeline/, /canopy-task-board/, /content-approval/ from sitemap.xml. Add noindex meta to each. This is the highest risk because search engines are actively invited to index pages containing proprietary system names.
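
The sitemap half of this fix could be scripted roughly as follows. This is a sketch, not the deployed tooling: the prefixes are copied from the list above, and "/core system-flow/" is kept verbatim even though it contains a space, so the real slug may differ and should be confirmed before use.

```python
import xml.etree.ElementTree as ET
from typing import Sequence
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Copied from fix #1; "/core system-flow/" is verbatim and may not be the real slug.
INTERNAL_PREFIXES = (
    "/agent-roster/", "/software-library/", "/core system-flow/",
    "/build-brief/", "/ai-organization/", "/status/", "/pipeline/",
    "/canopy-task-board/", "/content-approval/",
)


def strip_internal_urls(sitemap_xml: str, prefixes: Sequence[str] = INTERNAL_PREFIXES) -> str:
    """Return the sitemap with every <url> under an internal prefix removed."""
    # Keep the default namespace unprefixed in the serialized output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        path = urlparse((loc.text or "").strip()).path if loc is not None else ""
        if path.startswith(tuple(prefixes)):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

Removing a URL from the sitemap only stops advertising it. The noindex meta tag is still needed to get already-discovered pages dropped from the index.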

#2 GA4 Tracking Segmentation

Remove GA4 (G-GZFTH9TSJW) from all internal/ops pages. This single ID clusters 197 pages as one operation, making it trivial for competitors or platforms to identify the full scope of content production. Content pages keep GA4; ops pages get it removed.
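
To confirm the segmentation after the change, pages can be grouped by the measurement ID found in their source. The sketch below assumes pages are available as a path-to-HTML dictionary and that IDs follow the "G-" plus uppercase alphanumerics shape seen in this audit. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence

# Assumed shape of a GA4 measurement ID, based on the ID cited in the audit.
GA4_ID = re.compile(r"\bG-[A-Z0-9]{6,}\b")


def cluster_by_tracking_id(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Group page paths by each GA4 ID that appears in their source."""
    clusters = defaultdict(list)
    for path, html in pages.items():
        for tracking_id in sorted(set(GA4_ID.findall(html))):
            clusters[tracking_id].append(path)
    return dict(clusters)


def ops_pages_with_tracking(pages: Dict[str, str], ops_prefixes: Sequence[str]) -> List[str]:
    """List ops pages that still carry any GA4 ID."""
    return sorted(
        path
        for path, html in pages.items()
        if path.startswith(tuple(ops_prefixes)) and GA4_ID.search(html)
    )
```

After the fix, ops_pages_with_tracking should return an empty list for the ops prefixes, while content pages continue to appear under the content property in cluster_by_tracking_id.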

#3 Proprietary Term Scrub on Indexed Pages

Pages currently in the sitemap that contain "Primary node", "Secondary node", "Core system", "Advisory panel", "Content engine", "Knowledge filter", or "Build questionnaire" must either be deindexed (fix #1) or have those terms genericized via ip_genericizer.py. Fix #1 is faster; genericization is the long-term solution for any page that should remain public.