Dual Database Strategy

Store Shield uses two separate databases for different purposes. This document explains the reasoning and implementation.

The Two Databases

Database	Technology	Location	Purpose
D1	SQLite (Cloudflare)	Edge (global)	Real-time events, sessions, analytics
Prisma	SQLite	Fly.io	Scans, billing, settings, complex data

Why Two Databases?

Problem: Conflicting Requirements

Real-time protection and background scanning have very different database needs:

Requirement	Real-time Protection	Background Scanning
Latency	Under 50ms critical	Seconds acceptable
Write volume	Thousands/second	Hundreds/day
Read patterns	Simple lookups	Complex queries
Data relationships	Flat, denormalized	Rich relationships
Location	Edge (global)	Central

Solution: Right Tool for Each Job

D1 (Cloudflare Edge)

Handles all pixel ingestion at the edge
Near-zero latency for protection checks
Automatic aggregation for analytics
No central server bottleneck

Prisma (Fly.io)

Handles complex business logic
Rich relationships (scans → results → threats)
Transaction support for billing
Full Prisma tooling (migrations, type safety)

D1 Schema Overview

D1 stores real-time, high-volume data:

-- Raw event tables (write-heavy)
VisitorIdentity    -- Fingerprinted visitors
SessionSnapshot    -- Session tracking
BotSignal          -- Bot detection events
SpySignal          -- Spy tool detections
IPBlockingEvent    -- IP/Country blocks
ProtectionEvent    -- Content protection triggers

-- Aggregate tables (updated on each event)
DailyMetrics       -- Counts per shop/date
TopIPsDaily        -- Top threat IPs
TopPagesDaily      -- Most targeted pages
TopCountriesDaily  -- Threat by country

-- Config cache
MerchantConfig     -- Protection settings
IPBlockingConfig   -- Block rules

Aggregation Strategy

Instead of querying raw events for analytics, we update aggregates on every write:

// On each bot detection event:
await db.prepare(`
  INSERT INTO DailyMetrics (shop, date, bot_events)
  VALUES (?, ?, 1)
  ON CONFLICT (shop, date)
  DO UPDATE SET bot_events = bot_events + 1
`).bind(shop, today).run();

This trades write complexity for read simplicity:

Write: Slightly more expensive (UPDATE after INSERT)
Read: Instant aggregates, no scanning millions of rows

Prisma Schema Overview

Prisma handles complex, relationship-rich data:

// Billing and settings
model MerchantSettings {
  id                    String   @id
  shop                  String   @unique
  plan                  String
  billingStatus         String
  // ... feature flags, injection method, etc.
}

// Scan jobs with complex state
model ProductScanJob {
  id              String    @id
  shop            String
  status          String    // pending, processing, completed, failed
  products        ScannedProduct[]
  phishingResults PhishingDomain[]
  marketplaceResults MarketplaceThreat[]
}

// Rich threat data with relationships
model MarketplaceThreat {
  id            String   @id
  productScanId String
  productScan   ProductScanJob @relation(...)
  platform      String   // amazon, ebay, etc.
  matchScore    Float
  // ... threat details
}

Data Synchronization

Some data exists in both databases:

IP Blocking Config

Source of truth: Prisma
Synced to: D1 + KV

Remix App → Prisma (save) → POST /api/sync-ip-blocking-config → D1 + KV

The edge needs fast access to blocking rules, so we sync on every change.

Protection Settings

Source of truth: Prisma (via MerchantSettings)
Synced to: D1 MerchantConfig

Theme Extension → POST /api/heartbeat → D1 MerchantConfig (updated)

The heartbeat carries current settings; D1 caches them for fast lookups.

Querying Across Databases

The Remix app queries both databases and combines results:

// app/db/analytics.server.ts
export async function getThreatSummary(shop: string) {
  // Real-time data from D1 (via Worker API)
  const realTimeStats = await fetch(`${EVENTS_WORKER_URL}/api/analytics/summary?shop=${shop}`);
  
  // Historical scan data from Prisma
  const scanStats = await prisma.productScanJob.aggregate({
    where: { shop },
    _count: { id: true }
  });
  
  const phishingCount = await prisma.phishingDomain.count({
    where: { shop, status: 'active' }
  });
  
  return {
    ...realTimeStats,
    totalScans: scanStats._count.id,
    activePhishing: phishingCount
  };
}

Trade-offs

Advantages

Performance: Edge database handles real-time load
Cost: No server compute for high-volume pixel processing
Scalability: Each database scales independently
Flexibility: Right tool for each job

Disadvantages

Complexity: Two schemas to maintain
Sync overhead: Some data must be kept in sync
Eventual consistency: Edge data may lag slightly
Debugging: Issues can span both databases

Best Practices

Clear ownership: Each data type has one source of truth
Minimal sync: Only sync what the edge needs
Idempotent syncs: Re-syncing should be safe
Monitor both: Alerts for both D1 and Prisma issues

The Two Databases​

Why Two Databases?​

Problem: Conflicting Requirements​

Solution: Right Tool for Each Job​

D1 Schema Overview​

Aggregation Strategy​

Prisma Schema Overview​

Data Synchronization​

IP Blocking Config​

Protection Settings​

Querying Across Databases​

Trade-offs​

Advantages​

Disadvantages​

Best Practices​