Skip to main content

Dual Database Strategy

Store Shield uses two separate databases for different purposes. This document explains the reasoning and implementation.

The Two Databases

DatabaseTechnologyLocationPurpose
D1SQLite (Cloudflare)Edge (global)Real-time events, sessions, analytics
PrismaSQLiteFly.ioScans, billing, settings, complex data

Why Two Databases?

Problem: Conflicting Requirements

Real-time protection and background scanning have very different database needs:

RequirementReal-time ProtectionBackground Scanning
LatencyUnder 50ms criticalSeconds acceptable
Write volumeThousands/secondHundreds/day
Read patternsSimple lookupsComplex queries
Data relationshipsFlat, denormalizedRich relationships
LocationEdge (global)Central

Solution: Right Tool for Each Job

D1 (Cloudflare Edge)

  • Handles all pixel ingestion at the edge
  • Near-zero latency for protection checks
  • Automatic aggregation for analytics
  • No central server bottleneck

Prisma (Fly.io)

  • Handles complex business logic
  • Rich relationships (scans → results → threats)
  • Transaction support for billing
  • Full Prisma tooling (migrations, type safety)

D1 Schema Overview

D1 stores real-time, high-volume data:

-- Raw event tables (write-heavy)
VisitorIdentity -- Fingerprinted visitors
SessionSnapshot -- Session tracking
BotSignal -- Bot detection events
SpySignal -- Spy tool detections
IPBlockingEvent -- IP/Country blocks
ProtectionEvent -- Content protection triggers

-- Aggregate tables (updated on each event)
DailyMetrics -- Counts per shop/date
TopIPsDaily -- Top threat IPs
TopPagesDaily -- Most targeted pages
TopCountriesDaily -- Threat by country

-- Config cache
MerchantConfig -- Protection settings
IPBlockingConfig -- Block rules

Aggregation Strategy

Instead of querying raw events for analytics, we update aggregates on every write:

// On each bot detection event:
await db.prepare(`
INSERT INTO DailyMetrics (shop, date, bot_events)
VALUES (?, ?, 1)
ON CONFLICT (shop, date)
DO UPDATE SET bot_events = bot_events + 1
`).bind(shop, today).run();

This trades write complexity for read simplicity:

  • Write: Slightly more expensive (UPDATE after INSERT)
  • Read: Instant aggregates, no scanning millions of rows

Prisma Schema Overview

Prisma handles complex, relationship-rich data:

// Billing and settings
model MerchantSettings {
id String @id
shop String @unique
plan String
billingStatus String
// ... feature flags, injection method, etc.
}

// Scan jobs with complex state
model ProductScanJob {
id String @id
shop String
status String // pending, processing, completed, failed
products ScannedProduct[]
phishingResults PhishingDomain[]
marketplaceResults MarketplaceThreat[]
}

// Rich threat data with relationships
model MarketplaceThreat {
id String @id
productScanId String
productScan ProductScanJob @relation(...)
platform String // amazon, ebay, etc.
matchScore Float
// ... threat details
}

Data Synchronization

Some data exists in both databases:

IP Blocking Config

Source of truth: Prisma
Synced to: D1 + KV

Remix App → Prisma (save) → POST /api/sync-ip-blocking-config → D1 + KV

The edge needs fast access to blocking rules, so we sync on every change.

Protection Settings

Source of truth: Prisma (via MerchantSettings)
Synced to: D1 MerchantConfig

Theme Extension → POST /api/heartbeat → D1 MerchantConfig (updated)

The heartbeat carries current settings; D1 caches them for fast lookups.

Querying Across Databases

The Remix app queries both databases and combines results:

// app/db/analytics.server.ts
export async function getThreatSummary(shop: string) {
// Real-time data from D1 (via Worker API)
const realTimeStats = await fetch(`${EVENTS_WORKER_URL}/api/analytics/summary?shop=${shop}`);

// Historical scan data from Prisma
const scanStats = await prisma.productScanJob.aggregate({
where: { shop },
_count: { id: true }
});

const phishingCount = await prisma.phishingDomain.count({
where: { shop, status: 'active' }
});

return {
...realTimeStats,
totalScans: scanStats._count.id,
activePhishing: phishingCount
};
}

Trade-offs

Advantages

  1. Performance: Edge database handles real-time load
  2. Cost: No server compute for high-volume pixel processing
  3. Scalability: Each database scales independently
  4. Flexibility: Right tool for each job

Disadvantages

  1. Complexity: Two schemas to maintain
  2. Sync overhead: Some data must be kept in sync
  3. Eventual consistency: Edge data may lag slightly
  4. Debugging: Issues can span both databases

Best Practices

  1. Clear ownership: Each data type has one source of truth
  2. Minimal sync: Only sync what the edge needs
  3. Idempotent syncs: Re-syncing should be safe
  4. Monitor both: Alerts for both D1 and Prisma issues