Dual Database Strategy
Store Shield uses two separate databases for different purposes. This document explains the reasoning and implementation.
The Two Databases
| Database | Technology | Location | Purpose |
|---|---|---|---|
| D1 | SQLite (Cloudflare) | Edge (global) | Real-time events, sessions, analytics |
| Prisma | SQLite | Fly.io | Scans, billing, settings, complex data |
Why Two Databases?
Problem: Conflicting Requirements
Real-time protection and background scanning have very different database needs:
| Requirement | Real-time Protection | Background Scanning |
|---|---|---|
| Latency | Under 50ms critical | Seconds acceptable |
| Write volume | Thousands/second | Hundreds/day |
| Read patterns | Simple lookups | Complex queries |
| Data relationships | Flat, denormalized | Rich relationships |
| Location | Edge (global) | Central |
Solution: Right Tool for Each Job
D1 (Cloudflare Edge)
- Handles all pixel ingestion at the edge
- Near-zero latency for protection checks
- Automatic aggregation for analytics
- No central server bottleneck
Prisma (Fly.io)
- Handles complex business logic
- Rich relationships (scans → results → threats)
- Transaction support for billing
- Full Prisma tooling (migrations, type safety)
D1 Schema Overview
D1 stores real-time, high-volume data:
-- Raw event tables (write-heavy)
VisitorIdentity -- Fingerprinted visitors
SessionSnapshot -- Session tracking
BotSignal -- Bot detection events
SpySignal -- Spy tool detections
IPBlockingEvent -- IP/Country blocks
ProtectionEvent -- Content protection triggers
-- Aggregate tables (updated on each event)
DailyMetrics -- Counts per shop/date
TopIPsDaily -- Top threat IPs
TopPagesDaily -- Most targeted pages
TopCountriesDaily -- Threat by country
-- Config cache
MerchantConfig -- Protection settings
IPBlockingConfig -- Block rules
Aggregation Strategy
Instead of querying raw events for analytics, we update aggregates on every write:
// On each bot detection event:
await db.prepare(`
INSERT INTO DailyMetrics (shop, date, bot_events)
VALUES (?, ?, 1)
ON CONFLICT (shop, date)
DO UPDATE SET bot_events = bot_events + 1
`).bind(shop, today).run();
This trades write complexity for read simplicity:
- Write: Slightly more expensive (UPDATE after INSERT)
- Read: Instant aggregates, no scanning millions of rows
Prisma Schema Overview
Prisma handles complex, relationship-rich data:
// Billing and settings
model MerchantSettings {
id String @id
shop String @unique
plan String
billingStatus String
// ... feature flags, injection method, etc.
}
// Scan jobs with complex state
model ProductScanJob {
id String @id
shop String
status String // pending, processing, completed, failed
products ScannedProduct[]
phishingResults PhishingDomain[]
marketplaceResults MarketplaceThreat[]
}
// Rich threat data with relationships
model MarketplaceThreat {
id String @id
productScanId String
productScan ProductScanJob @relation(...)
platform String // amazon, ebay, etc.
matchScore Float
// ... threat details
}
Data Synchronization
Some data exists in both databases:
IP Blocking Config
Source of truth: Prisma
Synced to: D1 + KV
Remix App → Prisma (save) → POST /api/sync-ip-blocking-config → D1 + KV
The edge needs fast access to blocking rules, so we sync on every change.
Protection Settings
Source of truth: Prisma (via MerchantSettings)
Synced to: D1 MerchantConfig
Theme Extension → POST /api/heartbeat → D1 MerchantConfig (updated)
The heartbeat carries current settings; D1 caches them for fast lookups.
Querying Across Databases
The Remix app queries both databases and combines results:
// app/db/analytics.server.ts
export async function getThreatSummary(shop: string) {
// Real-time data from D1 (via Worker API)
const realTimeStats = await fetch(`${EVENTS_WORKER_URL}/api/analytics/summary?shop=${shop}`);
// Historical scan data from Prisma
const scanStats = await prisma.productScanJob.aggregate({
where: { shop },
_count: { id: true }
});
const phishingCount = await prisma.phishingDomain.count({
where: { shop, status: 'active' }
});
return {
...realTimeStats,
totalScans: scanStats._count.id,
activePhishing: phishingCount
};
}
Trade-offs
Advantages
- Performance: Edge database handles real-time load
- Cost: No server compute for high-volume pixel processing
- Scalability: Each database scales independently
- Flexibility: Right tool for each job
Disadvantages
- Complexity: Two schemas to maintain
- Sync overhead: Some data must be kept in sync
- Eventual consistency: Edge data may lag slightly
- Debugging: Issues can span both databases
Best Practices
- Clear ownership: Each data type has one source of truth
- Minimal sync: Only sync what the edge needs
- Idempotent syncs: Re-syncing should be safe
- Monitor both: Alerts for both D1 and Prisma issues