Author: System Documentation | Last Updated: December 19, 2025 | Version: 1.0
Table of Contents
- System Overview
- Single File Delete Operation
- Bulk File Delete Operation
- Architecture Comparison
- Error Handling & Recovery
- Performance Characteristics
- Known Issues & Limitations
System Overview
Core Principles
- Zero Data Loss: All files backed up to R2 before deletion from Shopify
- Atomic Operations: Database transactions ensure consistency
- 30-Day Recovery: Files recoverable from trash for 30 days
- Audit Trail: All deletions logged with user context
Technology Stack
- Storage: Cloudflare R2 (S3-compatible)
- Database: SQLite via Prisma
- Queue: Database-backed job queue
- Worker: Single Node.js process
- Hosting: Fly.io (1GB RAM, shared CPU)
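For orientation, here is a minimal sketch of how the r2Client referenced later in this document might be constructed with the AWS SDK v3 S3 client pointed at R2. The module path and environment variable names are assumptions for illustration, not the app's actual configuration.
// Location (assumed): app/lib/r2.server.ts
// Sketch: S3-compatible client configured for Cloudflare R2.
import { S3Client } from "@aws-sdk/client-s3";

export const R2_BUCKET_NAME = process.env.R2_BUCKET_NAME!;

export const r2Client = new S3Client({
  region: "auto", // R2 ignores the region; "auto" is the conventional value
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});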
Single File Delete Operation
Use Case
- User deletes 1-5 files from UI
- Files < 200MB
- Requires immediate feedback
- Good network conditions
┌─────────────────────────────────────────────────────┐
│ HTTP REQUEST │
│ (20-minute timeout limit) │
├─────────────────────────────────────────────────────┤
│ │
│ User clicks Delete │
│ ↓ │
│ Route handler (app.files.tsx) │
│ ↓ │
│ deleteFileToTrash() - app/lib/delete-service.server.ts │
│ ↓ │
│ ┌──────────────────────────────────────────┐ │
│ │ Step 1: Fetch from Shopify CDN │ │
│ │ Step 2: Upload to R2 (backup) │ │
│ │ Step 3: Database transaction │ │
│ │ - Create deletedFile record │ │
│ │ - Delete mediaFile record │ │
│ │ - Recalculate stats │ │
│ │ Step 4: Delete from Shopify │ │
│ │ Step 5: Log audit entry │ │
│ └──────────────────────────────────────────┘ │
│ ↓ │
│ Return success/failure to UI │
│ │
└─────────────────────────────────────────────────────┘
Implementation Details
// Location: app/routes/app.files.tsx
export const action = async ({ request }: ActionFunctionArgs) => {
const { admin, session } = await authenticate.admin(request);
const formData = await request.formData();
const intent = formData.get("intent");
const fileId = formData.get("fileId") as string;
if (intent === "delete") {
const userInfo = getUserInfo(session);
const result = await deleteFileToTrash(
session.shop,
fileId,
admin,
userInfo.email,
userInfo.name
);
return json(result);
}
};
// Location: app/lib/delete-service.server.ts
export async function deleteFileToTrash(
shop: string,
fileId: string,
admin: AdminApiContext,
userEmail?: string | null,
userName?: string | null
): Promise<{ success: boolean; message: string }>
Step-by-Step Execution
Step 1: Fetch File Metadata & Stream from Shopify
// Query database for file info
const file = await prisma.mediaFile.findUnique({
  where: { id: fileId, shop },
});
if (!file) {
  return { success: false, message: "File not found" };
}
// Fetch usage details (which products/collections use this)
const usageDetails = await fetchUsageDetails(file, admin, shop);
// Stream file from Shopify CDN
const response = await fetch(file.url);
if (!response.ok || !response.body) {
  return { success: false, message: `CDN fetch failed (HTTP ${response.status})` };
}
const bodyStream = response.body;
Timeout Risk: 2-60 seconds depending on file size and network
Step 2: Backup to R2
const storageKey = `${shop}/${fileId}/${file.filename}`;
const upload = new Upload({
client: r2Client,
params: {
Bucket: R2_BUCKET_NAME,
Key: storageKey,
Body: bodyStream,
ContentType: file.mimeType,
},
queueSize: 8, // 8 parallel chunk uploads
partSize: 10 * 1024 * 1024, // 10MB chunks
leavePartsOnError: false,
});
await upload.done();
const storageUrl = `r2://${R2_BUCKET_NAME}/${storageKey}`;
const previewUrl = getPublicR2Url(storageKey);
Features:
- Streaming upload (no memory buffering)
- 10MB multipart chunks
- Supports files up to 5GB
- Automatic retry on chunk failure
Timeout Risk: 2-300 seconds for large files
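getPublicR2Url isn't defined in this excerpt. A minimal sketch, assuming the bucket is exposed through a public custom domain held in an environment variable (both the variable name and the URL scheme are assumptions):
// Location (assumed): app/lib/r2.server.ts
// Sketch: build a browser-accessible preview URL for an object in R2.
export function getPublicR2Url(storageKey: string): string {
  const publicBase = process.env.R2_PUBLIC_BASE_URL!; // e.g. "https://files.example.com"
  // Encode each path segment but keep the "/" separators intact
  const encodedKey = storageKey.split("/").map(encodeURIComponent).join("/");
  return `${publicBase}/${encodedKey}`;
}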
Step 3: Atomic Database Transaction
await prisma.$transaction(async (tx) => {
// Create backup record
await tx.deletedFile.create({
data: {
shop,
originalFileId: fileId,
filename: file.filename,
originalUrl: file.url,
storageUrl,
storageKey,
previewUrl,
size: file.size,
usageDetails: JSON.stringify(usageDetails),
restoreBefore: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000), // 30 days
wasUsed: file.isUsed,
// ... more fields
}
});
// Delete from active files
await tx.mediaFile.delete({
where: { id: fileId },
});
// Recalculate shop statistics
await recalculateStats(shop, tx);
}, {
maxWait: 30000, // 30 seconds max wait for transaction lock
timeout: 30000, // 30 seconds execution timeout
});
Transaction guarantees:
- All-or-nothing execution (ACID)
- 30-second timeout limit
- Automatic rollback on failure
Step 4: Delete from Shopify
await deleteFileFromShopify(fileId, admin);
Important: This happens AFTER the DB transaction. If this fails:
- File is marked deleted in our DB
- Backup exists in R2
- File still exists in Shopify (can be manually retried)
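deleteFileFromShopify itself isn't reproduced in the source. A plausible sketch using the Admin GraphQL fileDelete mutation is shown below; treat the module path and error handling as assumptions about the helper's shape (AdminApiContext is the same Admin API context type used elsewhere in this document).
// Location (assumed): app/lib/shopify-files.server.ts
// Sketch: delete a file via the Admin GraphQL API's fileDelete mutation.
export async function deleteFileFromShopify(fileId: string, admin: AdminApiContext) {
  const response = await admin.graphql(
    `#graphql
    mutation fileDelete($fileIds: [ID!]!) {
      fileDelete(fileIds: $fileIds) {
        deletedFileIds
        userErrors {
          field
          message
        }
      }
    }`,
    { variables: { fileIds: [fileId] } }
  );
  const { data } = await response.json();
  const errors = data?.fileDelete?.userErrors ?? [];
  if (errors.length > 0) {
    // Surface Shopify-side validation errors to the caller
    throw new Error(errors.map((e: { message: string }) => e.message).join("; "));
  }
}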
Step 5: Audit Logging
await logAudit({
shop,
userEmail,
userName,
action: AuditAction.FILE_DELETED,
targetName: file.filename,
details: `Deleted ${file.filename} (20MB) • Used in: 5 products, 2 collections`, // size and usage counts shown here are illustrative; the real string is built from file.size and usageDetails
metadata: {
fileId: file.id,
size: Number(file.size),
wasUsed: file.isUsed,
usageCount: usageDetails.totalUsages,
},
success: true,
});
Timeout Scenarios
| File Size | Network | Expected Time | Timeout Risk |
|---|---|---|---|
| < 50MB | Good | 5-20 sec | Very Low (< 1%) |
| 50-200MB | Good | 20-60 sec | Low (< 5%) |
| 200-500MB | Good | 60-180 sec | Medium (~10%) |
| > 500MB | Good | 180-1200 sec | High (> 30%) |
| Any | Poor | Variable | High |
HTTP Timeout: 20 minutes (Fly.io grace_period)
Failure Recovery
Scenario 1: Timeout during R2 upload
- R2 file may be partially uploaded
- Database unchanged
- Shopify file intact
- Recovery: Retry operation (idempotent)
Scenario 2: Database transaction timeout
- R2 file successfully uploaded
- Transaction rolled back
- Shopify file intact
- Recovery: Orphaned R2 file (storage waste), retry operation
Scenario 3: Timeout after DB, before Shopify delete
- R2 backup exists ✅
- Database updated ✅
- Shopify file still exists ❌
- Recovery: File appears deleted in app but still in Shopify (manual cleanup needed)
Bulk File Delete Operation
Use Case
- User deletes 10+ files
- Files of any size
- Long-running operation (minutes to hours)
- Resilient to restarts/failures
Architecture
┌─────────────────────────────────────────────────────┐
│ HTTP REQUEST (instant) │
│ Creates job, returns immediately │
└─────────────────────────────────────────────────────┘
│
├─ Creates BulkJob record (PENDING)
├─ Creates one BulkJobItem record per file (PENDING), e.g., 50 items for a 50-file job
└─ Returns job ID to user (~200ms)
┌─────────────────────────────────────────────────────┐
│ BACKGROUND WORKER (no timeout) │
│ Separate Node.js process │
└─────────────────────────────────────────────────────┘
│
├─ Polls database every 5 seconds
├─ Picks oldest PENDING job (FIFO)
├─ Marks job as PROCESSING
│
└─ For each file:
├─ Mark item PROCESSING
├─ Call deleteFileToTrash()
├─ Mark item SUCCESS/FAILED
├─ Update job progress
└─ Repeat until all done
├─ Mark job COMPLETED
└─ Move to next job
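The worker's poll loop itself isn't shown in the source; the following is a minimal sketch of the behavior described above (5-second polling, oldest-first selection across all shops). The module path, the import paths, and the getAdminContextForShop helper are assumptions for illustration.
// Location (assumed): app/lib/jobs/worker.server.ts
// Sketch of the worker's main loop: recover stale jobs once at startup,
// then poll for the oldest PENDING job every 5 seconds (global FIFO).
import prisma from "~/db.server";
import { resetStaleJobs } from "~/lib/jobs/bulk-job.server";
import { processBulkDeleteJob } from "~/lib/jobs/bulk-delete-worker.server";
import { getAdminContextForShop } from "~/lib/shopify-session.server"; // hypothetical session-to-admin helper

const POLL_INTERVAL_MS = 5_000;

export async function runWorker() {
  await resetStaleJobs(); // reset jobs stuck in PROCESSING after a crash

  while (true) {
    const job = await prisma.bulkJob.findFirst({
      where: { status: "PENDING" },
      orderBy: { createdAt: "asc" }, // oldest first, across all shops (FIFO)
    });

    if (!job) {
      // Nothing to do; wait before polling again
      await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
      continue;
    }

    const admin = await getAdminContextForShop(job.shop);
    await processBulkDeleteJob({ jobId: job.id, shop: job.shop, admin });
  }
}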
Database Schema
BulkJob Table
model BulkJob {
id String @id @default(uuid())
shop String
type String // "BULK_DELETE" | "BULK_RESTORE"
status String @default("PENDING")
fileIds String // JSON array
totalFiles Int @default(0)
processedFiles Int @default(0)
successCount Int @default(0)
failCount Int @default(0)
progress Float @default(0)
currentFileId String?
currentFileName String?
results String @default("[]") // JSON
errorMessage String?
userEmail String?
userName String?
createdAt DateTime @default(now())
startedAt DateTime?
completedAt DateTime?
updatedAt DateTime @updatedAt
durationMs Int?
items BulkJobItem[]
}
BulkJobItem Table
model BulkJobItem {
id String @id @default(uuid())
bulkJobId String
fileId String
filename String
status String @default("PENDING")
retryCount Int @default(0)
maxRetries Int @default(3)
errorMessage String?
errorStack String?
createdAt DateTime @default(now())
startedAt DateTime?
processedAt DateTime?
durationMs Int?
bulkJob BulkJob @relation(fields: [bulkJobId], references: [id], onDelete: Cascade)
@@index([bulkJobId, status])
@@index([status])
}
Job Statuses
BulkJob:
- PENDING - Waiting for worker
- PROCESSING - Worker is processing
- COMPLETED - All files processed
- FAILED - Critical error occurred
- CANCELLED - Cancelled by user/system
BulkJobItem:
- PENDING - Waiting to process
- PROCESSING - Currently being processed
- SUCCESS - Completed successfully
- FAILED - Failed after 3 retries
- SKIPPED - Skipped (e.g., file already deleted)
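The worker code below refers to these as constants (e.g., BulkJobStatus.PROCESSING, BulkJobType.BULK_DELETE). Their definitions aren't included in the source; a minimal sketch consistent with the lists above, with an assumed module path:
// Location (assumed): app/lib/jobs/types.ts
// Status and type constants referenced by the worker; values mirror the lists above.
export const BulkJobStatus = {
  PENDING: "PENDING",
  PROCESSING: "PROCESSING",
  COMPLETED: "COMPLETED",
  FAILED: "FAILED",
  CANCELLED: "CANCELLED",
} as const;

export const BulkJobItemStatus = {
  PENDING: "PENDING",
  PROCESSING: "PROCESSING",
  SUCCESS: "SUCCESS",
  FAILED: "FAILED",
  SKIPPED: "SKIPPED",
} as const;

export const BulkJobType = {
  BULK_DELETE: "BULK_DELETE",
  BULK_RESTORE: "BULK_RESTORE",
} as const;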
Implementation: Job Creation
// Location: app/routes/app.files.tsx
if (intent === "bulkDelete") {
const fileIds = JSON.parse(formData.get("fileIds") as string);
// Validate files exist
const files = await prisma.mediaFile.findMany({
where: { id: { in: fileIds }, shop },
select: { id: true, filename: true, size: true },
});
// Check 100GB total limit
const totalSize = files.reduce((sum, f) => sum + Number(f.size), 0);
const MAX_TOTAL_SIZE = 100 * 1024 * 1024 * 1024;
if (totalSize > MAX_TOTAL_SIZE) {
return json({
success: false,
message: `Total size exceeds 100GB limit`
});
}
// Create bulk job
const job = await createBulkJob({
shop,
type: BulkJobType.BULK_DELETE,
fileIds: files.map(f => f.id),
fileNames: files.map(f => f.filename),
userEmail: userInfo.email,
userName: userInfo.name,
});
// Return immediately
return json({
success: true,
message: `Job created with ${files.length} files`,
jobId: job.id,
});
}
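createBulkJob isn't shown in the source. A plausible sketch, assuming it creates the BulkJob row plus one BulkJobItem per file inside a single transaction (field names follow the schema above; the module path and prisma import are assumptions):
// Location (assumed): app/lib/jobs/bulk-job.server.ts
// Sketch: create the parent job and one PENDING item per file atomically.
import prisma from "~/db.server";

export async function createBulkJob(params: {
  shop: string;
  type: string;
  fileIds: string[];
  fileNames: string[];
  userEmail?: string | null;
  userName?: string | null;
}) {
  const { shop, type, fileIds, fileNames, userEmail, userName } = params;

  return prisma.$transaction(async (tx) => {
    const job = await tx.bulkJob.create({
      data: {
        shop,
        type,
        status: "PENDING",
        fileIds: JSON.stringify(fileIds),
        totalFiles: fileIds.length,
        userEmail,
        userName,
      },
    });

    // One BulkJobItem per file; all start in PENDING with retryCount 0
    for (let i = 0; i < fileIds.length; i++) {
      await tx.bulkJobItem.create({
        data: {
          bulkJobId: job.id,
          fileId: fileIds[i],
          filename: fileNames[i],
        },
      });
    }

    return job;
  });
}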
Bulk Delete Worker
// Location: app/lib/jobs/bulk-delete-worker.server.ts
export async function processBulkDeleteJob(params: ProcessBulkDeleteJobParams) {
const { jobId, shop, admin } = params;
try {
// Mark job as processing
const job = await markJobProcessing(jobId);
const userEmail = job.userEmail;
const userName = job.userName;
let successCount = 0;
let failCount = 0;
let processedCount = 0;
const results: JobResult[] = [];
// Process files sequentially (one at a time)
while (true) {
// Get next pending item
const pendingItems = await getPendingJobItems(jobId, 1);
if (pendingItems.length === 0) break; // All done
const item = pendingItems[0];
const itemStartTime = Date.now();
try {
await markJobItemProcessing(item.id);
// Check if file still exists
const fileExists = await prisma.mediaFile.findUnique({
where: { id: item.fileId },
});
if (!fileExists) {
// Already deleted - skip
await markJobItemSkipped(item.id, "File already deleted");
successCount++;
processedCount++;
} else {
// Delete the file
const result = await deleteFileToTrash(
shop,
item.fileId,
admin,
userEmail,
userName
);
const duration = Date.now() - itemStartTime;
if (result.success) {
await markJobItemComplete(item.id, true, undefined, duration);
successCount++;
processedCount++;
results.push({
fileId: item.fileId,
filename: item.filename,
success: true,
message: result.message,
retryCount: item.retryCount,
durationMs: duration,
});
} else {
throw new Error(result.message);
}
}
} catch (error: any) {
const duration = Date.now() - itemStartTime;
// Retry logic
if (item.retryCount < item.maxRetries) {
await retryJobItem(item.id); // Increments retry count, resets to PENDING
} else {
// Final failure after 3 retries
await markJobItemComplete(item.id, false, error.message, duration);
failCount++;
processedCount++;
results.push({
fileId: item.fileId,
filename: item.filename,
success: false,
message: error.message,
retryCount: item.retryCount,
durationMs: duration,
});
}
}
// Update progress after each file
await updateJobProgress(jobId, {
processedFiles: processedCount,
successCount,
failCount,
results,
});
}
// Mark job as completed
await markJobCompleted(jobId, { successCount, failCount, results });
return { success: true, successCount, failCount };
} catch (error: any) {
await markJobFailed(jobId, error.message);
return { success: false, error: error.message };
}
}
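Two of the helpers used above, retryJobItem and markJobItemComplete, aren't reproduced in the source. Minimal sketches of how they might behave, consistent with the retry semantics described below (module path and prisma import are assumptions):
// Location (assumed): app/lib/jobs/bulk-job.server.ts
import prisma from "~/db.server";

// Send an item back to PENDING with its retry count incremented.
export async function retryJobItem(itemId: string) {
  return prisma.bulkJobItem.update({
    where: { id: itemId },
    data: {
      status: "PENDING",
      startedAt: null,
      retryCount: { increment: 1 },
    },
  });
}

// Finalize an item as SUCCESS or FAILED and record timing/error details.
export async function markJobItemComplete(
  itemId: string,
  success: boolean,
  errorMessage?: string,
  durationMs?: number
) {
  return prisma.bulkJobItem.update({
    where: { id: itemId },
    data: {
      status: success ? "SUCCESS" : "FAILED",
      errorMessage: errorMessage ?? null,
      processedAt: new Date(),
      durationMs: durationMs ?? null,
    },
  });
}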
Key Features
1. Sequential Processing
- One file at a time (prevents memory overflow)
- Progress saved after each file
- Crash recovery at file-level granularity
2. Automatic Retries
- Each file gets 3 retry attempts
- Failed retries go back to PENDING
- Incremental backoff handled by queue
3. Crash Recovery
- resetStaleJobs() runs on worker startup
- Resets jobs stuck in PROCESSING > 10 minutes
- Resets individual items stuck in PROCESSING
4. Real-time Progress
- Job progress updated after each file
- UI polls job status for live updates (see the polling sketch after this list)
- Shows current file being processed
5. No Timeout Limits
- Worker runs indefinitely
- Can process thousands of files
- Can take hours or days if needed
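As a concrete illustration of the polling mentioned in feature 4, here is a minimal client-side sketch. The /app/jobs/:id status route and its JSON shape are assumptions; the app's actual endpoint may differ.
// Sketch: poll a (hypothetical) job-status endpoint until the job reaches a terminal state.
type JobStatus = {
  status: "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED" | "CANCELLED";
  progress: number; // 0-100
  currentFileName?: string | null;
};

export async function pollJobUntilDone(
  jobId: string,
  onProgress: (job: JobStatus) => void,
  intervalMs = 2000
): Promise<JobStatus> {
  while (true) {
    const res = await fetch(`/app/jobs/${jobId}`); // assumed status route
    const job: JobStatus = await res.json();
    onProgress(job);
    if (job.status === "COMPLETED" || job.status === "FAILED" || job.status === "CANCELLED") {
      return job;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}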
Multi-Shop Queue Behavior
Global FIFO Queue:
- All shops share one queue
- Jobs processed oldest-first (createdAt ASC)
- Worker switches between shops as needed
Example with 200 shops:
Queue at 9:00 AM:
─────────────────
#1: Shop A - 50 files → Processes 9:00-9:04 AM
#2: Shop B - 100 files → Processes 9:04-9:12 AM
#3: Shop C - 30 files → Processes 9:12-9:15 AM
...
#200: Shop ZZ - 50 files → Processes ~10:50 PM (nearly a 14-hour wait!)
Wait time calculation:
Average job: 50 files × 5 sec/file = 250 seconds
Shop at position N: Wait time ≈ (N - 1) × 250 seconds
Position 1: 0 minutes
Position 50: ~205 minutes (~3.4 hours)
Position 100: ~415 minutes (~6.9 hours)
Position 200: ~830 minutes (~13.8 hours)
Architecture Comparison
| Feature | Single Delete | Bulk Delete |
|---|---|---|
| Execution | HTTP request | Background worker |
| Timeout | 20 minutes | None |
| Max Files | 1-5 recommended | Unlimited |
| Max File Size | 500MB safe | Any size |
| Crash Recovery | None (HTTP dies) | Automatic |
| Retry Logic | None | 3 attempts per file |
| Progress Tracking | No | Yes (real-time) |
| User Experience | Immediate feedback | Async (polling required) |
| Scalability | Limited | High |
| Complexity | Low | High |
Error Handling & Recovery
Single Delete Error Handling
Network Errors:
- R2 upload failure → Aborts, no changes made
- Shopify API failure → File backed up but not deleted (manual cleanup needed)
Database Errors:
- Transaction timeout (30s) → Automatic rollback
- Connection lost → Partial state possible (R2 uploaded, DB unchanged)
Recovery: Retry button in UI
Bulk Delete Error Handling
Item-Level Errors:
- File not found → Skip (marked SKIPPED, counted toward successes)
- Network error → Retry (up to 3 times)
- Database error → Retry (up to 3 times)
- Persistent failure → Mark as FAILED, continue to next file
Job-Level Errors:
- No session found → Fail entire job
- Unexpected exception → Fail entire job
- 5 consecutive worker errors → Worker exits (requires restart)
Recovery:
- Item-level: Retry failed items via the "Retry Failed" button
- Job-level: Cancel the job and create a new one with the remaining files
Stale Job Recovery
Problem: Worker crashes mid-job, job stuck in PROCESSING state
Solution:
export async function resetStaleJobs(thresholdMinutes = 10) {
  const cutoff = new Date(Date.now() - thresholdMinutes * 60 * 1000);
  const staleJobs = await prisma.bulkJob.findMany({
    where: {
      status: BulkJobStatus.PROCESSING,
      startedAt: { lt: cutoff }, // only jobs stuck in PROCESSING longer than the threshold
    },
  });
for (const job of staleJobs) {
// Reset job to PENDING
await prisma.bulkJob.update({
where: { id: job.id },
data: {
status: BulkJobStatus.PENDING,
startedAt: null,
},
});
// Reset any processing items back to PENDING
await prisma.bulkJobItem.updateMany({
where: {
bulkJobId: job.id,
status: BulkJobItemStatus.PROCESSING,
},
data: {
status: BulkJobItemStatus.PENDING,
startedAt: null,
},
});
}
}
When it runs: Once at worker startup
Performance Characteristics
Single Delete Performance
| File Size | Network | Time | Bottleneck |
|---|---|---|---|
| 20MB | Good | 5-10 sec | R2 upload |
| 100MB | Good | 15-30 sec | R2 upload |
| 500MB | Good | 60-120 sec | R2 upload |
| 2GB | Good | 240-600 sec | R2 upload + Shopify fetch |
Components:
- Shopify CDN fetch: 2-60 sec (depends on file size)
- R2 upload: 2-300 sec (depends on file size)
- DB transaction: 100-500 ms
- Shopify delete API: 500-2000 ms
- Audit logging: 50-100 ms
Bulk Delete Performance
Per-file overhead: ~100-200ms (DB queries, status updates)
Throughput:
- Small files (< 10MB): ~10-12 files/minute
- Medium files (50MB): ~5-6 files/minute
- Large files (200MB): ~2-3 files/minute
50 files @ 20MB each:
- Expected time: 4-8 minutes
- With retries: 5-12 minutes
- Worst case (network issues): 15-30 minutes
Database Impact:
- Queries per file: ~15-20
- Updates per file: ~5-8
- Total for 50 files: ~1,000 queries
- SQLite handles this easily with good indexes
Known Issues & Limitations
Issue 1: Single Global Queue
Problem: All shops share one FIFO queue. No priority, no per-shop limits.
Impact:
- One shop with large job blocks all others
- Position 200 in queue waits 8-16 hours
- No way to prioritize paying customers
Example:
Shop A: 10,000 files (takes 20 hours)
Shop B: 5 files (waits 20 hours!)
Issue 2: No Horizontal Scaling
Problem: Single worker, single machine. Can't scale to multiple workers.
Impact:
- Limited to ~500-1000 files/hour throughput
- Worker crash affects ALL shops
- No redundancy
Workaround: None (architectural limitation)
Fix Required: Distributed job queue (Redis, BullMQ, etc.)
Issue 3: Orphaned R2 Files
Problem: If operation fails after R2 upload but before DB transaction, file remains in R2 forever.
Impact: Storage cost accumulation
Frequency: Rare (< 0.1% of operations)
Workaround: Manual R2 cleanup script
Fix Required: R2 garbage collection job
Issue 4: 100GB Bulk Limit
Problem: Hard limit of 100GB total per bulk operation (safety measure)
Impact: Large shops with many big files can't delete in one batch
Workaround: Split into multiple jobs
Fix Required: Increase limit or remove (with better testing)