01 Executive Summary
Distributed Custody Failover is JIL Sovereign's system for eliminating single-provider custody risk by distributing assets across multiple independent custody providers in different jurisdictions and automatically migrating assets when a provider fails. The system performs hourly health checks across four dimensions - API responsiveness, signing capability, balance consistency, and certificate validity - and triggers automatic migration upon detecting three consecutive failures or a single critical failure.
Migration is executed through a 2-of-3 multi-signature approval involving the platform, the user, and the backup custody provider. The entire failover process - from failure detection to asset migration completion - is bounded to 15 minutes. Throughout the migration, protection coverage is continuously maintained, ensuring users are never exposed to unprotected custody gaps.
02 Problem Statement
Institutional crypto custody is dominated by single-provider models in which one custodian holds all assets. The failures of Celsius, BlockFi, and Genesis demonstrated that even well-capitalized firms holding customer assets can fail, trapping billions in customer assets in bankruptcy proceedings for years.
Single Provider Dependency
Most institutions custody their crypto assets with a single qualified custodian (for example, Coinbase Custody, BitGo, or Anchorage). While these providers have strong security track records, concentrating all assets with one provider creates a binary risk: if that provider fails - whether through insolvency, regulatory action, or technical failure - all assets become inaccessible simultaneously.
Manual Migration Complexity
Moving assets between custody providers is a manual, time-consuming process that typically takes days or weeks. It requires coordinating key ceremonies, verifying receiving addresses, and managing the compliance documentation for each asset. During this migration period, assets may be in transit without adequate protection coverage.
No Health Monitoring Standard
There is no industry standard for monitoring custody provider health in real time. Institutions typically learn about custody provider problems through news reports or when their own operations are affected, leaving no time for preventive action.
03 Technical Architecture
The failover system operates through four components: the health monitor, the migration orchestrator, the multi-sig approval engine, and the protection continuity layer.
Health Check Dimensions
| Dimension | Check Method | Frequency | Failure Threshold | Critical Indicator |
|---|---|---|---|---|
| API Responsiveness | HTTP health endpoint + latency measurement | Every 60 minutes | Response time over 5s or HTTP 5xx | Complete unreachability |
| Signing Capability | Test signature request with canary transaction | Every 60 minutes | Signature failure or timeout over 30s | Key material unavailable |
| Balance Consistency | On-chain balance vs custodian reported balance | Every 60 minutes | Discrepancy exceeding 0.01% | Balance decrease without withdrawal |
| Certificate Validity | TLS certificate chain verification + expiry check | Every 60 minutes | Certificate expiring within 7 days | Expired or revoked certificate |
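As a rough illustration of how two of these thresholds could be evaluated, the Python sketch below classifies a balance probe and a certificate probe as OK, failure, or critical. Only the 0.01% and 7-day thresholds come from the table above. The function names, the `Status` enum, and the choice to pass in the previous on-chain balance and the total of authorized withdrawals are assumptions made for this example, not a description of the JIL implementation.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from enum import Enum


class Status(Enum):
    OK = "ok"
    FAILURE = "failure"
    CRITICAL = "critical"


# Thresholds taken from the table; all names here are hypothetical.
BALANCE_TOLERANCE = Decimal("0.0001")   # 0.01% expressed as a fraction
CERT_EXPIRY_WINDOW = timedelta(days=7)


def check_balance(on_chain, reported, previous_on_chain,
                  authorized_withdrawals=Decimal("0")):
    """Classify one balance-consistency probe (all amounts are Decimal)."""
    # Critical: the on-chain balance dropped by more than the
    # authorized withdrawals since the previous probe can explain.
    if previous_on_chain - on_chain > authorized_withdrawals:
        return Status.CRITICAL
    if on_chain == 0:
        return Status.OK if reported == 0 else Status.FAILURE
    # Failure: relative discrepancy between chain and custodian exceeds 0.01%.
    discrepancy = abs(on_chain - reported) / on_chain
    return Status.FAILURE if discrepancy > BALANCE_TOLERANCE else Status.OK


def check_certificate(not_after, now, revoked=False):
    """Classify one certificate probe; both datetimes must be naive or both aware."""
    if revoked or not_after <= now:
        return Status.CRITICAL              # expired or revoked
    if not_after - now <= CERT_EXPIRY_WINDOW:
        return Status.FAILURE               # expiring within 7 days
    return Status.OK
```

Using `Decimal` rather than floats keeps the 0.01% comparison exact for asset amounts, which matters when the tolerance is this tight.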
Failure Detection Logic
For each custody provider:
  Run all 4 health checks every 60 minutes
  Failure escalation:
    1 failure              -> Warning logged, alert to ops team
    2 consecutive failures -> Elevated status, pre-stage migration
    3 consecutive failures -> CRITICAL: Trigger automatic migration
  Critical override (immediate migration):
    - Balance decrease without authorized withdrawal
    - Key material reported unavailable
    - Complete API unreachability for 2+ consecutive checks
    - Certificate revoked (not just expired)
  Migration target selection:
    - Select highest-health backup provider
    - Verify backup provider passes all 4 health checks
    - Confirm backup provider has capacity for incoming assets
    - Verify jurisdictional compatibility
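The escalation and target-selection rules above can be sketched as a small state tracker. This is a minimal Python sketch under stated assumptions: a check cycle counts as one failure if any of the four dimensions fails, a passing cycle resets the counter, and the `health_score` and `free_capacity` fields are hypothetical stand-ins for whatever the platform actually measures.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Iterable, Optional


class Action(Enum):
    NONE = "none"
    WARN = "warn"
    PRE_STAGE = "pre_stage"
    MIGRATE = "migrate"


@dataclass
class ProviderHealth:
    """Tracks consecutive failed check cycles for one provider."""
    consecutive_failures: int = 0

    def record(self, passed: bool, critical: bool = False) -> Action:
        if critical:
            return Action.MIGRATE           # critical override: migrate at once
        if passed:
            self.consecutive_failures = 0   # assumption: a pass resets the streak
            return Action.NONE
        self.consecutive_failures += 1
        if self.consecutive_failures >= 3:
            return Action.MIGRATE
        if self.consecutive_failures == 2:
            return Action.PRE_STAGE
        return Action.WARN


@dataclass(frozen=True)
class BackupCandidate:
    name: str
    health_score: float        # hypothetical score, higher is healthier
    passes_all_checks: bool
    free_capacity: Decimal     # hypothetical capacity in the asset's units
    jurisdiction_ok: bool


def select_target(candidates: Iterable[BackupCandidate],
                  required_capacity: Decimal) -> Optional[BackupCandidate]:
    """Return the healthiest eligible backup, or None if none qualifies."""
    eligible = [
        c for c in candidates
        if c.passes_all_checks
        and c.jurisdiction_ok
        and c.free_capacity >= required_capacity
    ]
    return max(eligible, key=lambda c: c.health_score, default=None)
```

Filtering before ranking means an unhealthy or incompatible provider can never win on score alone.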
Migration Flow
Failure detected -> Migration orchestrator activates
  |
  +--> Pre-stage: Identify target provider, verify health
  |      (Time: 0 to 2 minutes)
  |
  +--> Multi-sig approval: 2-of-3 (platform + user + backup)
  |      Platform auto-signs based on health check evidence
  |      User notified via multi-channel alerts
  |      Backup provider auto-signs upon receiving request
  |      (Time: 2 to 5 minutes)
  |
  +--> Asset transfer: Batch withdrawal from failing provider
  |      Deposit to backup provider in parallel batches
  |      (Time: 5 to 12 minutes)
  |
  +--> Verification: Confirm on-chain balances match
  |      Update internal routing to new provider
  |      (Time: 12 to 15 minutes)
  |
  +--> Complete: Protection coverage confirmed continuous
         Dashboard updated with new provider allocation
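The stage windows in the flow above imply a per-stage time budget within the 15-minute bound. The Python sketch below derives those budgets from the cumulative deadlines and reports which stage should be active at a given elapsed time. The stage identifiers and function names are illustrative assumptions; only the minute boundaries come from the flow.

```python
# Cumulative deadlines in minutes since failure detection, taken from the flow.
STAGE_DEADLINES_MIN = (
    ("pre_stage", 2),
    ("multi_sig_approval", 5),
    ("asset_transfer", 12),
    ("verification", 15),
)


def stage_budgets():
    """Return the minutes allotted to each stage."""
    budgets = {}
    previous = 0
    for stage, deadline in STAGE_DEADLINES_MIN:
        budgets[stage] = deadline - previous
        previous = deadline
    return budgets


def current_stage(elapsed_min):
    """Return the stage expected to be active, or None past the 15-minute bound."""
    for stage, deadline in STAGE_DEADLINES_MIN:
        if elapsed_min < deadline:
            return stage
    return None
```

An orchestrator could compare `current_stage(elapsed)` with the stage it is actually in and raise an alert when it falls behind schedule.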
04 Implementation
The failover system is implemented as an extension to the wallet-api and mpc-cosigner services, with the health monitor running as a dedicated background process.
Provider Abstraction Layer
Each custody provider is integrated through a standardized Provider Adapter interface that exposes consistent methods for health checks, balance queries, withdrawal requests, and deposit address generation. The adapter translates between JIL's internal API format and each provider's specific API. Currently supported providers include Fireblocks, BitGo, Anchorage, and JIL's native MPC custody.
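To make the adapter idea concrete, here is a minimal Python sketch of what such an interface might look like. The method names, signatures, and the `InMemoryAdapter` test double are assumptions made for illustration; the actual Provider Adapter interface is not specified in this document, and no real provider API is called.

```python
from abc import ABC, abstractmethod
from decimal import Decimal


class ProviderAdapter(ABC):
    """Hypothetical common interface over custody providers."""

    @abstractmethod
    def health_check(self) -> dict:
        """Return pass/fail per health dimension."""

    @abstractmethod
    def get_balance(self, asset: str) -> Decimal:
        """Return the custodian-reported balance for an asset."""

    @abstractmethod
    def request_withdrawal(self, asset: str, amount: Decimal,
                           destination: str) -> str:
        """Submit a withdrawal and return a request identifier."""

    @abstractmethod
    def deposit_address(self, asset: str) -> str:
        """Return an address that accepts deposits of the asset."""


class InMemoryAdapter(ProviderAdapter):
    """Test double that keeps balances in a dict instead of calling an API."""

    def __init__(self, name, balances):
        self.name = name
        self._balances = dict(balances)
        self._next_id = 1

    def health_check(self):
        return {"api": True, "signing": True,
                "balance": True, "certificate": True}

    def get_balance(self, asset):
        return self._balances.get(asset, Decimal("0"))

    def request_withdrawal(self, asset, amount, destination):
        if amount <= 0 or amount > self.get_balance(asset):
            raise ValueError("invalid withdrawal amount")
        self._balances[asset] = self.get_balance(asset) - amount
        request_id = f"{self.name}-wd-{self._next_id}"
        self._next_id += 1
        return request_id

    def deposit_address(self, asset):
        return f"{self.name}:{asset}:deposit"
```

Because the orchestrator would depend only on `ProviderAdapter`, a new provider could be added by writing one subclass, and migration drills could run against in-memory doubles like the one above.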
Multi-Sig Migration Approval
Migration transactions require 2-of-3 signatures from three parties: the JIL platform (auto-signs based on health check evidence), the user (pre-authorized auto-sign for health-triggered migrations, or manual approval via wallet), and the backup custody provider (auto-signs upon receiving a valid migration request). The multi-sig contract enforces that at least two of these three parties must approve before any asset movement occurs.
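The threshold rule itself is simple to state in code. The Python sketch below checks only the counting logic of a 2-of-3 quorum; the party labels are assumed names for the three roles described above, and real enforcement would verify cryptographic signatures in the multi-sig contract rather than compare strings.

```python
from typing import Iterable

# Assumed labels for the three approving parties.
SIGNERS = frozenset({"platform", "user", "backup_provider"})
THRESHOLD = 2


def is_approved(approvals: Iterable[str]) -> bool:
    """True when at least two distinct recognized parties have approved."""
    # The set intersection drops duplicates and unknown parties in one step.
    valid = set(approvals) & SIGNERS
    return len(valid) >= THRESHOLD
```

Counting distinct recognized parties, rather than raw approvals, prevents one party from meeting the threshold by approving twice.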
Protection Continuity
Protection coverage is maintained throughout the migration by a coverage bridge mechanism. When migration is triggered, the protection underwriter is notified and the coverage period is extended to overlap both the departing and receiving providers. Coverage is formally transferred to the new provider upon verification of successful deposit, with no gap in protection at any point during the process.
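One way to reason about the "no gap" property is as an interval-coverage check: the coverage windows for the departing and receiving providers, taken together, must cover the whole migration interval. The Python sketch below implements that check. Representing coverage as `(start, end)` datetime pairs is an assumption made for this example, not a description of how underwriters record coverage.

```python
from datetime import datetime, timedelta


def covered_without_gap(windows, start, end):
    """Return True if the union of (start, end) windows covers [start, end]."""
    cursor = start                       # earliest instant not yet known covered
    for w_start, w_end in sorted(windows):
        if w_start > cursor:
            return False                 # gap between cursor and the next window
        cursor = max(cursor, w_end)
        if cursor >= end:
            return True
    return cursor >= end


# Example: coverage on the departing provider overlaps the receiving one.
t0 = datetime(2025, 1, 1, 12, 0)
departing = (t0 - timedelta(hours=1), t0 + timedelta(minutes=12))
receiving = (t0 + timedelta(minutes=10), t0 + timedelta(hours=1))
bridged = covered_without_gap([departing, receiving],
                              t0, t0 + timedelta(minutes=15))
```

In this example the two windows overlap between minutes 10 and 12, so `bridged` is true. If the receiving window started at minute 13 instead, the check would fail.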
05 Integration with JIL Ecosystem
Distributed custody failover enhances JIL Sovereign's non-custodial protection guarantee with operational resilience.
MPC Cosigner
The mpc-cosigner service coordinates key shard migration during failover. New MPC key sets are generated for the backup provider while the user's shard remains unchanged, maintaining the 2-of-3 threshold model.
Protection Coverage
The protection subsystem receives real-time notifications of custody migrations and automatically extends coverage to bridge the transition period. Underwriters are pre-authorized for failover scenarios in the coverage terms.
AI Fleet Inspector
The Fleet Inspector monitors custody provider health metrics alongside validator health. Provider health degradation triggers elevated fleet threat scores and pre-emptive migration staging before full failure occurs.
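As one possible reading of "health degradation", the Python sketch below fits a least-squares line to recent API latency samples and estimates how many further checks remain before latency reaches the 5-second failure threshold. The use of a linear trend and all function names are assumptions for illustration; the document does not specify how the Fleet Inspector scores degradation.

```python
import math

LATENCY_THRESHOLD_S = 5.0   # API responsiveness failure threshold


def latency_slope(samples):
    """Least-squares slope of latency (seconds) per check interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def checks_until_threshold(samples, threshold=LATENCY_THRESHOLD_S):
    """Estimate checks remaining before latency reaches the threshold.

    Returns 0 if the latest sample is already at or above the threshold,
    and None if there is no data or no upward trend.
    """
    if not samples:
        return None
    if samples[-1] >= threshold:
        return 0
    slope = latency_slope(samples)
    if slope <= 0:
        return None
    return math.ceil((threshold - samples[-1]) / slope)
```

A small result from `checks_until_threshold` could be one input to pre-emptive migration staging, giving the orchestrator time to identify a target before the failure threshold is crossed.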
Ops Dashboard
The ops-dashboard displays a unified view of all custody providers with health status, asset allocation, migration history, and protection coverage status. Alert tiles show active migration events in real time.
06 Prior Art Differentiation
JIL's distributed custody failover introduces automated resilience that, per the comparison below, none of the listed custody solutions offers.
| Feature | Fireblocks | BitGo | Copper ClearLoop | JIL Distributed Custody |
|---|---|---|---|---|
| Multi-Provider | Single provider | Single provider | Exchange settlement | Multiple independent providers |
| Health Monitoring | Internal monitoring | Status page | Not applicable | 4-dimension hourly probes |
| Automatic Failover | No | No | No | Yes (3 failures or 1 critical) |
| Migration Time | Days to weeks (manual) | Days to weeks (manual) | Not applicable | 15 minutes (bounded) |
| Protection During Migration | Gap in coverage | Gap in coverage | Not applicable | Continuous (coverage bridge) |
| Multi-Sig Approval | Internal approval | Internal approval | Not applicable | 2-of-3 (platform + user + backup) |
07 Implementation Roadmap
Health Monitoring Framework
Deploy provider abstraction layer with Fireblocks and BitGo adapters. Implement 4-dimension health check system with hourly probing. Build alerting pipeline for health degradation detection.
Migration Orchestrator
Implement automatic migration triggering on 3 consecutive failures. Build 2-of-3 multi-sig approval flow. Deploy batch asset transfer with parallel execution. Add Anchorage adapter for third provider support.
Protection Continuity
Integrate coverage bridge mechanism with protection underwriters. Implement automatic coverage extension during migration. Deploy real-time coverage verification at each migration stage.
Unified Dashboard and Analytics
Build cross-provider visibility dashboard in ops-dashboard. Add migration history and analytics. Implement predictive failure detection using health check trend analysis. Deploy migration drill testing framework.