SaveState Architecture Deep Dive: How We Built Time Machine for AI

Introduction

When we set out to build SaveState, we had one guiding principle: your data should be yours, forever.

That sounds simple, but it has profound implications for architecture. It means:

Encryption with keys we never see
An open archive format with no vendor lock-in
Storage backends you control
Platform adapters that work without our permission

This post dives deep into how SaveState works under the hood. Whether you're evaluating it for security, planning to contribute an adapter, or just curious about the engineering, this is your guide.

System architecture diagram showing CLI to Adapters to Encryption to Storage flow

Architecture Overview

SaveState has four major layers:

┌─────────────────────────────────────────────────┐
│                  SaveState CLI                  │
│    init · snapshot · restore · list · diff      │
├─────────────────────────────────────────────────┤
│              Adapter Layer                      │
│   chatgpt · claude · gemini · openai · custom   │
├─────────────────────────────────────────────────┤
│              Encryption Layer                   │
│   AES-256-GCM · scrypt KDF · integrity check    │
├─────────────────────────────────────────────────┤
│              Storage Backends                   │
│   local · s3 · r2 · cloud API                   │
└─────────────────────────────────────────────────┘

Each layer has a single responsibility:

CLI Layer: User interface, command parsing, orchestration
Adapter Layer: Platform-specific data extraction and restoration
Encryption Layer: Cryptographic operations, key derivation
Storage Layer: Persisting encrypted snapshots

Let's examine each in detail.

The SaveState Archive Format (SAF)

At the heart of SaveState is the SAF — an open specification for AI state archives.

Design Goals

When designing SAF, we optimized for:

Portability — Must work across any AI platform
Simplicity — Standard tools should be able to unpack it (after decryption)
Extensibility — New platforms shouldn't require format changes
Efficiency — Incremental snapshots should be cheap

File Structure

Each snapshot produces a .saf.enc file:

snapshot-2026-01-27T15-00-00Z.saf.enc

The .enc suffix indicates encryption. After decryption, you get a gzipped tarball:

snapshot.saf.enc (encrypted envelope)
  └── snapshot.tar.gz (gzipped tarball)
       ├── manifest.json
       ├── identity/
       │   ├── personality.md
       │   ├── config.json
       │   └── tools.json
       ├── memory/
       │   ├── core.json
       │   └── knowledge/
       ├── conversations/
       │   ├── index.json
       │   └── threads/
       └── meta/
           ├── platform.json
           ├── snapshot-chain.json
           └── restore-hints.json

manifest.json

Every SAF archive starts with a manifest:

{
  "version": "1.0",
  "id": "ss-2026-01-27T15-00-00Z-a1b2c3",
  "timestamp": "2026-01-27T15:00:00.000Z",
  "platform": "chatgpt",
  "adapter": "chatgpt@0.2.1",
  "checksum": "sha256:abc123...",
  "incremental": false,
  "parent": null
}

The manifest provides metadata for the snapshot and enables integrity verification. The checksum covers the entire unpacked archive, allowing tamper detection even if the encryption layer is somehow bypassed.

identity/

The identity directory captures who your AI is:

personality.md — System prompts, custom instructions, SOUL files

# Custom Instructions

## What would you like ChatGPT to know about you?
I'm a software developer working primarily in TypeScript...

## How would you like ChatGPT to respond?
Be concise and technical. Skip the preamble...

config.json — Settings and preferences

{
  "model": "gpt-4o",
  "temperature": 0.7,
  "plugins": ["code-interpreter", "browsing"],
  "memory_enabled": true
}

memory/

The memory directory captures what your AI knows:

core.json — Platform memory entries

{
  "entries": [
    {
      "id": "mem_001",
      "content": "User prefers TypeScript over JavaScript",
      "created": "2026-01-15T10:30:00Z",
      "source": "conversation:abc123"
    },
    {
      "id": "mem_002", 
      "content": "User is building a backup tool for AI agents",
      "created": "2026-01-20T14:15:00Z",
      "source": "explicit"
    }
  ]
}

knowledge/ — Uploaded documents, RAG sources. Larger files (PDFs, datasets, knowledge bases) are stored in the knowledge/ subdirectory with their metadata preserved.

Expanded view of SAF file structure with annotations explaining each section

Encryption: Zero-Knowledge by Design

SaveState's encryption ensures that your data is protected even if our servers (or any storage backend) are compromised.

Key Derivation

We use scrypt for key derivation — a memory-hard function that resists GPU-based brute forcing:

User passphrase
    → scrypt (N=2^17, r=8, p=1, dkLen=32)
    → 256-bit AES key

Parameters explained:

N=131072 (2^17) — CPU/memory cost parameter (high = harder to brute force)
r=8 — Block size parameter
p=1 — Parallelization parameter
dkLen=32 — Output key length (256 bits)

These parameters are intentionally aggressive. Key derivation takes ~100ms on modern hardware, which is imperceptible to legitimate users but devastating to attackers trying billions of guesses.

Encryption

Once we have the key, we use AES-256-GCM:

Plaintext (tarball)
    → AES-256-GCM encrypt with:
        - 256-bit key (from scrypt)
        - 96-bit random IV (nonce)
    → Ciphertext + 128-bit auth tag

GCM (Galois/Counter Mode) provides both confidentiality and integrity:

Confidentiality: Data is unreadable without the key
Integrity: Any tampering is detected (auth tag verification fails)

File Format

The encrypted file structure:

┌──────────────────────────────────────────────────┐
│  Salt (32 bytes)                                 │
├──────────────────────────────────────────────────┤
│  IV/Nonce (12 bytes)                            │
├──────────────────────────────────────────────────┤
│  Ciphertext (variable length)                   │
├──────────────────────────────────────────────────┤
│  Auth Tag (16 bytes)                            │
└──────────────────────────────────────────────────┘

The salt is regenerated for each encryption, ensuring the same passphrase produces different keys for different files.

Implementation

Here's a simplified version of our encryption code:

import { scryptSync, randomBytes, createCipheriv, createDecipheriv } from 'crypto';

const SCRYPT_N = 2 ** 17;
const SCRYPT_R = 8;
const SCRYPT_P = 1;
const KEY_LENGTH = 32;
const IV_LENGTH = 12;
const SALT_LENGTH = 32;

export function encrypt(plaintext: Buffer, passphrase: string): Buffer {
  const salt = randomBytes(SALT_LENGTH);
  const key = scryptSync(passphrase, salt, KEY_LENGTH, {
    N: SCRYPT_N,
    r: SCRYPT_R,
    p: SCRYPT_P,
  });
  
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  
  const ciphertext = Buffer.concat([
    cipher.update(plaintext),
    cipher.final(),
  ]);
  
  const authTag = cipher.getAuthTag();
  
  return Buffer.concat([salt, iv, ciphertext, authTag]);
}

What This Means for You

We never see your passphrase — It's never transmitted anywhere
We can't read your backups — Even if you use our cloud storage
Lost passphrase = lost data — There's no recovery mechanism (this is a feature, not a bug)
Each file is independently encrypted — Compromising one doesn't help with others

Encryption flow diagram showing passphrase to scrypt to AES-256-GCM to encrypted file

The Adapter System

Adapters are SaveState's bridge to the AI platform ecosystem. Each adapter knows how to extract and restore state for a specific platform.

The Interface

Every adapter implements this TypeScript interface:

interface Adapter {
  // Identification
  readonly id: string;        // e.g., 'chatgpt'
  readonly name: string;      // e.g., 'ChatGPT'
  readonly platform: string;  // Platform family
  readonly version: string;   // Adapter version

  // Core operations
  detect(): Promise<boolean>;
  extract(): Promise<Snapshot>;
  restore(snapshot: Snapshot, options?: RestoreOptions): Promise<void>;
  identify(): Promise<PlatformMeta>;

  // Capabilities (optional)
  capabilities?: {
    extract: boolean;
    restore: boolean;
    partialRestore: boolean;
    incrementalExtract: boolean;
  };
}

Currently Supported

Platform	Strategy
Clawdbot	Read SOUL.md, memory/ directory, conversations directly from filesystem
OpenAI Assistants	API calls to list assistants, threads, files, vector stores
ChatGPT	Parse data export JSON (conversations.json, memories, user_system_instructions)
Claude Web	Parse exported data + scrape Projects via authenticated session
Gemini	Google Takeout data + Gems API where available

Building Your Own Adapter

Creating an adapter for a new platform is straightforward. Publish as @savestate/adapter-* on npm and SaveState will auto-discover it.

Adapter registry diagram showing multiple adapters plugging into the same core

Storage Backends

SaveState separates "what to store" from "where to store it" through pluggable storage backends.

Available Backends

Local Filesystem (default) — Snapshots in .savestate/snapshots/
S3-Compatible (R2, B2, Minio, etc.) — AWS Signature V4 with zero external dependencies
SaveState Cloud API (Pro/Team) — Managed R2 bucket through authenticated API

Incremental Snapshots

Like git, SaveState only stores what changed between snapshots.

How It Works

Hash everything — Each file in the snapshot gets a SHA-256 hash
Compare to parent — Check which hashes differ from the previous snapshot
Store deltas only — Unchanged files reference the parent snapshot

Space Savings

Real-world savings are significant:

Snapshot	Type	Size	Full Equivalent	Savings
Day 1	Full	2.1 MB	2.1 MB	0%
Day 2	Incremental	47 KB	2.2 MB	98%
Day 3	Incremental	12 KB	2.2 MB	99%
Day 7	Incremental	89 KB	2.4 MB	96%

Most changes are new conversations or memory updates. Identity, tools, and knowledge rarely change.

Visual showing snapshot chain with delta sizes shrinking over time

What's Next

SaveState's architecture is designed for extensibility. Here's what we're working on:

Encrypted search index — Search across snapshots without decrypting everything
Hardware key support — YubiKey, Touch ID via Secure Enclave
Shamir's Secret Sharing — Split your passphrase across trusted parties
Streaming encryption — Handle arbitrarily large snapshots
Adapter SDK — Simplified adapter development with testing utilities

Conclusion

SaveState's architecture reflects our core belief: your data should be yours.

Open format — SAF is documented, standard tools work with it
Strong encryption — AES-256-GCM with scrypt, keys you control
Pluggable adapters — Any platform can be supported
Flexible storage — Store anywhere, we never see your data

We've tried to make the right choices for security and portability, even when they made implementation harder. Because when it comes to your AI identity, there's no room for shortcuts.

Ready to protect your AI identity?

Get started with SaveState today.

Install SaveState

npm install -g @savestate/cli
savestate init
savestate snapshot

Questions? Comments? Find us on GitHub or Twitter.