pg-agent-memory
Stateful AI agent memory layer for PostgreSQL with pgvector. TypeScript-first with intelligent context management and zero-cost embeddings.
- <5ms memory operations
- Local embeddings
- 5+ token counting models
- 28M/sec ULID generation
The Challenge
AI agents suffer from amnesia: they forget everything between conversations. Existing solutions typically fall short in one or more ways:
- Store conversations verbatim, without semantic understanding
- Incur high API costs for embeddings and token counting
- Require a separate vector database, adding infrastructure complexity
- Lack intelligent compression for long conversation histories
- Miss multi-model support for different AI providers
The Solution
Built the first TypeScript-native AI memory layer that combines PostgreSQL reliability with intelligent context management:
Local Embeddings
Zero-cost semantic search using local Sentence Transformers with @xenova/transformers
PostgreSQL Native
Uses existing PostgreSQL infrastructure with pgvector for vector similarity search
Multi-Model Support
Universal tokenizer supporting OpenAI, Anthropic, DeepSeek, Google, Meta with accurate counting
Intelligent Compression
Automatic memory compression with 4 strategies to manage large conversation histories
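The remember()/recall() flow can be sketched with an in-memory stand-in. The method names come from the architecture below, but the signatures and the toy store are illustrative assumptions, not the published API; the real layer persists to PostgreSQL and ranks matches with pgvector.

```typescript
// Illustrative stand-in for the PostgreSQL-backed store; signatures are
// assumptions, not the package's published API.
interface MemoryRecord {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity between two vectors (pgvector does this server-side).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class InMemoryAgentMemory {
  private store: MemoryRecord[] = [];
  private nextId = 0;
  private embed: (text: string) => number[];

  // embed() is injected; the real layer generates embeddings locally.
  constructor(embed: (text: string) => number[]) {
    this.embed = embed;
  }

  remember(content: string): string {
    const id = String(this.nextId++); // the real layer issues ULIDs here
    this.store.push({ id, content, embedding: this.embed(content) });
    return id;
  }

  recall(query: string, limit = 5): MemoryRecord[] {
    const q = this.embed(query);
    return [...this.store]
      .sort((a, b) => cosine(q, b.embedding) - cosine(q, a.embedding))
      .slice(0, limit);
  }
}
```

Semantic recall falls out of ranking by cosine similarity; in production that ranking happens inside PostgreSQL via pgvector rather than in application code.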
Technical Architecture
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   AgentMemory    │────│ EmbeddingService │────│ @xenova/trans..  │
│   remember()     │    │   generate()     │    │ all-MiniLM-L6-v2 │
│   recall()       │    │                  │    │    (384 dims)    │
└──────────────────┘    └──────────────────┘    └──────────────────┘
         │                       │
         ▼                       ▼
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│    PostgreSQL    │────│     pgvector     │────│   TokenCounter   │
│  memories table  │    │cosine similarity │    │  OpenAI/Claude   │
│  ULID + content  │    │  <=> operator    │    │   DeepSeek/etc   │
└──────────────────┘    └──────────────────┘    └──────────────────┘
Tech Stack
Results & Impact
Technical Achievements
- ✓ Sub-5ms memory operations with PostgreSQL indexing
- ✓ Zero-cost embeddings eliminating API dependencies
- ✓ Multi-model token counting with provider-specific optimizations
- ✓ Intelligent compression with 4 compression strategies
Implementation Quality
- ✓ Published on NPM with semantic versioning
- ✓ Docker containerized test environment
- ✓ Unit and integration test coverage
- ✓ TypeScript strict mode compilation
Key Technical Decisions
PostgreSQL-First Approach
Leveraged existing PostgreSQL infrastructure with pgvector instead of requiring separate vector databases
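A minimal schema illustrating this approach (table, column, and index names are illustrative assumptions, not the package's actual migrations). Note that pgvector's cosine-distance operator is `<=>`:

```sql
-- Illustrative schema; names are assumptions, not the package's migrations.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
  id         TEXT PRIMARY KEY,       -- ULID: time-sortable, index-friendly
  agent_id   TEXT NOT NULL,
  content    TEXT NOT NULL,
  embedding  vector(384) NOT NULL,   -- all-MiniLM-L6-v2 output size
  created_at TIMESTAMPTZ DEFAULT now()
);

-- Approximate nearest-neighbour index for cosine distance
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops);

-- Top-k recall by cosine distance
SELECT id, content
FROM memories
WHERE agent_id = $1
ORDER BY embedding <=> $2
LIMIT 5;
```

Keeping vectors in the same database as the rest of the agent's state is what removes the separate vector-store dependency.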
Local Embeddings for Cost Efficiency
Used @xenova/transformers for local embedding generation, eliminating API costs and latency
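The model emits one 384-dimensional vector per token; the sentence embedding is the mean over tokens, L2-normalised. In @xenova/transformers this is `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')` called with `{ pooling: 'mean', normalize: true }`; the sketch below reproduces just that pooling step:

```typescript
// Mean-pool per-token embeddings into one sentence vector, then
// L2-normalise — the post-processing applied by
// { pooling: 'mean', normalize: true } in @xenova/transformers.
function meanPool(tokenEmbeddings: number[][]): number[] {
  const dims = tokenEmbeddings[0].length;
  const mean = new Array<number>(dims).fill(0);
  for (const tok of tokenEmbeddings) {
    for (let d = 0; d < dims; d++) mean[d] += tok[d] / tokenEmbeddings.length;
  }
  const norm = Math.hypot(...mean) || 1; // L2 norm
  return mean.map((v) => v / norm);
}
```

Normalising up front means cosine similarity reduces to a dot product at query time.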
Universal Token Counting
Built provider-specific token counting based on official documentation for accurate cost estimation
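Purely as an illustration of the idea, a per-provider estimator might look like the sketch below. The ratios here are rough assumptions for the sketch only; the actual package uses provider-specific tokenizers, not character heuristics.

```typescript
// Illustrative only: approximate characters-per-token ratios. These
// constants are assumptions — the real library counts tokens with
// provider-specific tokenizers.
type Provider = 'openai' | 'anthropic' | 'deepseek' | 'google' | 'meta';

const CHARS_PER_TOKEN: Record<Provider, number> = {
  openai: 4,
  anthropic: 3.8,
  deepseek: 3.6,
  google: 4,
  meta: 3.7,
};

// Rough upper-bound estimate for budgeting and cost projection.
function estimateTokens(text: string, provider: Provider): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[provider]);
}
```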
ULID-Based Performance Optimization
Used ULID for time-sortable IDs achieving 28M operations/second for optimal database performance
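A ULID is a 26-character Crockford base32 string: the first 10 characters encode a 48-bit millisecond timestamp, the last 16 encode 80 bits of randomness, so IDs sort lexicographically by creation time. A minimal sketch of the layout (the package's actual generator is optimised well beyond this):

```typescript
// Crockford base32 alphabet (no I, L, O, U).
const B32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Encode a 48-bit millisecond timestamp as 10 base32 characters.
function encodeTime(ms: number): string {
  let out = '';
  for (let i = 0; i < 10; i++) {
    out = B32[ms % 32] + out;
    ms = Math.floor(ms / 32);
  }
  return out;
}

// 16 base32 characters of randomness (80 bits).
function encodeRandom(): string {
  let out = '';
  for (let i = 0; i < 16; i++) out += B32[Math.floor(Math.random() * 32)];
  return out;
}

function ulid(time: number = Date.now()): string {
  return encodeTime(time) + encodeRandom();
}
```

Because the timestamp leads, `ORDER BY id` is creation order and new rows land on the right-hand edge of the primary-key index, avoiding the page splits random UUIDs cause.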
My Role
As sole architect and maintainer, I:
- Designed the PostgreSQL schema with pgvector integration for optimal performance
- Implemented local embedding pipeline using Sentence Transformers
- Built universal tokenizer supporting 5+ AI providers with accurate token counting
- Created intelligent compression system with multiple strategies
- Established comprehensive testing with Docker integration environments
- Optimized performance achieving sub-5ms operations and 28M ops/sec ID generation
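The compression system mentioned above keeps long histories inside a token budget. The package's four strategies are not detailed here; purely as an illustration of the general mechanism, and not one of the actual strategies, a sliding-window pass might look like:

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
  tokens: number;
}

// Illustrative sliding-window compression: keep the most recent messages
// whose combined token count fits the budget. The shape and strategy are
// assumptions, not the package's documented behaviour.
function compressToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest, stopping when the budget would overflow.
  for (let i = history.length - 1; i >= 0; i--) {
    if (used + history[i].tokens > budget) break;
    kept.unshift(history[i]);
    used += history[i].tokens;
  }
  return kept;
}
```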