Material Kai Vision Platform – Complete Overview
The Future of PDF Intelligence: AI-Powered Document Processing, Search, and Knowledge Management
Executive Summary
NoCodeAPI's PDF2Markdown Intelligence Platform is a production-grade, AI-powered system serving 5,000+ users with 99.5%+ uptime. It transforms material catalog PDFs into searchable, structured knowledge using 12 AI models across a 14-stage processing pipeline, combining document processing, material recognition, multi-vector search, and knowledge management in a unified system.
Key Capabilities
- Intelligent PDF Processing: 14-stage AI pipeline with checkpoint recovery
- Product Discovery: Two-stage AI classification (95%+ accuracy)
- Multi-Vector Search: 6 embedding types for comprehensive material discovery
- Material Recognition: Llama 4 Scout 17B Vision (69.4% MMMU, #1 OCR)
- Knowledge Management: Real-time updates with workspace isolation
- 3D Generation: AI-powered material visualization
- Admin Dashboard: Comprehensive monitoring and analytics
Platform Architecture
Technology Stack
Frontend:
- React 18 + TypeScript + Vite
- Shadcn/ui + TailwindCSS
- Deployed on Vercel Edge Network
- Real-time updates via Supabase subscriptions
Backend:
- MIVAA API: FastAPI + Python 3.11
- 74+ REST API endpoints
- Docker containerized
- Self-hosted on dedicated server
Database:
- Supabase PostgreSQL 15
- pgvector extension for similarity search
- Row-Level Security (RLS) for multi-tenancy
- 30+ Edge Functions (TypeScript/Deno)
AI Services:
- OpenAI (GPT-4o, text-embedding-3-small)
- Anthropic (Claude Sonnet 4.5, Claude Haiku 4.5)
- Together AI (Llama 4 Scout 17B Vision)
- CLIP (Visual embeddings)
- Replicate (Image generation)
System Flow
User uploads PDF → Frontend (React)
↓
Supabase Edge Function (mivaa-gateway)
↓
MIVAA API (FastAPI) → Creates background job
↓
14-Stage Processing Pipeline:
1. PDF Upload & Validation
2. Background Job Creation
3. PDF Analysis (PyMuPDF4LLM)
4. Product Discovery (Claude Haiku → Sonnet)
5. Text Extraction (focused on product pages)
6. Semantic Chunking (Anthropic)
7. Text Embeddings (OpenAI 1536D)
8. Image Extraction
9. Image Analysis (Llama Vision)
10. CLIP Embeddings (512D)
11. Product Creation (Two-stage AI)
12. Metafield Extraction
13. Deferred AI Analysis (async: specialized embeddings, quality validation)
14. Cleanup & Completion
↓
Data stored in Supabase → Available for search
↓
Real-time updates → Frontend displays results
AI Models & Intelligence
12 AI Models Across 7 Pipeline Stages
1. Anthropic Claude Models
Claude Sonnet 4.5 (Premium Tier):
- Use Cases: Deep product analysis, complex metadata extraction, quality validation
- Context: 200,000 tokens
- Performance: Highest accuracy for complex reasoning
- Pipeline Stages: Product Discovery (Stage 4), Deferred AI Analysis (Stage 13)
Claude Haiku 4.5 (Mid Tier):
- Use Cases: Fast content classification, product boundary detection
- Context: 200,000 tokens
- Performance: 3x faster than Sonnet, 90% accuracy
- Pipeline Stages: Product Discovery (Stage 4, first pass), Content Classification (Stage 11)
2. OpenAI Models
GPT-4o:
- Use Cases: Product discovery, conversational AI, complex reasoning
- Context: 128,000 tokens
- Performance: High accuracy, multimodal capabilities
- Pipeline Stages: Product Discovery (alternative to Claude)
text-embedding-3-small:
- Use Cases: Text chunk embeddings, semantic search
- Dimensions: 1536
- Performance: 62.3% MTEB score
- Cost: $0.02 per 1M tokens
- Pipeline Stages: Text Embedding Generation (Stage 7)
3. Together AI – Llama 4 Scout 17B Vision
- Parameters: 17 billion
- Modality: Vision + Text
- Use Cases: Material image analysis, product classification, OCR
- Performance:
- 69.4% MMMU (Massive Multi-discipline Multimodal Understanding)
- #1 ranked for OCR tasks
- 85%+ accuracy on material recognition
- Cost: $0.30 per 1M tokens
- Pipeline Stages: Image Analysis (Stage 9), Material Recognition
4. CLIP (OpenAI)
- Model: Vision Transformer Base 32
- Dimensions: 512
- Use Cases: Visual embeddings, image-text similarity, visual search
- Performance: Industry standard for visual embeddings
- Cost: Free (self-hosted)
- Pipeline Stages: Image Embedding Generation (Stage 10)
5. Replicate Models
Stable Diffusion XL: 3D texture generation, material visualization
FLUX-Schnell: Fast image generation, material previews
Multi-Vector Embeddings (6 Types)
The platform generates 6 types of embeddings for comprehensive search:
- Text Embeddings (1536D) – OpenAI text-embedding-3-small
- Visual CLIP Embeddings (512D) – CLIP ViT-B/32
- Color Embeddings (256D) – Custom color analysis
- Texture Embeddings (256D) – Custom texture analysis
- Application Embeddings (512D) – Use-case classifier
- Multimodal Embeddings (2048D) – Combined text + visual
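The six vectors travel together on a material record; a minimal sketch (field names are illustrative, not the platform's actual schema) that enforces the stated dimensions:

```python
from dataclasses import dataclass, field

EXPECTED_DIMS = {
    "text": 1536,        # OpenAI text-embedding-3-small
    "visual_clip": 512,  # CLIP ViT-B/32
    "color": 256,        # custom color analysis
    "texture": 256,      # custom texture analysis
    "application": 512,  # use-case classifier
    "multimodal": 2048,  # combined text + visual
}

@dataclass
class MaterialEmbeddings:
    """Holds all six embedding vectors for one material record."""
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def set(self, kind: str, vector: list[float]) -> None:
        expected = EXPECTED_DIMS[kind]  # KeyError on an unknown embedding type
        if len(vector) != expected:
            raise ValueError(f"{kind} embedding must be {expected}D, got {len(vector)}D")
        self.vectors[kind] = vector

emb = MaterialEmbeddings()
emb.set("visual_clip", [0.0] * 512)  # accepted
# emb.set("color", [0.0] * 512)      # would raise: color embeddings must be 256D
```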
PDF Processing Pipeline (14 Stages)
Stage-by-Stage Breakdown
Stage 1: PDF Upload & Validation
- File validation (size, type, corruption)
- Upload to Supabase Storage
- Create document record
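The validation gate at this stage can be sketched as a simple size/type/corruption check. The 50MB limit matches the pdf-documents bucket; the magic-byte check is an assumption about how corruption screening might be done, not the platform's actual implementation:

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # pdf-documents bucket limit (50MB)

def validate_pdf(filename: str, data: bytes) -> list[str]:
    """Return a list of validation errors; an empty list means the file is accepted."""
    errors = []
    if not filename.lower().endswith(".pdf"):
        errors.append("type: only .pdf files are accepted")
    if len(data) > MAX_PDF_BYTES:
        errors.append("size: file exceeds 50MB limit")
    if not data.startswith(b"%PDF-"):
        errors.append("corruption: missing %PDF header")
    return errors

print(validate_pdf("catalog.pdf", b"%PDF-1.7 sample"))  # []
print(validate_pdf("catalog.txt", b"hello"))            # type and corruption errors
```

Only files that pass all three checks proceed to upload and document-record creation.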
Stage 2: Background Job Creation
- Create background_jobs record
- Initialize progress tracking
- Return job_id to frontend
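A minimal sketch of the job record handed back to the frontend. Field names here are illustrative, not the actual background_jobs schema:

```python
import uuid
from datetime import datetime, timezone

def create_background_job(document_id: str) -> dict:
    """Create a job record with initial progress; the job_id is returned to the frontend."""
    return {
        "job_id": str(uuid.uuid4()),
        "document_id": document_id,
        "status": "pending",
        "progress": 0,          # percent, updated as each stage completes
        "checkpoint": None,     # last completed checkpoint, used for recovery
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

job = create_background_job("doc-123")
print(job["status"], job["progress"])  # pending 0
```

The frontend then subscribes to updates on this record to drive the real-time progress display.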
Stage 3: PDF Analysis
- Extract PDF metadata (pages, size, structure)
- Analyze document type
- Select processing strategy
- Checkpoint: PDF_EXTRACTED
Stage 4: Product Discovery (AI)
- Claude Haiku 4.5: Fast product identification (5-15 seconds)
- Identify product count and page ranges
- Claude Sonnet 4.5: Validate and enrich metadata (10-30 seconds)
- Extract product names, dimensions, variants, designers
- Output: Product list with page ranges (95%+ accuracy)
Stage 5: Text Extraction (Focused)
- PyMuPDF4LLM: Extract text from product pages only
- Preserve structure and formatting
- Extract metadata (fonts, colors, layout)
Stage 6: Semantic Chunking (AI)
- Anthropic Chunking API: Split text semantically
- Max tokens: 800, Overlap: 100
- Preserve context and meaning
- Create document_chunks records
- Checkpoint: CHUNKS_CREATED
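The production pipeline delegates chunking to the Anthropic Chunking API; as a rough stand-in, the 800-token window with 100-token overlap can be sketched with a word-based approximation:

```python
def chunk_text(words: list[str], max_tokens: int = 800, overlap: int = 100) -> list[list[str]]:
    """Sliding-window chunking: each chunk repeats the last `overlap` tokens of the previous one."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_tokens])
        if start + max_tokens >= len(words):
            break
    return chunks

words = [f"w{i}" for i in range(2000)]
chunks = chunk_text(words)
print(len(chunks))       # 3 chunks: words 0-799, 700-1499, 1400-1999
print(chunks[1][0])      # w700 — chunk 1 starts where chunk 0's overlap begins
```

The real chunker splits on semantic boundaries rather than fixed token counts, so chunk sizes vary; the overlap principle is the same.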
Stage 7: Text Embedding Generation (AI)
- OpenAI text-embedding-3-small: Generate 1536D embeddings
- Store in pgvector for similarity search
- Link embeddings to chunks
- Checkpoint: TEXT_EMBEDDINGS_GENERATED
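Embedding requests are typically batched rather than sent one chunk at a time; a sketch of grouping chunks under a per-request token budget (the budget value is illustrative, not a documented platform setting):

```python
def batch_chunks(chunk_tokens: list[int], budget: int = 8000) -> list[list[int]]:
    """Group chunk indices so each batch's token total stays within the budget."""
    batches, current, total = [], [], 0
    for i, n in enumerate(chunk_tokens):
        if current and total + n > budget:
            batches.append(current)     # flush the full batch
            current, total = [], 0
        current.append(i)
        total += n
    if current:
        batches.append(current)
    return batches

# 25 chunks of ~800 tokens each fit 10 per request under an 8,000-token budget
print([len(b) for b in batch_chunks([800] * 25)])  # [10, 10, 5]
```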
Stage 8: Image Extraction & Upload
- Extract images from product pages
- Upload to Supabase Storage (pdf-tiles bucket)
- Create document_images records
- Extract image metadata (dimensions, format)
- Checkpoint: IMAGES_EXTRACTED
Stage 9: Image Analysis (AI)
- Llama 4 Scout 17B Vision: Analyze each image (1-3 seconds)
- Extract material properties
- Quality scoring (0-100)
- Classify image type (product, detail, mood, diagram)
Stage 10: CLIP Embedding Generation (AI)
- CLIP ViT-B/32: Generate 512D visual embeddings (50-150ms per image)
- Store in database for visual search
- Link to document_images
- Checkpoint: IMAGE_EMBEDDINGS_GENERATED
Stage 11: Product Creation (Two-Stage AI)
- Stage 1: Content classification (product/supporting/administrative)
- Stage 2: Product boundary detection
- Create products records with metadata
- Link chunks and images to products
- Checkpoint: PRODUCTS_CREATED
Stage 12: Metafield Extraction
- Extract dynamic metadata from chunks
- Create metafield_values records
- Link to chunks, products, images
Stage 13: Deferred AI Analysis (Async Background Job)
- Claude Sonnet 4.5: Validate low-scoring images
- Generate specialized embeddings:
- Color embeddings (256D)
- Texture embeddings (256D)
- Application embeddings (512D)
- Multimodal embeddings (2048D)
- Enhanced metadata extraction
Stage 14: Cleanup & Completion
- Delete temporary files from disk
- Kill background processes
- Update job status to 'completed'
- Send completion notification
Processing Performance
| PDF Size | Pages | Products | Time | Accuracy |
|---|---|---|---|---|
| Small | 1-20 | 1-5 | 1-2 min | 95%+ |
| Medium | 21-50 | 6-15 | 2-4 min | 95%+ |
| Large | 51-100 | 16-30 | 4-8 min | 95%+ |
| Extra Large | 100+ | 30+ | 8-15 min | 95%+ |
Benchmark: From the Harmony catalog PDF, the pipeline extracts 14+ distinct products with complete metadata (product names, dimensions, designers, page ranges, variants, image types).
Checkpoint Recovery System
The pipeline includes 9 checkpoints for recovery on failure:
- PDF_EXTRACTED
- CHUNKS_CREATED
- TEXT_EMBEDDINGS_GENERATED
- IMAGES_EXTRACTED
- IMAGE_EMBEDDINGS_GENERATED
- PRODUCTS_CREATED
- METAFIELDS_EXTRACTED
- DEFERRED_ANALYSIS_QUEUED
- COMPLETED
On job restart, the system resumes from the last completed checkpoint, avoiding redundant processing.
Search & Discovery
Multi-Vector Search System
The platform uses 6 embedding types for comprehensive search:
Semantic Search (Text):
- Query: “sustainable wood materials”
- Embedding: OpenAI text-embedding-3-small (1536D)
- Similarity: Cosine similarity via pgvector
- Accuracy: 85%+
Visual Search (Images):
- Query: Upload image or describe visually
- Embedding: CLIP ViT-B/32 (512D)
- Similarity: Visual similarity matching
- Accuracy: 88%+
Hybrid Search (Combined):
- Query: Text + Image + Filters
- Weights: Configurable (e.g., 60% semantic, 40% keyword)
- Ranking: Multi-factor scoring
- Accuracy: 90%+
Specialized Search:
- Color-based: Find materials by color palette
- Texture-based: Find similar textures
- Application-based: Find materials for specific use cases
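Hybrid ranking with configurable weights can be sketched as a weighted blend of a cosine (semantic) score and a keyword-overlap score. The 0.6/0.4 weights match the example above; the keyword scorer here is a deliberate simplification of full-text ranking:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, as used by pgvector for semantic matching."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms present in the document text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query_vec, doc_vec, query, doc_text,
                 w_semantic: float = 0.6, w_keyword: float = 0.4) -> float:
    return w_semantic * cosine(query_vec, doc_vec) + w_keyword * keyword_score(query, doc_text)

score = hybrid_score([1.0, 0.0], [1.0, 0.0],
                     "sustainable wood", "sustainable oak wood panel")
print(round(score, 2))  # 1.0: identical vectors and both query terms present
```

In production the final ranking also folds in filters and multi-factor boosts, but the weighted-blend core is the same shape.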
Search Performance
- Response Time: 200-800ms
- Accuracy: 85%+
- Concurrent Users: 5,000+
- Throughput: 1000+ queries/minute
Database Architecture
Core Tables
workspaces: Multi-tenant workspace management
documents: PDF documents and metadata
document_chunks: Semantic text chunks with 1536D embeddings
document_images: Extracted images with 512D CLIP embeddings
products: Product records from PDFs
background_jobs: Async job tracking with checkpoint recovery
material_metadata_fields: Dynamic metafield definitions
metafield_values: Metafield data for chunks/products/images
Storage Buckets
pdf-documents: Original PDF files (50MB max)
pdf-tiles: Extracted images (10MB max)
material-images: Material photos (10MB max)
3d-models: Generated 3D models (100MB max)
Security
Row-Level Security (RLS): All tables protected
Workspace Isolation: Users only access their workspace data
JWT Authentication: Supabase Auth with automatic token refresh
Encryption: At rest and in transit
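Workspace isolation is enforced in the database by RLS policies; the effect, as seen from application code, can be sketched as every query being implicitly filtered to the caller's workspace (illustrative, not the actual policy definition):

```python
def visible_rows(rows: list[dict], user_workspace_id: str) -> list[dict]:
    """RLS effect: a user only ever sees rows belonging to their own workspace."""
    return [r for r in rows if r["workspace_id"] == user_workspace_id]

rows = [
    {"id": 1, "workspace_id": "ws-a"},
    {"id": 2, "workspace_id": "ws-b"},
    {"id": 3, "workspace_id": "ws-a"},
]
print([r["id"] for r in visible_rows(rows, "ws-a")])  # [1, 3]
```

Because the policy lives in PostgreSQL, the filter applies even if application code forgets to add a workspace clause.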
Frontend Features
User-Facing Features
Dashboard: Metrics, feature grid, quick actions
PDF Processing: Drag-and-drop upload with real-time progress
Materials Catalog: Searchable, filterable product catalog
Search Hub: AI-powered semantic search
Material Recognition: Upload images for material identification
3D Generation: AI-powered material visualization
Mood Boards: Create and share material collections
Shopping Cart: Quote requests and commission tracking
Admin Features
Knowledge Base Management: View/edit chunks, images, products
PDF Processing Monitor: Real-time job tracking
AI Metrics Dashboard: Model usage and cost tracking
Quality Dashboard: Chunk quality and embedding stability
System Performance: Response times, error rates, uptime
User Management: Workspace members and permissions
Async Job Queue Monitor: Background job status and recovery
API Ecosystem
74+ REST API Endpoints
Categories:
- RAG & Document Processing (15 endpoints)
- Search APIs (6 endpoints)
- Embedding APIs (5 endpoints)
- Products API (8 endpoints)
- Images API (6 endpoints)
- AI Services (7 endpoints)
- Background Jobs (5 endpoints)
- Admin & Monitoring (12 endpoints)
- Document Management (13 endpoints)
- Anthropic APIs (3 endpoints)
- Together AI APIs (3 endpoints)
- Health & Monitoring (4 endpoints)
Documentation:
- Swagger UI: https://v1api.materialshub.gr/docs
- ReDoc: https://v1api.materialshub.gr/redoc
- OpenAPI Schema: https://v1api.materialshub.gr/openapi.json
Production Metrics
Performance
- Uptime: 99.5%+
- Users: 5,000+
- Search Response: 200-800ms
- PDF Processing: 1-15 minutes (size-dependent)
- Concurrent Jobs: Unlimited queue
Accuracy
- Product Detection: 95%+
- Search Accuracy: 85%+
- Material Recognition: 90%+
- Image Classification: 88%+
Scalability
- Database: Auto-scaling with connection pooling
- Frontend: Global Edge Network (Vercel)
- API: Docker containerized with horizontal scaling
- Storage: Unlimited (Supabase)
Last Updated: 2025-10-31
Version: 2.0.0
Status: Production
Users: 5,000+
Uptime: 99.5%+