Material Kai Vision Platform – Complete Overview

The Future of PDF Intelligence: AI-Powered Document Processing, Search, and Knowledge Management


Executive Summary

NoCodeAPI's PDF2Markdown Intelligence Platform is a production-grade, AI-powered system serving 5,000+ users at 99.5%+ uptime. It transforms material catalog PDFs into searchable, structured knowledge using 12 AI models across a 14-stage processing pipeline. The platform combines document processing, material recognition, multi-vector search, and knowledge management into a unified system.

Key Capabilities

  • Intelligent PDF Processing: 14-stage AI pipeline with checkpoint recovery
  • Product Discovery: Two-stage AI classification (95%+ accuracy)
  • Multi-Vector Search: 6 embedding types for comprehensive material discovery
  • Material Recognition: Llama 4 Scout 17B Vision (69.4% MMMU, #1 OCR)
  • Knowledge Management: Real-time updates with workspace isolation
  • 3D Generation: AI-powered material visualization
  • Admin Dashboard: Comprehensive monitoring and analytics

Platform Architecture

Technology Stack

Frontend:

  • React 18 + TypeScript + Vite
  • Shadcn/ui + TailwindCSS
  • Deployed on Vercel Edge Network
  • Real-time updates via Supabase subscriptions

Backend:

  • MIVAA API: FastAPI + Python 3.11
  • 74+ REST API endpoints
  • Docker containerized
  • Self-hosted on dedicated server

Database:

  • Supabase PostgreSQL 15
  • pgvector extension for similarity search
  • Row-Level Security (RLS) for multi-tenancy
  • 30+ Edge Functions (TypeScript/Deno)

AI Services:

  • OpenAI (GPT-4o, text-embedding-3-small)
  • Anthropic (Claude Sonnet 4.5, Claude Haiku 4.5)
  • Together AI (Llama 4 Scout 17B Vision)
  • CLIP (Visual embeddings)
  • Replicate (Image generation)

System Flow

User uploads PDF → Frontend (React)
    ↓
Supabase Edge Function (mivaa-gateway)
    ↓
MIVAA API (FastAPI) → Creates background job
    ↓
14-Stage Processing Pipeline:
  1. PDF Analysis (PyMuPDF4LLM)
  2. Product Discovery (Claude Haiku → Sonnet)
  3. Text Extraction (focused on product pages)
  4. Semantic Chunking (Anthropic)
  5. Text Embeddings (OpenAI 1536D)
  6. Image Extraction
  7. Image Analysis (Llama Vision)
  8. CLIP Embeddings (512D)
  9. Product Creation (Two-stage AI)
  10. Metafield Extraction
  11. Deferred AI Analysis (async)
  12. Specialized Embeddings (color, texture, application)
  13. Quality Validation (Claude)
  14. Cleanup & Completion
    ↓
Data stored in Supabase → Available for search
    ↓
Real-time updates → Frontend displays results

AI Models & Intelligence

12 AI Models Across 7 Pipeline Stages

1. Anthropic Claude Models

Claude Sonnet 4.5 (Premium Tier):

  • Use Cases: Deep product analysis, complex metadata extraction, quality validation
  • Context: 200,000 tokens
  • Performance: Highest accuracy for complex reasoning
  • Pipeline Stages: Product Discovery (Stage 2), Deferred AI Analysis (Stage 11)

Claude Haiku 4.5 (Mid Tier):

  • Use Cases: Fast content classification, product boundary detection
  • Context: 200,000 tokens
  • Performance: 3x faster than Sonnet, 90% accuracy
  • Pipeline Stages: Product Discovery (Stage 1), Content Classification

2. OpenAI Models

GPT-4o:

  • Use Cases: Product discovery, conversational AI, complex reasoning
  • Context: 128,000 tokens
  • Performance: High accuracy, multimodal capabilities
  • Pipeline Stages: Product Discovery (alternative to Claude)

text-embedding-3-small:

  • Use Cases: Text chunk embeddings, semantic search
  • Dimensions: 1536
  • Performance: 62.3% MTEB score
  • Cost: $0.02 per 1M tokens
  • Pipeline Stages: Text Embedding Generation (Stage 5)

3. Together AI – Llama 4 Scout 17B Vision

  • Parameters: 17 billion
  • Modality: Vision + Text
  • Use Cases: Material image analysis, product classification, OCR
  • Performance:
    • 69.4% MMMU (Massive Multi-discipline Multimodal Understanding)
    • #1 ranked for OCR tasks
    • 85%+ accuracy on material recognition
  • Cost: $0.30 per 1M tokens
  • Pipeline Stages: Image Analysis (Stage 7), Material Recognition

4. CLIP (OpenAI)

  • Model: ViT-B/32 (Vision Transformer Base, 32×32 patches)
  • Dimensions: 512
  • Use Cases: Visual embeddings, image-text similarity, visual search
  • Performance: Industry standard for visual embeddings
  • Cost: Free (self-hosted)
  • Pipeline Stages: Image Embedding Generation (Stage 8)

5. Replicate Models

Stable Diffusion XL: 3D texture generation, material visualization
FLUX-Schnell: Fast image generation, material previews

Multi-Vector Embeddings (6 Types)

The platform generates 6 types of embeddings for comprehensive search:

  1. Text Embeddings (1536D) – OpenAI text-embedding-3-small
  2. Visual CLIP Embeddings (512D) – CLIP ViT-B/32
  3. Color Embeddings (256D) – Custom color analysis
  4. Texture Embeddings (256D) – Custom texture analysis
  5. Application Embeddings (512D) – Use-case classifier
  6. Multimodal Embeddings (2048D) – Combined text + visual
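
The 2048D multimodal vector is dimensionally consistent with concatenating the 1536D text embedding and the 512D CLIP embedding (1536 + 512 = 2048). A minimal sketch under that assumption, with L2 normalization of each part so neither modality dominates the distance metric (the platform's actual fusion method is not specified here):

```python
import numpy as np

def fuse_multimodal(text_vec: np.ndarray, clip_vec: np.ndarray) -> np.ndarray:
    """Concatenate a 1536D text embedding with a 512D CLIP embedding
    into a single 2048D multimodal vector. Each part is L2-normalized
    before concatenation so both modalities contribute equally."""
    assert text_vec.shape == (1536,) and clip_vec.shape == (512,)
    text_vec = text_vec / np.linalg.norm(text_vec)
    clip_vec = clip_vec / np.linalg.norm(clip_vec)
    return np.concatenate([text_vec, clip_vec])

fused = fuse_multimodal(np.random.rand(1536), np.random.rand(512))
print(fused.shape)  # (2048,)
```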

PDF Processing Pipeline (14 Stages)

Stage-by-Stage Breakdown

Stage 1: PDF Upload & Validation

  • File validation (size, type, corruption)
  • Upload to Supabase Storage
  • Create document record
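
A minimal sketch of the Stage 1 size/type check, using the 50MB PDF limit listed under Storage Buckets below; the exact validation rules (and the corruption check, omitted here) are assumptions:

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # 50MB limit on the pdf-documents bucket

def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Reject files that are not PDFs or exceed the bucket size limit."""
    if not filename.lower().endswith(".pdf"):
        return False, "only PDF files are accepted"
    if size_bytes > MAX_PDF_BYTES:
        return False, f"file exceeds {MAX_PDF_BYTES // (1024 * 1024)}MB limit"
    return True, "ok"

print(validate_upload("catalog.pdf", 10_000_000))  # (True, 'ok')
print(validate_upload("catalog.zip", 10_000_000))  # rejected: wrong type
```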

Stage 2: Background Job Creation

  • Create background_jobs record
  • Initialize progress tracking
  • Return job_id to frontend

Stage 3: PDF Analysis

  • Extract PDF metadata (pages, size, structure)
  • Analyze document type
  • Select processing strategy
  • Checkpoint: PDF_EXTRACTED

Stage 4: Product Discovery (AI)

  • Claude Haiku 4.5: Fast product identification (5-15 seconds)
  • Identify product count and page ranges
  • Claude Sonnet 4.5: Validate and enrich metadata (10-30 seconds)
  • Extract product names, dimensions, variants, designers
  • Output: Product list with page ranges (95%+ accuracy)
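
The Stage 4 output can be pictured as a list of product records, each bound to a page range. A hedged sketch of that shape (the field names are illustrative, not the actual schema), with a simple sanity check that page ranges are well-formed and non-overlapping:

```python
from typing import TypedDict

class DiscoveredProduct(TypedDict):
    name: str
    page_start: int
    page_end: int
    variants: list[str]

def page_ranges_valid(products: list[DiscoveredProduct]) -> bool:
    """True if every range has start <= end and no two ranges overlap."""
    ranges = sorted((p["page_start"], p["page_end"]) for p in products)
    if any(start > end for start, end in ranges):
        return False
    return all(e1 < s2 for (_, e1), (s2, _) in zip(ranges, ranges[1:]))

products: list[DiscoveredProduct] = [
    {"name": "Harmony Oak", "page_start": 3, "page_end": 6, "variants": ["natural", "smoked"]},
    {"name": "Harmony Ash", "page_start": 7, "page_end": 9, "variants": ["white"]},
]
print(page_ranges_valid(products))  # True
```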

Stage 5: Text Extraction (Focused)

  • PyMuPDF4LLM: Extract text from product pages only
  • Preserve structure and formatting
  • Extract metadata (fonts, colors, layout)

Stage 6: Semantic Chunking (AI)

  • Anthropic Chunking API: Split text semantically
  • Max tokens: 800, Overlap: 100
  • Preserve context and meaning
  • Create document_chunks records
  • Checkpoint: CHUNKS_CREATED
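
In production the semantic splitting is done by the Anthropic Chunking API; the naive sliding-window sketch below only illustrates how the max-800 / overlap-100 parameters interact, using whitespace tokens as a crude stand-in for real tokenization:

```python
def sliding_window_chunks(text: str, max_tokens: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of at most max_tokens tokens, where
    consecutive windows share `overlap` tokens of context."""
    tokens = text.split()  # crude proxy for a real tokenizer
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last window already reached the end of the text
    return chunks

doc = " ".join(f"tok{i}" for i in range(2000))
chunks = sliding_window_chunks(doc)
print(len(chunks))  # 3 windows: tokens 0-799, 700-1499, 1400-1999
```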

Stage 7: Text Embedding Generation (AI)

  • OpenAI text-embedding-3-small: Generate 1536D embeddings
  • Store in pgvector for similarity search
  • Link embeddings to chunks
  • Checkpoint: TEXT_EMBEDDINGS_GENERATED

Stage 8: Image Extraction & Upload

  • Extract images from product pages
  • Upload to Supabase Storage (pdf-tiles bucket)
  • Create document_images records
  • Extract image metadata (dimensions, format)
  • Checkpoint: IMAGES_EXTRACTED

Stage 9: Image Analysis (AI)

  • Llama 4 Scout 17B Vision: Analyze each image (1-3 seconds)
  • Extract material properties
  • Quality scoring (0-100)
  • Classify image type (product, detail, mood, diagram)

Stage 10: CLIP Embedding Generation (AI)

  • CLIP ViT-B/32: Generate 512D visual embeddings (50-150ms per image)
  • Store in database for visual search
  • Link to document_images
  • Checkpoint: IMAGE_EMBEDDINGS_GENERATED

Stage 11: Product Creation (Two-Stage AI)

  • Stage 1: Content classification (product/supporting/administrative)
  • Stage 2: Product boundary detection
  • Create products records with metadata
  • Link chunks and images to products
  • Checkpoint: PRODUCTS_CREATED

Stage 12: Metafield Extraction

  • Extract dynamic metadata from chunks
  • Create metafield_values records
  • Link to chunks, products, images

Stage 13: Deferred AI Analysis (Async Background Job)

  • Claude Sonnet 4.5: Validate low-scoring images
  • Generate specialized embeddings:
    • Color embeddings (256D)
    • Texture embeddings (256D)
    • Application embeddings (512D)
    • Multimodal embeddings (2048D)
  • Enhanced metadata extraction

Stage 14: Cleanup & Completion

  • Delete temporary files from disk
  • Kill background processes
  • Update job status to ‘completed’
  • Send completion notification

Processing Performance

| PDF Size    | Pages  | Products | Time     | Accuracy |
|-------------|--------|----------|----------|----------|
| Small       | 1-20   | 1-5      | 1-2 min  | 95%+     |
| Medium      | 21-50  | 6-15     | 2-4 min  | 95%+     |
| Large       | 51-100 | 16-30    | 4-8 min  | 95%+     |
| Extra Large | 100+   | 30+      | 8-15 min | 95%+     |

Benchmark: Harmony PDF extracts 14+ distinct products with complete metadata (product names, dimensions, designers, page ranges, variants, image types).

Checkpoint Recovery System

The pipeline includes 9 checkpoints for recovery on failure:

  1. PDF_EXTRACTED
  2. CHUNKS_CREATED
  3. TEXT_EMBEDDINGS_GENERATED
  4. IMAGES_EXTRACTED
  5. IMAGE_EMBEDDINGS_GENERATED
  6. PRODUCTS_CREATED
  7. METAFIELDS_EXTRACTED
  8. DEFERRED_ANALYSIS_QUEUED
  9. COMPLETED

On job restart, the system resumes from the last completed checkpoint, avoiding redundant processing.


Search & Discovery

Multi-Vector Search System

The platform uses 6 embedding types for comprehensive search:

Semantic Search (Text):

  • Query: “sustainable wood materials”
  • Embedding: OpenAI text-embedding-3-small (1536D)
  • Similarity: Cosine similarity via pgvector
  • Accuracy: 85%+

Visual Search (Images):

  • Query: Upload image or describe visually
  • Embedding: CLIP ViT-B/32 (512D)
  • Similarity: Visual similarity matching
  • Accuracy: 88%+

Hybrid Search (Combined):

  • Query: Text + Image + Filters
  • Weights: Configurable (e.g., 60% semantic, 40% keyword)
  • Ranking: Multi-factor scoring
  • Accuracy: 90%+
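
The configurable weighting can be sketched as a convex combination of per-strategy scores; the 60/40 split mirrors the example above, while the real ranking is described as multi-factor:

```python
def hybrid_score(semantic: float, keyword: float,
                 w_semantic: float = 0.6, w_keyword: float = 0.4) -> float:
    """Blend normalized per-strategy scores (each in [0, 1]) into one
    ranking score using configurable weights that sum to 1."""
    assert abs(w_semantic + w_keyword - 1.0) < 1e-9
    return w_semantic * semantic + w_keyword * keyword

# (semantic score, keyword score) per candidate material
results = {"mat-a": (0.9, 0.2), "mat-b": (0.6, 0.8)}
ranked = sorted(results, key=lambda k: hybrid_score(*results[k]), reverse=True)
print(ranked)  # mat-a scores 0.62, mat-b scores 0.68 -> ['mat-b', 'mat-a']
```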

Specialized Search:

  • Color-based: Find materials by color palette
  • Texture-based: Find similar textures
  • Application-based: Find materials for specific use cases

Search Performance

  • Response Time: 200-800ms
  • Accuracy: 85%+
  • Concurrent Users: 5,000+
  • Throughput: 1000+ queries/minute

Database Architecture

Core Tables

workspaces: Multi-tenant workspace management
documents: PDF documents and metadata
document_chunks: Semantic text chunks with 1536D embeddings
document_images: Extracted images with 512D CLIP embeddings
products: Product records from PDFs
background_jobs: Async job tracking with checkpoint recovery
material_metadata_fields: Dynamic metafield definitions
metafield_values: Metafield data for chunks/products/images

Storage Buckets

pdf-documents: Original PDF files (50MB max)
pdf-tiles: Extracted images (10MB max)
material-images: Material photos (10MB max)
3d-models: Generated 3D models (100MB max)

Security

Row-Level Security (RLS): All tables protected
Workspace Isolation: Users only access their workspace data
JWT Authentication: Supabase Auth with automatic token refresh
Encryption: At rest and in transit


Frontend Features

User-Facing Features

Dashboard: Metrics, feature grid, quick actions
PDF Processing: Drag-and-drop upload with real-time progress
Materials Catalog: Searchable, filterable product catalog
Search Hub: AI-powered semantic search
Material Recognition: Upload images for material identification
3D Generation: AI-powered material visualization
Mood Boards: Create and share material collections
Shopping Cart: Quote requests and commission tracking

Admin Features

Knowledge Base Management: View/edit chunks, images, products
PDF Processing Monitor: Real-time job tracking
AI Metrics Dashboard: Model usage and cost tracking
Quality Dashboard: Chunk quality and embedding stability
System Performance: Response times, error rates, uptime
User Management: Workspace members and permissions
Async Job Queue Monitor: Background job status and recovery


API Ecosystem

74+ REST API Endpoints

Categories:

  1. RAG & Document Processing (15 endpoints)
  2. Search APIs (6 endpoints)
  3. Embedding APIs (5 endpoints)
  4. Products API (8 endpoints)
  5. Images API (6 endpoints)
  6. AI Services (7 endpoints)
  7. Background Jobs (5 endpoints)
  8. Admin & Monitoring (12 endpoints)
  9. Document Management (13 endpoints)
  10. Anthropic APIs (3 endpoints)
  11. Together AI APIs (3 endpoints)
  12. Health & Monitoring (4 endpoints)

Documentation:

  • Swagger UI: https://v1api.materialshub.gr/docs
  • ReDoc: https://v1api.materialshub.gr/redoc
  • OpenAPI Schema: https://v1api.materialshub.gr/openapi.json

Production Metrics

Performance

  • Uptime: 99.5%+
  • Users: 5,000+
  • Search Response: 200-800ms
  • PDF Processing: 1-15 minutes (size-dependent)
  • Concurrent Jobs: Unlimited queue

Accuracy

  • Product Detection: 95%+
  • Search Accuracy: 85%+
  • Material Recognition: 90%+
  • Image Classification: 88%+

Scalability

  • Database: Auto-scaling with connection pooling
  • Frontend: Global Edge Network (Vercel)
  • API: Docker containerized with horizontal scaling
  • Storage: Unlimited (Supabase)

Last Updated: 2025-10-31
Version: 2.0.0
Status: Production
Users: 5,000+
Uptime: 99.5%+