Building an Intelligent Document Chatbot with Kernel Memory

In today’s data-driven world, organizations are increasingly looking for ways to leverage their vast repositories of documents to enhance productivity and decision-making. Microsoft’s Kernel Memory (KM) is a powerful tool that enables developers to build sophisticated AI-driven document chat experiences. In this blog post, we’ll explore what Kernel Memory is, how it works, and how you can use it to build a Document AI Chat experience.

What is Kernel Memory?

Kernel Memory is a multi-modal AI service designed for efficient indexing of datasets through custom continuous data hybrid pipelines. It supports Retrieval Augmented Generation (RAG), synthetic memory, prompt engineering, and custom semantic memory processing. KM is available as a web service, Docker container, plugin for ChatGPT/Copilot/Semantic Kernel, and as a .NET library for embedded applications.

Key Features of Kernel Memory

  • Retrieval Augmented Generation (RAG): Enhances the ability to retrieve relevant information from indexed data and generate responses based on that information.
  • Security Filters: Allows filtering memory by users and groups, ensuring controlled and secure data access (see the sketch after this list).
  • Long Running Ingestion: Supports long-running ingestion processes for large documents, complete with retry logic and durable queues for reliable data processing.
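
As a concrete illustration, here is a minimal sketch of tagging and filtering with the .NET library. It assumes a configured IKernelMemory instance named memory (construction is shown later in this post); the tag, file, and question are illustrative:

    // Tag the document with its owner at import time.
    await memory.ImportDocumentAsync(
        new Document("hr-policy-001")
            .AddFile("hr-policy.pdf")
            .AddTag("user", "ejaz"));

    // Restrict the query to documents carrying that tag.
    var answer = await memory.AskAsync(
        "How many days of annual leave do employees get?",
        filter: MemoryFilters.ByTag("user", "ejaz"));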

How Kernel Memory Works

Kernel Memory utilizes advanced embeddings and Large Language Models (LLMs) to enable natural language querying. It extracts text from various document formats, partitions the text into smaller chunks, and generates embeddings using an LLM. These embeddings are then stored in a vector index, such as Azure AI Search, allowing for efficient search and retrieval.
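
Once content is indexed, retrieval can also be exercised directly, without generating an answer. A small sketch, again assuming a configured memory instance and an illustrative query:

    // Semantic search over the vector index: returns matching text
    // partitions with relevance scores instead of a generated answer.
    var results = await memory.SearchAsync("termination clause", limit: 3);

    foreach (var citation in results.Results)
    {
        foreach (var partition in citation.Partitions)
        {
            Console.WriteLine($"{citation.SourceName} ({partition.Relevance:F2}): {partition.Text}");
        }
    }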

Data Ingestion Pipeline

  1. Extract Text: Recognize the file format and extract information.
  2. Partition Text: Divide the text into small chunks for optimized search.
  3. Generate Embeddings: Use an LLM to create embeddings from the text.
  4. Store Embeddings: Save the embeddings in a vector database for future retrieval, as sketched below.
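
In the .NET library these stages surface as named pipeline steps that can be listed explicitly at import time. A sketch using KM's default step names and an illustrative file name (and, as above, an assumed memory instance):

    // Run the default ingestion pipeline explicitly, stage by stage:
    // extract text, partition it, generate embeddings, save the records.
    await memory.ImportDocumentAsync(
        "annual-report.pdf",
        steps: new[] { "extract", "partition", "gen_embeddings", "save_records" });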

Synchronous Memory API (Serverless Mode)

Kernel Memory performs best when running as an asynchronous web service, which allows thousands of documents to be ingested without blocking the application. However, it can also operate in serverless mode by embedding an instance of the MemoryServerless class in a .NET backend, console, or desktop application and calling it synchronously. This approach works well in ASP.NET Web APIs and Azure Functions, where each request is processed immediately, though clients must handle transient errors.
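
Here is a minimal serverless sketch. It assumes the Microsoft.KernelMemory NuGet package and an OpenAI API key in an environment variable; the file name and question are illustrative:

    using Microsoft.KernelMemory;

    var memory = new KernelMemoryBuilder()
        .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
        .Build<MemoryServerless>();

    // In serverless mode the call blocks until the document is fully indexed.
    await memory.ImportDocumentAsync("contract.pdf", documentId: "contract-001");

    var answer = await memory.AskAsync("What does the termination clause say?");
    Console.WriteLine(answer.Result);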

Memory as a Service - Asynchronous API

Depending on your needs, you might run the code locally within your process or remotely through an asynchronous, scalable service. For small file imports that involve only C# and can afford to block the process during import, local in-process execution using the MemoryServerless class is sufficient.

However, if you need to:

  • Import data and send queries via a web service.
  • Use languages like TypeScript, Java, Rust, or others.
  • Import large documents that take minutes to process without blocking the user interface.
  • Run memory import independently, supporting failures and retries.
  • Define custom pipelines using multiple languages like Python, TypeScript, etc.

Then deploying Kernel Memory as a backend service is ideal. This setup allows you to use default handlers or custom handlers in various languages and leverage the asynchronous, non-blocking memory encoding process. You can send documents and ask questions using the MemoryWebClient.
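
For example, a sketch of the remote flow with MemoryWebClient, assuming a KM service listening on its default local endpoint (the file name, document id, and question are illustrative):

    using Microsoft.KernelMemory;

    // Point the client at a running Kernel Memory service.
    var memory = new MemoryWebClient("http://127.0.0.1:9001/");

    // Returns quickly; the service ingests the document asynchronously.
    await memory.ImportDocumentAsync("employee-handbook.pdf", documentId: "handbook-001");

    // Poll until the ingestion pipeline has finished.
    while (!await memory.IsDocumentReadyAsync("handbook-001"))
    {
        await Task.Delay(TimeSpan.FromSeconds(2));
    }

    var answer = await memory.AskAsync("How is overtime compensated?");
    Console.WriteLine(answer.Result);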

For detailed instructions on running the Kernel Memory service, refer to the official documentation.

Kernel Memory (KM) and Semantic Memory (SM)

Kernel Memory (KM) is a service built from the feedback and lessons learned from developing Semantic Kernel (SK) and Semantic Memory (SM). It offers features like file storage, text extraction, and a framework for securing user data, all in a .NET codebase. KM can be used from any language, tool, or platform, including browser extensions and ChatGPT assistants.

Semantic Memory (SM) is a library for C#, Python, and Java that supports vector search and wraps direct database calls. Developed as part of the Semantic Kernel (SK) project, it was the first public iteration of SK's long-term memory. The core library is maintained in three languages, with varying support for storage engines (connectors).

Here’s a comparison table:

Feature | Kernel Memory | Semantic Memory
Data formats | Web pages, PDF, Images, Word, PowerPoint, Excel, Markdown, Text, JSON, HTML | Text only
Search | Cosine similarity, hybrid search with filters (AND/OR conditions) | Cosine similarity
Language support | Any language, command line tools, browser extensions, low-code/no-code apps, chatbots, assistants, etc. | C#, Python, Java
Storage engines | Azure AI Search, Elasticsearch, MongoDB Atlas, Postgres+pgvector, Qdrant, Redis, SQL Server, in-memory KNN, on-disk KNN | Azure AI Search, Chroma, DuckDB, Kusto, Milvus, MongoDB, Pinecone, Postgres, Qdrant, Redis, SQLite, Weaviate
File storage | Disk, Azure Blobs, AWS S3, MongoDB Atlas, in memory (volatile) | -
RAG | Yes, with sources lookup | -
Summarization | Yes | -
OCR | Yes, via Azure Document Intelligence | -
Security filters | Yes | -
Large document ingestion | Yes, including async processing using queues (Azure Queues, RabbitMQ, file-based, or in-memory queues) | -
Document storage | Yes | -
Custom storage schema | Some DBs | -
Vector DBs with internal embedding | Yes | -
Concurrent write to multiple vector DBs | Yes | -
LLMs | Azure OpenAI, OpenAI, Anthropic, Ollama, LLamaSharp, LM Studio, Semantic Kernel connectors | Azure OpenAI, OpenAI, Gemini, Hugging Face, ONNX, custom ones, etc.
LLMs with dedicated tokenization | Yes | No
Cloud deployment | Yes | -
Web service with OpenAPI | Yes | -

Building a Document AI Chat Experience - Docu Genie

To build a Document AI Chat experience, we’ll use an Azure Function as the backend in combination with a SharePoint Framework (SPFx) web part. The SPFx web part will serve as the user interface, allowing users to upload documents and ask questions.

The Azure Function will use Kernel Memory to upload documents to Blob Storage, extract and transform their content, manage semantic memory, implement Retrieval-Augmented Generation (RAG), and return answers with citations.

Architecture Overview

  1. SPFx Web Part: The user interface where users can upload documents and ask questions.
  2. Azure Function App: Manages document ingestion, indexing, and querying through Kernel Memory and returns citation-backed answers; see the sketch below.
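
A hedged sketch of what a query endpoint in the Function App might look like, using the isolated worker model. The function name, route, and response shape are my own illustrations, and an IKernelMemory instance is assumed to be registered for dependency injection:

    using System.Linq;
    using System.Net;
    using System.Threading.Tasks;
    using Microsoft.Azure.Functions.Worker;
    using Microsoft.Azure.Functions.Worker.Http;
    using Microsoft.KernelMemory;

    public class AskFunction
    {
        private readonly IKernelMemory _memory;

        public AskFunction(IKernelMemory memory) => _memory = memory;

        [Function("ask")]
        public async Task<HttpResponseData> Run(
            [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req)
        {
            var question = await req.ReadAsStringAsync() ?? string.Empty;

            // Retrieve relevant chunks and generate a grounded answer.
            var answer = await _memory.AskAsync(question);

            var response = req.CreateResponse(HttpStatusCode.OK);
            await response.WriteAsJsonAsync(new
            {
                answer = answer.Result,
                citations = answer.RelevantSources.Select(c => c.SourceName)
            });
            return response;
        }
    }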

Demo

(Demo: the Docu Genie web part answering questions over uploaded documents.)

Kernel Memory Service

The Kernel Memory Service initializes Kernel Memory and orchestrates document ingestion, indexing, and querying: it imports documents, extracts their content, and answers user questions with accurate, citation-backed responses.
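
A hedged sketch of how such a service might wire Kernel Memory to Azure OpenAI, Azure Blob Storage, and Azure AI Search. The builder extension methods below come from the corresponding KM connector packages and may differ slightly between versions; all endpoints, keys, and deployment names are placeholders:

    using Microsoft.KernelMemory;

    // Placeholder configuration objects; fill in endpoints, keys, deployments.
    AzureOpenAIConfig chatConfig = new() { /* chat model deployment */ };
    AzureOpenAIConfig embeddingConfig = new() { /* embedding model deployment */ };
    AzureBlobsConfig blobConfig = new() { /* storage account and container */ };

    var memory = new KernelMemoryBuilder()
        // LLM used to generate answers, plus a model for embeddings.
        .WithAzureOpenAITextGeneration(chatConfig)
        .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
        // Uploaded files go to Blob Storage; embeddings to Azure AI Search.
        .WithAzureBlobsDocumentStorage(blobConfig)
        .WithAzureAISearchMemoryDb("https://<search-service>.search.windows.net", "<search-api-key>")
        .Build<MemoryServerless>();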

Conclusion

Microsoft Kernel Memory is a powerful tool for building AI-driven document chat experiences. By leveraging its advanced features like RAG, security filters, and long-running ingestion, developers can create robust applications that allow users to interact with their documents in natural language. Whether you’re building a simple Q&A system or a complex chat experience, Kernel Memory provides the tools you need to get started.

Source Code

Note

You can find the complete source code in the react-document-ai-chat repository.

References

Kernel Memory Documentation

Azure AI Search

SharePoint Framework (SPFx)

Author: Ejaz Hussain
Link: https://office365clinic.com/2025/03/09/building-document-chat-using-kernel-memory/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stated otherwise.