Episode 15
RAG-Based Agentic Memory in AI (Chapter 17)
Unlock how RAG-based agentic memory is transforming AI from forgetful chatbots into intelligent assistants that remember and adapt. In this episode, we break down the core concepts from Chapter 17 of Keith Bourne’s “Unlocking Data with Generative AI and RAG,” exploring why memory-enabled AI is a game changer for customer experience and operational efficiency.
In this episode, you’ll learn:
- What agentic memory means in AI and why it matters for leadership strategy
- The difference between episodic and semantic memory and how they combine
- Key tools like CoALA, LangChain, and ChromaDB that enable memory-enabled AI
- Real-world applications driving business value across industries
- The trade-offs and governance challenges leaders must consider
- Actionable tips for adopting RAG-based memory systems today
Key tools and technologies: CoALA, LangChain, ChromaDB, GPT-4, vector embeddings
Timestamps:
00:00 – Introduction and overview
02:30 – The AI memory revolution: episodic and semantic memory explained
07:15 – Why now: Technology advances driving adoption
10:00 – Comparing memory approaches: stateless vs episodic vs combined
13:30 – Under the hood: architecture and workflow orchestration
16:00 – Real-world impact and business benefits
18:00 – Risks, challenges, and governance
19:30 – Practical leadership takeaways and closing
Resources:
- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition
- Memriq.ai – Tools and resources for AI practitioners and leaders
Thanks for listening to Memriq Inference Digest - Leadership Edition.
Transcript
MEMRIQ INFERENCE DIGEST - LEADERSHIP EDITION
Episode: RAG-Based Agentic Memory in AI: Chapter 17 Deep Dive for Leaders
MORGAN:Hello and welcome to Memriq Inference Digest - Leadership Edition, your go-to podcast for strategic insights on AI’s evolving landscape. I’m Morgan, and we’re brought to you by Memriq AI—a content studio building tools and resources for AI practitioners. Check them out at Memriq.ai.
CASEY:Today, we’re diving into something that’s really reshaping how AI interacts with users—RAG-based agentic memory in AI agents. We’ll be exploring the concepts from Chapter 17 of ‘Unlocking Data with Generative AI and RAG’ by Keith Bourne.
MORGAN:That’s right. And if you’re craving more than what we cover here—detailed diagrams, thorough explanations, and hands-on code labs—you can find the 2nd edition of Keith’s book on Amazon. It’s a treasure trove for those wanting to get their hands dirty.
CASEY:We’re also excited to have Keith himself joining us throughout the episode. Keith will share insider insights, behind-the-scenes thinking, and real-world experience that you won’t find in just any summary.
MORGAN:So, what’s coming up? We’ll start with a surprising fact about AI remembering past interactions, unpack the core ideas behind agentic memory, compare tools like CoALA, LangChain, and ChromaDB, and then get into the what, why, and how of this game-changing technology. Plus, a lively tech debate and practical tips for leaders wanting to adopt this now.
CASEY:Let’s get started.
JORDAN:Here’s a fact that blew me away: AI agents are no longer just chatbots that answer questions—they can now remember entire past conversations and factual knowledge. This isn’t just storing data; it’s like giving AI a memory, enabling it to act more like a human assistant that learns and adapts over time.
MORGAN:Wait—so these AI agents remember what you told them last week, or even last month? That’s a huge leap from the usual “stateless” bots that forget everything after one chat.
CASEY:Absolutely, and it’s not just about remembering conversations. They can also recall facts extracted from those interactions—things like your preferences or past issues—making responses far more personalized and accurate.
JORDAN:Exactly. Combining episodic memory—which is like remembering events—and semantic memory—remembering facts—means the AI improves with every interaction, transforming customer experiences and boosting operational efficiency.
MORGAN:That could be a game-changer for customer service, sales, even internal support. Personalized, context-aware AI that learns continuously… That’s a big competitive advantage.
CASEY:But it also raises questions around data management and complexity—something we’ll unpack later.
CASEY:If you remember only one thing today, it’s this: RAG-based agentic memory equips AI agents with the ability to recall both past conversations and factual knowledge, enabling smarter, more context-aware responses.
MORGAN:We’re talking about three main tools here—CoALA for orchestrating AI agents, LangChain for managing language workflows, and ChromaDB as a vector database storing memories for fast retrieval.
CASEY:The takeaway? Memory-enabled AI moves beyond reactive chats to proactive, personalized interactions that can really drive business value.
JORDAN:The big shift is driven by rising customer expectations. Today’s users want seamless, personalized experiences across every interaction. If your AI forgets previous chats, users get frustrated repeating themselves.
MORGAN:Before, AI assistants were often “stateless,” meaning they treated each interaction in isolation with no memory of past conversations—a bit like talking to a new person every time. That doesn’t cut it anymore.
JORDAN:Advances in vector databases like ChromaDB and language models such as OpenAI’s GPT-4 have changed the game. Vector databases store what we call “embeddings”—think of these as digital fingerprints of information—that let AI find relevant memories quickly, even in huge amounts of data.
CASEY:So instead of losing context, AI agents now have persistent memory—storing and recalling past information over time. This isn’t just a nice-to-have anymore; it’s essential to meet customer needs and stay competitive.
JORDAN:Exactly. Industries from retail to education are adopting this approach because it unlocks new levels of personalization and efficiency. As the book points out, this is a tipping point for AI adoption.
MORGAN:So it’s a mix of technology catching up with expectations—and businesses realizing the cost of ignoring memory in AI is too high.
TAYLOR:Let’s break down this core concept. Agentic memory in AI means the agent remembers and uses past interactions and facts to inform future responses. There are two memory types at play: episodic and semantic.
MORGAN:Episodic memory is like a diary—the agent recalls entire past conversations and events. Semantic memory, on the other hand, is more like a factbook—it stores structured knowledge extracted from conversations, such as “customer prefers premium shipping” or “issue resolved on last call.”
TAYLOR:The real magic happens with Retrieval-Augmented Generation—RAG—where the AI retrieves these memories and combines them with generative capabilities, like GPT-4, to produce informed, context-aware answers.
MORGAN:Keith, as the author, what made this concept so important to cover early in your book?
KEITH:Great question, Morgan. I saw early on that AI’s future wasn’t just about generating text, but about grounding that generation in real, relevant knowledge and experience. RAG with agentic memory makes AI more like a colleague who remembers what you talked about last time and understands the facts behind your business. That’s a leap from generic chatbots to truly intelligent agents.
TAYLOR:It fundamentally shifts AI from stateless responders to evolving assistants that learn from interactions. The book highlights how this mimics human cognition—combining event recall and fact knowledge to adapt over time.
MORGAN:And that’s why frameworks like CoALA are designed—to orchestrate these modular memory components seamlessly.
KEITH:Exactly. Without that orchestration, managing different memory types and integrating them into workflows becomes chaotic. The book dives deeper into how architecture decisions impact scalability and performance.
TAYLOR:Now, let’s compare approaches. We have stateless agents, episodic memory agents, semantic memory agents, and combined agents leveraging both. Casey, what’s your take here?
CASEY:Stateless agents are simple and easy to deploy but frustrating because they forget everything after each interaction. Users repeat info, and the experience feels robotic and disconnected.
TAYLOR:Right. Episodic memory agents improve continuity by recalling entire past conversations, so the AI remembers the “story” of the user interaction. This helps with customer satisfaction but doesn’t always ensure factual accuracy or personalization.
CASEY:Then semantic memory agents focus on extracting and storing hard facts—think key data points and preferences. They personalize responses and ground answers in real knowledge but can lack conversational nuance and context.
TAYLOR:The sweet spot, the book argues, is combining both episodic and semantic memory. Use episodic for temporal context and semantic for factual grounding. But this comes with complexity and cost.
CASEY:So decision criteria could look like this: Use stateless if your interactions are simple, low-stakes, or high volume where cost is critical. Episodic memory fits customer support where conversation flow matters. Semantic memory suits knowledge-heavy domains needing factual accuracy. Combine both for premium experiences where personalization and context drive value.
TAYLOR:Exactly, and frameworks like CoALA and LangChain help manage that complexity, making combined memory systems feasible.
MORGAN:That’s a useful breakdown. Casey, any concerns about combining memories?
CASEY:Yes—the risk of conflicting or outdated memories influencing responses is real. Managing data quality and privacy gets trickier as memory grows.
TAYLOR:The book is upfront about those trade-offs and suggests governance frameworks.
ALEX:Okay, let’s get into how this actually works—without getting too technical, promise! At the core, these systems use vector embeddings, which are like digital fingerprints representing pieces of text or data. By storing these in vector databases such as ChromaDB, the AI can quickly find the most relevant memories based on similarity—kind of like how Spotify recommends songs based on what you’ve listened to.
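To make Alex's "digital fingerprints" point concrete, here's a toy sketch in Python. The tiny three-dimensional vectors stand in for real model embeddings, and the brute-force cosine ranking stands in for what a vector database like ChromaDB does at scale; the memory texts and vector values are purely illustrative.

```python
import math

# Toy 3-dimensional "embeddings" standing in for real model vectors.
# In production these would come from an embedding model and live in
# a vector database such as ChromaDB.
memories = {
    "billing issue resolved last week": [0.9, 0.1, 0.0],
    "customer prefers express shipping": [0.1, 0.9, 0.1],
    "asked about holiday return policy": [0.2, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: how "close" two embeddings point.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_relevant(query_vec, store):
    # Rank stored memories by similarity to the query embedding.
    return max(store, key=lambda text: cosine(query_vec, store[text]))

# A query embedded near the "shipping" direction retrieves that memory.
print(most_relevant([0.0, 1.0, 0.2], memories))
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, which is what keeps retrieval fast over millions of memories.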
MORGAN:So embeddings are how the AI measures “closeness” or relevance of past data?
ALEX:Exactly. Now, the CoALA framework orchestrates the AI agent, modularizing episodic and semantic memory components. LangChain manages the language model workflows—essentially deciding when to query memories, when to generate responses, and how to update memories. LangGraph oversees the overall state and workflow orchestration.
MORGAN:Keith, your book has extensive code labs on this. What’s the one thing you want readers to really internalize about this architecture?
KEITH:It’s the idea that memory isn’t just stored data; it’s an active participant in the AI’s reasoning process. The flow—retrieval, conditioning prompts, response generation, and memory update—is continuous and tightly integrated. The code labs make this concrete by walking readers through building these loops step by step, which is critical to understanding how to scale and maintain the system.
ALEX:That integration lets the agent both recall past conversations with timestamps—episodic memory—and extract structured facts in the form of subject-predicate-object triples for semantic memory. For example, “Customer - prefers - express shipping.” These facts get embedded and stored for fast retrieval.
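A minimal sketch of a semantic store built around those subject-predicate-object triples, using only the Python standard library. The `FactStore` class and its methods are illustrative inventions, not APIs from CoALA or LangChain; keying on (subject, predicate) means a newer fact naturally supersedes the old value.

```python
from datetime import datetime, timezone

class FactStore:
    """Toy semantic memory: facts as subject-predicate-object triples."""

    def __init__(self):
        self.facts = {}

    def add(self, subject, predicate, obj):
        # Keyed on (subject, predicate): a new object overwrites the old
        # one, so "prefers -> standard shipping" replaces the stale fact.
        self.facts[(subject, predicate)] = {
            "object": obj,
            "updated": datetime.now(timezone.utc).isoformat(),
        }

    def lookup(self, subject, predicate):
        entry = self.facts.get((subject, predicate))
        return entry["object"] if entry else None

store = FactStore()
store.add("customer", "prefers", "express shipping")
store.add("customer", "prefers", "standard shipping")  # newer fact wins
print(store.lookup("customer", "prefers"))
```

In a full system each triple would also be embedded and written to the vector store so it can be retrieved by similarity, not just exact lookup.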
MORGAN:So when the AI answers a question, it pulls in relevant episodes and facts, weaving them into a response that feels informed and personalized.
ALEX:Precisely. And because the memories are vectorized, the system can handle huge volumes of data while keeping latency low—critical for real-time user experiences.
CASEY:But what about updating memories when facts change or new info comes in?
ALEX:The agent continuously indexes new data, updating embeddings and pruning outdated info. That’s where workflow orchestration is key—deciding what to keep, what to forget, and when to verify facts.
KEITH:The book’s labs provide patterns for managing that lifecycle—something many projects overlook until it’s too late.
ALEX:The results here are impressive. Agents with episodic memory reduce customer frustration by recalling prior interactions—think of a support bot that remembers you called last week about a billing issue and picks up right where it left off. That continuity alone cuts resolution times dramatically.
MORGAN:That’s a huge win—faster resolutions directly translate into cost savings and happier customers.
ALEX:Semantic memory adds another layer—personalizing responses based on stored facts like preferences or previous solutions. For example, a tutor AI can tailor learning paths based on student knowledge gaps it has recorded.
CASEY:Do we have any numbers?
ALEX:Yes—the book mentions agents extracting 4 to 6 facts per conversation and recalling 2 to 3 past conversation episodes effectively. The combined system boosts relevancy and user satisfaction significantly.
MORGAN:Wow, that’s real business impact. Reducing repeat explanations and personalizing service at scale—that’s gold.
CASEY:Though it comes with cost and complexity, the operational efficiency gains and customer loyalty are worth it.
CASEY:Now, let me play devil’s advocate. These memory systems aren’t without risks. First, complexity skyrockets—teams must manage data privacy, compliance, and ongoing data curation.
MORGAN:So it’s not a “set it and forget it” solution?
CASEY:Not at all. Memories can become outdated or incorrect, leading AI to hallucinate—that’s when the AI confidently states something false because it relied on bad data. That erodes trust quickly.
KEITH:From my consulting work, the biggest mistake I see people make is underestimating the effort needed for quality control. Many jump into building memory-enabled agents without a solid governance process and end up with confusing or irrelevant responses. Maintaining freshness and accuracy is ongoing work—not a one-off task.
CASEY:Scaling memory storage is another challenge. As conversations and facts grow, retrieval speed can slow, and costs rise.
MORGAN:So leaders must balance business value with operational overhead.
KEITH:Exactly. The book’s honest about these limitations and suggests best practices for mitigation.
CASEY:And let’s not forget that procedural memory—the ability for an AI to learn and optimize multi-step workflows—is still an open problem.
MORGAN:So exciting, but plenty to think through before adoption.
SAM:Let’s talk real-world applications. Customer service is a standout—bots remembering past issues reduce repeat explanations and speed resolutions. Companies are already rolling out memory-enabled agents to improve first-call fixes and customer satisfaction.
MORGAN:What about education?
SAM:Personalized tutors track student learning gaps and adapt content accordingly—far beyond generic lesson plans. They remember what the student struggled with last session, making learning more efficient.
CASEY:And personal assistants?
SAM:They recall meetings, preferences, and relationships to proactively suggest actions—like reminding you about a colleague’s birthday or prepping for a recurring meeting.
MORGAN:Enterprise knowledge management?
SAM:Huge opportunity. Organizations accumulate vast facts and conversations. Memory-enabled AI helps surface relevant information quickly, aiding decision-making and reducing research time.
CASEY:What about sales and marketing?
SAM:Bots tailor interactions based on historical customer data, improving conversion rates and customer loyalty.
MORGAN:It’s clear this isn’t hypothetical—it’s happening now across multiple industries.
SAM:Here’s a scenario: You’re a VP deciding on a chatbot strategy. Do you go stateless, episodic memory-enabled, or combined episodic-semantic memory? Morgan, take stateless.
MORGAN:Stateless is quick to deploy, lower cost, and simpler. Perfect for high volume, low complexity queries where personalization isn’t crucial. But users will repeat info, and the experience feels canned.
SAM:Casey, episodic memory?
CASEY:Episodic memory bots improve continuity—they remember past conversations, making support smoother. But they might not fully understand the facts behind the chat, which limits personalization and accuracy.
SAM:Taylor, combined memory?
TAYLOR:Best user experience, hands down. The AI personalizes responses and maintains conversational context. But it’s more complex and costly to build and maintain. You need tools like CoALA and LangChain to orchestrate everything well.
SAM:So, it boils down to your business goals. Need speed and scale? Stateless. Need continuity? Episodic. Need premium, personalized service? Combined, but plan for the complexity.
MORGAN:Great framework for strategic decisions.
SAM:Let’s round out with some actionable tips. Start by evaluating your use case—if you want memory, vector databases like ChromaDB are essential for efficient storage and retrieval of embeddings, which represent your knowledge.
CASEY:Use frameworks like CoALA and LangChain to modularize your AI agent and manage workflows elegantly—don’t try to build monoliths.
SAM:For episodic memory, store full conversation episodes with timestamps and metadata to maintain temporal context. For semantic memory, extract structured facts—things that can be stored as simple triples like “customer - likes - product X.”
MORGAN:And don’t forget good security hygiene—manage API keys and environment variables carefully to protect sensitive data.
CASEY:Design workflows that continuously retrieve relevant memories, generate responses, and update the memory store—making your agent smarter with every interaction.
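That retrieve-generate-update loop can be sketched schematically. Everything here is a stand-in: the keyword-overlap `retrieve` substitutes for vector similarity search, and the canned `generate` substitutes for an LLM call via something like LangChain with GPT-4.

```python
# Schematic agent turn: recall relevant memories, condition the
# response on them, then write the new exchange back into memory.
memory = ["2024-05-01: customer reported a billing error"]

def retrieve(query, store):
    # Stand-in for vector similarity search: naive keyword overlap.
    words = set(query.lower().split())
    return [m for m in store if words & set(m.lower().split())]

def generate(query, context):
    # Stand-in for an LLM call conditioned on retrieved memories.
    if context:
        return f"Continuing from: {context[0]}"
    return "Hello! How can I help?"

def agent_turn(query, store):
    context = retrieve(query, store)      # 1. recall
    reply = generate(query, context)      # 2. respond
    store.append(f"user asked: {query}")  # 3. remember
    return reply

print(agent_turn("Is my billing error fixed?", memory))
```

The point of the sketch is the shape of the loop: every turn both reads from and writes to memory, which is what makes the agent smarter with each interaction.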
SAM:These patterns will help you avoid common pitfalls and build scalable, maintainable systems.
MORGAN:Quick plug: We’re just scratching the surface here. Keith Bourne’s ‘Unlocking Data with Generative AI and RAG’ dives deep with detailed illustrations, thorough explanations, and hands-on code labs that guide you from theory to practice. If you want to truly master these concepts, grab the 2nd edition on Amazon.
MORGAN:A quick shoutout to Memriq AI, who produce this podcast. They’re an AI consultancy and content studio building tools and resources for AI practitioners.
CASEY:Memriq helps engineers and leaders stay current with this rapidly evolving AI landscape. For deep-dives, practical guides, and cutting-edge research breakdowns, head over to Memriq.ai.
SAM:Before we wrap, what’s still unsolved? Procedural memory—enabling AI to learn and optimize multi-step workflows—is still in early stages.
MORGAN:And integrating multi-modal data—text, voice, images—into a unified memory system is an ongoing challenge.
CASEY:Data privacy and compliance remain top concerns. Storing conversational and factual data requires stringent governance.
KEITH:Also, scaling memory systems without latency or cost blowouts is an active area of research. Improving retrieval accuracy to avoid misinformation is critical.
SAM:Leaders should watch these trends closely—where the field heads next will impact competitive advantage.
MORGAN:For me, the key takeaway is how agentic memory transforms AI from reactive tools into evolving partners that learn over time. That’s a strategic advantage.
CASEY:I’d highlight the operational risks—without careful management, complexity and data quality issues can undermine benefits. So plan your governance early.
JORDAN:I’m struck by the real-world impact—memory-enabled AI is already driving better customer experiences that no competitor can ignore.
TAYLOR:The architectural modularity is a game changer. Using frameworks like CoALA lets you build scalable, maintainable systems rather than brittle one-offs.
ALEX:I’m excited by the metrics—faster resolution times and personalized responses that translate directly into ROI. That’s how you sell this internally.
SAM:The balance of technical innovation with practical deployment is crucial. Start small, iterate, and govern well.
KEITH:As the author, my hope is you see memory-enabled AI not just as a tech trend, but as a paradigm shift—a way to build truly intelligent assistants that learn, adapt, and create lasting business value.
MORGAN:Keith, thanks so much for giving us the inside scoop today.
KEITH:My pleasure—and I hope this inspires you to dig into the book and build something amazing.
CASEY:Remember, we covered the highlights here, but the book goes much deeper—with diagrams, exhaustive explanations, and hands-on labs that let you build these systems yourself.
MORGAN:Search for Keith Bourne on Amazon and grab the 2nd edition of ‘Unlocking Data with Generative AI and RAG.’ Thanks for listening, and we’ll see you next time on Memriq Inference Digest - Leadership Edition.
