How Memori Works
Memori gives your AI application long-term memory. Instead of forgetting everything after each conversation, your AI can remember facts, preferences, and context across sessions and across different applications. Agent trace & execution memories are captured via integrations such as OpenClaw, Hermes and Claude Code.
Attribution
Every memory in Memori is tagged with three dimensions: who (entity), what (process), and which conversation (session).
- Entity (entity_id): The person, place, or thing generating memories. Typically a user ID (e.g., "user_alice", "company_acme").
- Process (process_id): The agent, program, or workflow creating memories (e.g., "support_bot", "code_review_agent").
- Session (session_id): Groups related LLM interactions into a conversation thread. Auto-generated as a UUID by default.
The combination of entity_id + process_id + session_id creates a unique memory scope — different users have isolated memories, the same user can have different context in different applications, and each conversation is tracked separately.
from memori import Memori
from openai import OpenAI

client = OpenAI()
mem = Memori().llm.register(client)

# Set attribution before any LLM calls
mem.attribution(
    entity_id="user_alice",
    process_id="support_bot"
)

# session_id is auto-generated
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "I prefer dark mode."}
    ]
)
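To make the scoping rule concrete, here is a minimal sketch that models a memory scope as a dictionary keyed by the (entity_id, process_id, session_id) tuple. The ScopedStore class and its methods are hypothetical and exist only for illustration; they are not part of Memori, and the real storage layer may work quite differently.

```python
from collections import defaultdict


class ScopedStore:
    """Hypothetical in-memory stand-in for scoped memory storage (not Memori's API)."""

    def __init__(self):
        # Each (entity_id, process_id, session_id) tuple gets its own list.
        self._by_scope = defaultdict(list)

    def add(self, entity_id, process_id, session_id, text):
        self._by_scope[(entity_id, process_id, session_id)].append(text)

    def for_scope(self, entity_id, process_id, session_id):
        # .get avoids creating an empty entry for scopes that were never written.
        return list(self._by_scope.get((entity_id, process_id, session_id), []))


store = ScopedStore()
store.add("user_alice", "support_bot", "s1", "Prefers dark mode")
store.add("user_alice", "code_review_agent", "s2", "Uses PostgreSQL")
store.add("user_bob", "support_bot", "s3", "Prefers light mode")

# Same user, different process: separate context.
print(store.for_scope("user_alice", "support_bot", "s1"))
print(store.for_scope("user_alice", "code_review_agent", "s2"))
```

Because the key includes all three IDs, changing any one of them selects a different bucket, which is the isolation property described above.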
Memory Types
When you have a conversation through a Memori-wrapped LLM client, Advanced Augmentation extracts structured memories in the background. Agent trace & execution memories are captured via integrations such as OpenClaw, Hermes and Claude Code, which send tool calls, decisions, and outcomes directly to Memori:
| Type | What it captures | Example |
|---|---|---|
| Facts | Objective information with embeddings | "User uses PostgreSQL for production databases" |
| Preferences | Choices, opinions, and tastes | "Prefers concise answers" |
| Skills & Knowledge | Abilities and expertise levels | "Experienced with React (5 years)" |
| Attributes | Process-level information about the agent | "Handles billing and subscription queries" |
| Agent Trace & Execution | Tool calls, decisions, workflow steps, and outcomes | "Used search tool → found result → summarized" |
How Recall Works
Recall brings stored memories back into your AI conversations. There are two modes.
Automatic Recall (Default)
On every LLM call, Memori automatically:
- Intercepts the outbound request
- Uses semantic search to find relevant facts for the current entity
- Injects the most relevant memories into the system prompt
- Forwards the enriched request to the LLM
No extra code required — it happens transparently.
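The search-and-inject steps can be sketched with a toy model. Everything in this block is an assumption made for illustration: the word-count embed function stands in for a real embedding model, and enrich_request is a hypothetical helper, not a Memori function. It only shows the general shape of ranking stored facts against the latest user message and prepending the matches as a system message.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def enrich_request(messages, facts, top_k=2):
    """Rank facts against the last message and inject the matches as a system message."""
    query = embed(messages[-1]["content"])
    scored = sorted(((cosine(query, embed(f)), f) for f in facts), reverse=True)
    relevant = [f for score, f in scored[:top_k] if score > 0]
    if not relevant:
        return messages  # nothing relevant: forward the request unchanged
    system = {
        "role": "system",
        "content": "Known about this user:\n" + "\n".join(f"- {f}" for f in relevant),
    }
    return [system] + messages


facts = ["User prefers dark mode", "User uses PostgreSQL in production"]
request = [{"role": "user", "content": "Which theme mode should the dashboard use?"}]
enriched = enrich_request(request, facts)
print(enriched[0]["content"])
```

Here only the dark-mode fact shares a word with the question, so it is the only one injected; the original user message is forwarded untouched after the new system message.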
Manual Recall
Use mem.recall() to retrieve memories explicitly — useful for building custom prompts, displaying memories in a UI, or debugging.
from memori import Memori

mem = Memori()
mem.attribution(entity_id="user_alice", process_id="support_bot")

facts = mem.recall("coding preferences", limit=5)
for fact in facts:
    print(f"Fact: {fact.content}")
    print(f"Score: {fact.similarity:.4f}")
Each returned fact includes id, content, similarity (0–1 relevance score), rank_score, and date_created.
Recall Configuration
Memori uses semantic search (vector similarity) to find relevant facts. You can tune recall behavior with:
| Option | Default | Description |
|---|---|---|
| mem.config.recall_relevance_threshold | 0.1 | Minimum similarity score for a fact to be included |
| mem.config.recall_embeddings_limit | 1000 | Maximum number of embeddings to compare against |
# Example: tune recall for broader or narrower results
mem.config.recall_relevance_threshold = 0.05 # Lower = more results
mem.config.recall_embeddings_limit = 500 # Reduce for lower memory usage
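To build intuition for how these two options interact, the following toy recall function applies both to a handful of hand-written vectors. It is a sketch under stated assumptions, not Memori's implementation: the toy_recall name, the choice to compare against the first N stored embeddings, and the use of >= for the threshold are all guesses made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_recall(query_vec, stored, relevance_threshold=0.1, embeddings_limit=1000, limit=5):
    """Hypothetical recall: cap the candidate set, score, filter, then rank."""
    candidates = stored[:embeddings_limit]  # assumption: compare against the first N only
    scored = [(cosine(query_vec, vec), text) for text, vec in candidates]
    kept = [(score, text) for score, text in scored if score >= relevance_threshold]
    kept.sort(reverse=True)  # highest similarity first
    return kept[:limit]


stored = [
    ("dark mode", [1.0, 0.0]),
    ("postgres", [0.0, 1.0]),
    ("concise answers", [0.6, 0.8]),
]
query = [1.0, 0.0]

default_hits = toy_recall(query, stored)
strict_hits = toy_recall(query, stored, relevance_threshold=0.7)
capped_hits = toy_recall(query, stored, embeddings_limit=2)
print([text for _, text in default_hits])
```

Raising the threshold drops the weaker "concise answers" match, while lowering the embeddings limit removes it from consideration before it is ever scored.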
Memory Lifecycle

- Conversation — Your user talks to your AI through the wrapped LLM client
- Capture — Memori intercepts and stores the raw conversation
- Augmentation — Advanced Augmentation processes the conversation asynchronously, extracting structured memories
- Extraction — Facts, preferences, skills, attributes, and agent trace & execution memories are identified
- Storage — Extracted memories are stored in Memori Cloud with vector embeddings
- Recall — On the next LLM call, relevant memories are retrieved and injected into context
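The asynchronous part of this lifecycle can be illustrated with a small asyncio sketch: the raw turn is captured immediately, while extraction runs as a background task and only later produces a structured memory. The raw_log and memory_store lists, the augment coroutine, and its "I prefer ..." rule are all hypothetical placeholders; Memori's actual augmentation is not described at this level of detail in the text above.

```python
import asyncio

raw_log = []       # captured raw conversation turns
memory_store = []  # structured memories produced by augmentation


async def augment(turn):
    """Hypothetical extractor: treats 'I prefer X' statements as preferences."""
    await asyncio.sleep(0)  # yield control, standing in for background processing
    text = turn["content"]
    prefix = "i prefer "
    if text.lower().startswith(prefix):
        memory_store.append(
            {"type": "preference", "content": "Prefers " + text[len(prefix):].rstrip(".")}
        )


async def handle_turn(turn):
    raw_log.append(turn)                       # capture happens synchronously
    return asyncio.create_task(augment(turn))  # augmentation is scheduled, not awaited


async def main():
    task = await handle_turn({"role": "user", "content": "I prefer dark mode."})
    extracted_before = len(memory_store)  # augmentation has not run yet
    await task                            # let the background task finish
    return extracted_before, list(memory_store)


before, after = asyncio.run(main())
print(before, after)
```

The point of the sketch is the ordering: the conversation is stored right away, and the extracted memory only becomes available for recall once the background step has completed.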