# How Memori Works
Memori gives your AI application long-term memory. Instead of forgetting everything after each conversation, your AI can remember facts, preferences, and context across sessions and across different applications — all stored in your own database.
## Attribution
Every memory in Memori is tagged with three pieces of information: who, what, and when.
- **Entity** (`entity_id`) — The person, place, or thing generating memories. Typically a user ID (e.g., `"user_alice"`, `"company_acme"`).
- **Process** (`process_id`) — The agent, program, or workflow creating memories (e.g., `"support_bot"`, `"code_review_agent"`).
- **Session** (`session_id`) — Groups related LLM interactions into a conversation thread. Auto-generated as a UUID by default.
The combination of `entity_id` + `process_id` + `session_id` creates a unique memory scope — different users have isolated memories, the same user can have different context in different applications, and each conversation is tracked separately.
```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from memori import Memori
from openai import OpenAI

engine = create_engine("sqlite:///memori.db")
SessionLocal = sessionmaker(bind=engine)

client = OpenAI()
mem = Memori(conn=SessionLocal).llm.register(client)

# Set attribution before any LLM calls
mem.attribution(
    entity_id="user_alice",
    process_id="support_bot",
)
# session_id is auto-generated

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "I prefer dark mode."}
    ],
)
```
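The scoping rule can be pictured as a composite key over the three attribution fields. This is an illustration only — the `MemoryScope` class below is not part of Memori's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryScope:
    """Illustrative composite key; not a real Memori class."""
    entity_id: str
    process_id: str
    session_id: str

# Any differing component yields a distinct scope, so memories stay isolated.
alice_support = MemoryScope("user_alice", "support_bot", "sess-1")
alice_review = MemoryScope("user_alice", "code_review_agent", "sess-1")
bob_support = MemoryScope("user_bob", "support_bot", "sess-1")

assert alice_support != alice_review  # same user, different application
assert alice_support != bob_support   # different users are isolated
```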
## Memory Types
When you have a conversation through a Memori-wrapped LLM client, Advanced Augmentation extracts structured memories in the background:
| Type | What it captures | Example |
|---|---|---|
| Facts | Objective information | "User uses PostgreSQL for production databases" |
| Preferences | Choices, opinions, tastes | "Prefers concise answers" |
| Skills | Abilities and expertise | "Experienced with React (5 years)" |
| Rules | Constraints and principles | "Follows test-driven development" |
| Events | Milestones and occurrences | "Product launched recently" |
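Extracted memories can be pictured as typed records; the `ExtractedMemory` shape below is illustrative, not Memori's actual storage schema:

```python
from dataclasses import dataclass

@dataclass
class ExtractedMemory:
    """Illustrative record shape; Memori's real schema may differ."""
    type: str     # "fact" | "preference" | "skill" | "rule" | "event"
    content: str

# What augmentation might produce from "I prefer dark mode. I use PostgreSQL.":
memories = [
    ExtractedMemory("preference", "Prefers dark mode"),
    ExtractedMemory("fact", "Uses PostgreSQL"),
]

preferences = [m.content for m in memories if m.type == "preference"]
```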
## How Recall Works
Recall brings stored memories back into your AI conversations. There are two modes.
### Automatic Recall (Default)
On every LLM call, Memori automatically:
- Intercepts the outbound request
- Uses semantic search to find relevant facts for the current entity
- Injects the most relevant memories into the system prompt
- Forwards the enriched request to the LLM
No extra code required — it happens transparently.
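The injection step can be sketched as a pure function over the messages list. The prompt format Memori actually uses is internal, so treat this as an assumption about the general shape:

```python
def inject_memories(messages, memories):
    """Sketch of the enrichment step: prepend recalled memories
    to the system prompt (the real format is internal to Memori)."""
    context = "Known about this user:\n" + "\n".join(f"- {m}" for m in memories)
    if messages and messages[0]["role"] == "system":
        # Merge with an existing system prompt rather than replacing it.
        head = {"role": "system",
                "content": context + "\n\n" + messages[0]["content"]}
        return [head, *messages[1:]]
    return [{"role": "system", "content": context}, *messages]

enriched = inject_memories(
    [{"role": "user", "content": "What theme should the UI use?"}],
    ["Prefers dark mode"],
)
assert enriched[0]["role"] == "system"
```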
### Manual Recall
Use `mem.recall()` when you want to retrieve memories explicitly — for custom prompts, displaying memories in a UI, or debugging.
```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from memori import Memori

engine = create_engine("sqlite:///memori.db")
SessionLocal = sessionmaker(bind=engine)

mem = Memori(conn=SessionLocal)
mem.attribution(entity_id="user_alice", process_id="support_bot")

facts = mem.recall("coding preferences", limit=5)
for fact in facts:
    print(f"Fact: {fact.content}")
    print(f"Score: {fact.similarity:.4f}")
```
Each returned fact includes `id`, `content`, `similarity` (0–1 relevance score), `rank_score`, and `date_created`.
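Those fields make it easy to post-filter results yourself, for example when assembling a custom prompt. The `SimpleNamespace` objects below are stand-ins for the fact objects `mem.recall()` returns, with the fields listed above:

```python
from types import SimpleNamespace

# Stand-ins for mem.recall() results (illustrative values).
facts = [
    SimpleNamespace(id=1, content="Prefers concise answers", similarity=0.82),
    SimpleNamespace(id=2, content="Uses PostgreSQL", similarity=0.41),
]

# Keep only strong matches and format them for a custom system prompt.
strong = [f for f in facts if f.similarity >= 0.5]
context = "\n".join(f"- {f.content}" for f in strong)
```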
### Recall Configuration
Memori uses the `all-mpnet-base-v2` sentence transformer with cosine similarity for semantic search. You can tune recall behavior with these configuration options:
| Option | Default | Description |
|---|---|---|
| `mem.config.recall_relevance_threshold` | 0.1 | Minimum similarity score for a fact to be included |
| `mem.config.recall_embeddings_limit` | 1000 | Maximum number of embeddings to compare against |
```python
# Example: tune recall for broader or narrower results
mem.config.recall_relevance_threshold = 0.05  # Lower = more results
mem.config.recall_embeddings_limit = 500      # Reduce for lower memory usage
```
## Memory Lifecycle
- Conversation — Your user talks to your AI through the wrapped LLM client
- Capture — Memori intercepts and stores the raw conversation in your database
- Augmentation — Advanced Augmentation processes the conversation asynchronously
- Extraction — Facts, preferences, skills, rules, and events are identified
- Storage — Extracted memories are stored in your database with vector embeddings
- Recall — On the next LLM call, relevant memories are retrieved and injected into context
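The capture-through-storage stages above can be sketched end to end with stubbed pieces. Everything here is illustrative — the extraction and embedding functions are toys standing in for Memori's internals:

```python
def extract(text):
    """Toy extraction: treat sentences mentioning 'prefer' as memories."""
    return [s.strip() for s in text.split(".") if "prefer" in s.lower()]

def run_lifecycle(conversation, store, embed):
    # Capture: persist the raw conversation
    store["conversations"].append(conversation)
    # Augmentation + Extraction: derive structured memories asynchronously
    # (done synchronously here for simplicity)
    memories = extract(conversation)
    # Storage: save each memory alongside its vector embedding
    for m in memories:
        store["memories"].append({"content": m, "embedding": embed(m)})
    return memories

store = {"conversations": [], "memories": []}
found = run_lifecycle("I prefer dark mode. Thanks.", store, embed=lambda s: [0.0])
# Recall would later search store["memories"] by embedding similarity.
```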