Memori

Memory that performs


We benchmarked Memori against leading memory frameworks on LoCoMo. Memori wins on accuracy without runaway costs.

Our key metrics

Memori gives you production-grade accuracy at a fraction of the cost, landing within 6 points of the full-context ceiling.

81.95%

Outperforms Zep, LangMem, and Mem0.

95%

Fewer tokens than the full-context approach.

What is LoCoMo?

The Long Conversation Memory (LoCoMo) benchmark was built specifically to test an agent's ability to track, retain, and synthesize information across multi-session chat histories.

Single-hop

Direct recall of a specific fact from a single point in the conversation.

Multi-hop

Connecting facts across multiple sessions to answer compound questions.

Temporal

Tracking how a user's situation has evolved across sessions over time.

Open-domain

Broad questions that require synthesizing scattered context across many sessions.

Overall accuracy scores

We ran the same benchmark across four systems with the same judge. Here's how they ranked.

Built for production efficiency

Token cost compounds fast in long-running agents. Memori keeps context lean by design, using 1,294 tokens per query vs. 26,000 for full context.

Accuracy without the token costs

Memori reaches near full-context accuracy while keeping token usage lean and predictable.

A fraction of the cost

Memori uses 1,294 tokens per query vs. 26,000 for full-context. At scale, that's a 20x reduction in inference costs, roughly $0.001 per call on GPT-4.1-mini. For long-running agents, that difference compounds fast.
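The arithmetic behind that claim is easy to check. A minimal sketch, using the per-query token counts above; the per-million-token input price is an assumption standing in for whatever GPT-4.1-mini actually charges at the time you run it.

```python
# Back-of-the-envelope cost comparison. Token counts come from the
# benchmark above; PRICE_PER_MILLION is an assumed input price (USD).
PRICE_PER_MILLION = 0.40
MEMORI_TOKENS = 1_294        # tokens per query with Memori
FULL_CONTEXT_TOKENS = 26_000  # tokens per query with full context

memori_cost = MEMORI_TOKENS / 1_000_000 * PRICE_PER_MILLION
full_cost = FULL_CONTEXT_TOKENS / 1_000_000 * PRICE_PER_MILLION

print(f"Memori:       ${memori_cost:.4f} per call")
print(f"Full context: ${full_cost:.4f} per call")
print(f"Reduction:    {full_cost / memori_cost:.0f}x")
```

The ratio is set by the token counts alone, so the roughly 20x reduction holds at any per-token price.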

Minimal context footprint

Larger context windows don't just cost more; they also increase the risk of "lost in the middle" hallucinations. Memori keeps context to roughly 5% of the full-context footprint, so responses stay reliable as conversations grow.

Dual-layered memory

Semantic triples capture exact facts for precise recall while conversation summaries provide the narrative flow. Each triple links back to the summary it came from, so granular facts are never divorced from their broader context.
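The two layers and the back-link between them can be sketched with plain data structures. This is an illustrative layout only; the class names and fields below are hypothetical, not Memori's actual schema.

```python
from dataclasses import dataclass

# Hypothetical sketch of a dual-layered memory store: exact facts as
# semantic triples, narrative flow as session summaries, with each
# triple keeping a back-link to the summary it was extracted from.

@dataclass
class SessionSummary:
    summary_id: str
    text: str  # narrative recap of the session

@dataclass
class Triple:
    subject: str
    predicate: str
    object: str
    summary_id: str  # back-link to the originating summary

store = {
    "summaries": {"s1": SessionSummary("s1", "User planned a move to Berlin.")},
    "triples": [Triple("user", "moving_to", "Berlin", summary_id="s1")],
}

# Retrieval can surface the precise fact and its narrative context together.
fact = store["triples"][0]
context = store["summaries"][fact.summary_id]
```

Because the back-link travels with the triple, a retrieved fact never arrives stripped of the conversation that produced it.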

Facts, not noise

Memori feeds the LLM exact facts with no surrounding noise, effectively isolating high-signal knowledge. This precision drives an 81.95% overall score, within 6 points of the full-context approach, which passes the entire conversation history.

How Memori stacks up

We compared the factual accuracy and reasoning capabilities of Memori configurations against state-of-the-art baselines and a full-context ceiling.

How we got there

Each question was answered using GPT-4.1-mini, conditioned on facts and summaries retrieved from Memori. We used an LLM-as-a-Judge methodology to assess answers across four dimensions: factual accuracy, relevance, completeness, and contextual appropriateness.
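The judging loop above can be sketched as follows. This is a simplified stand-in, not our evaluation harness: `call_judge` is a placeholder that a real run would replace with a prompt to the judge model, and the substring check exists only so the sketch executes.

```python
# Hedged sketch of an LLM-as-a-Judge scoring loop over four dimensions.
DIMENSIONS = [
    "factual_accuracy",
    "relevance",
    "completeness",
    "contextual_appropriateness",
]

def call_judge(question: str, answer: str, reference: str, dimension: str) -> float:
    # Placeholder: a real implementation would prompt a judge LLM here
    # and parse its 0-1 score. The substring check is illustrative only.
    return 1.0 if reference.lower() in answer.lower() else 0.0

def judge(question: str, answer: str, reference: str) -> dict:
    # Score the answer independently on each dimension.
    return {d: call_judge(question, answer, reference, d) for d in DIMENSIONS}

scores = judge("Where is the user moving?", "The user is moving to Berlin.", "Berlin")
overall = sum(scores.values()) / len(scores)
```

Averaging per-dimension scores per question, then across the dataset, yields the overall accuracy figures reported above.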

The architecture behind the score

Step 1

Session input

New messages from the current conversation session are continuously fed into the Advanced Augmentation engine.

Step 2

Summary loop

The system maintains an evolving summary of the conversation, feeding context back and forth for processing.

Step 3

Memory extraction

New facts are extracted and stored as memories with the updated summary in the memory database.
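The three steps above can be sketched as a single loop. The function bodies are deliberately simplified stand-ins (a real system would use an LLM for both summarization and extraction); names and logic here are illustrative, not Memori's implementation.

```python
# Hedged sketch of the pipeline: session input -> summary loop -> extraction.

def update_summary(summary: str, message: str) -> str:
    # Step 2: fold each new message into the evolving session summary.
    # A real system would rewrite the summary with an LLM, not concatenate.
    return (summary + " " + message).strip()

def extract_facts(message: str) -> list[str]:
    # Step 3: stand-in extractor that treats each sentence as one fact.
    return [s.strip() for s in message.split(".") if s.strip()]

summary = ""
memory_db = []  # stores (fact, summary-at-extraction-time) pairs
for message in ["User lives in Oslo.", "User adopted a cat named Miso."]:
    # Step 1: each new session message enters the loop.
    summary = update_summary(summary, message)
    memory_db.extend((fact, summary) for fact in extract_facts(message))
```

Storing each fact alongside the summary current at extraction time is what keeps granular memories tied to their broader context.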

Don't take our word for it. Run it yourself.

Conduct the same tests, use the same judge, and reproduce the results yourself.