Rethinking Memory as Continuously Evolving Connectivity
Static memory stores facts and retrieves them with a fixed pipeline. FluxMem instead treats memory as a heterogeneous graph whose topology keeps evolving: it repairs missing links, prunes interfering ones, and distills recurring successes into reusable circuits.
Memory-augmented agents usually treat memory as a static repository: fixed representations, a fixed retrieval pipeline. That is brittle in dynamic environments where feedback, task variation, and heterogeneous signals keep reshaping what should be remembered and how it should connect. FluxMem models memory as a heterogeneous graph and progressively refines its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation.
During execution it repairs missing links, prunes interfering ones, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, all guided by one metric of memory generalizability and evolutionary maturity. Across three very different benchmarks (LoCoMo, Mind2Web, GAIA) it reaches state-of-the-art results, lifting LoCoMo average accuracy to 95.06 against 81.23 for a full-context baseline.
The problem it attacks
The paper diagnoses two failures of static memory. First, inaccurate connection: under-connection misses critical links because retrieval is imprecise, depriving the agent of relevant associations, while over-connection retrieves loosely related memories indiscriminately, injecting noise and hallucination. A static pipeline cannot adapt its connections to the situation. Second, failure of connection consolidation: static systems append new experiences without truly integrating them, and they represent memory at a single fixed abstraction level that is either too coarse (losing execution detail) or too fine (drowning in it). True consolidation needs localized structural change, not just more rows in a store.
Memory effectiveness is a connectivity problem. What matters is whether the most useful memories are reachable at each decision step, so evolve the graph's topology, not just its contents.
How it works
FluxMem is a heterogeneous graph with three layers: a semantic layer sourced from raw content, an episodic layer that is the operational working set, and a procedural-skills layer that encapsulates distilled reasoning. Three stages evolve this graph, two online and one offline.
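As a rough illustration of the three-layer structure described above, the sketch below models memory units tagged with a layer and connected by weighted links. The class names (`Layer`, `MemoryUnit`, `MemoryGraph`), the undirected weighted adjacency map, and the example contents are all assumptions made for illustration; the paper's actual data model may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    SEMANTIC = "semantic"      # sourced from raw content
    EPISODIC = "episodic"      # operational working set
    PROCEDURAL = "procedural"  # distilled reasoning / skills


@dataclass
class MemoryUnit:
    unit_id: str
    layer: Layer
    text: str


@dataclass
class MemoryGraph:
    """Hypothetical container: units plus undirected weighted links."""
    units: dict[str, MemoryUnit] = field(default_factory=dict)
    links: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_unit(self, unit: MemoryUnit) -> None:
        self.units[unit.unit_id] = unit
        self.links.setdefault(unit.unit_id, {})

    def link(self, a: str, b: str, weight: float = 1.0) -> None:
        if a not in self.units or b not in self.units:
            raise KeyError("both endpoints must exist before linking")
        self.links[a][b] = weight
        self.links[b][a] = weight

    def unlink(self, a: str, b: str) -> None:
        self.links[a].pop(b, None)
        self.links[b].pop(a, None)

    def neighbors(self, unit_id: str, layer: Layer | None = None) -> list[str]:
        # Optionally restrict to one layer, e.g. only procedural skills.
        return sorted(
            n for n in self.links[unit_id]
            if layer is None or self.units[n].layer == layer
        )


g = MemoryGraph()
g.add_unit(MemoryUnit("s1", Layer.SEMANTIC, "User prefers window seats"))
g.add_unit(MemoryUnit("e1", Layer.EPISODIC, "Booked a flight, picked seat 14A"))
g.add_unit(MemoryUnit("p1", Layer.PROCEDURAL, "Flight booking: search, select, pay"))
g.link("s1", "e1", 0.8)
g.link("e1", "p1", 0.6)

all_neighbors = g.neighbors("e1")
skill_neighbors = g.neighbors("e1", Layer.PROCEDURAL)
print(all_neighbors, skill_neighbors)
```

Keeping the layer as a tag on each unit, rather than three separate stores, is one way to let a single link connect units of different types, which is what makes the graph heterogeneous.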
[Figure: FluxMem pipeline. Stage I (a hybrid relevance score links semantic and episodic units) produces the induced context for this step; the agent acts and receives environment feedback; Stage II, Feedback-Driven Refinement (link expansion, pruning, granularity alignment), updates the evolving heterogeneous graph; Stage III, Long-Term Consolidation (cluster recurring successes into reusable procedural circuits), writes back into the graph, which feeds the next pass through the loop.]
Stage I forms the initial connections: at each step it scores candidate memory units by a hybrid relevance measure combining dense embedding similarity, sparse lexical matching (BM25), and structure, then induces the step's context from the best-connected units. Stage II is a closed feedback loop that refines connectivity after the agent acts: link expansion repairs under-connection, pruning removes interfering links, and granularity alignment reshapes a unit's representation when its abstraction level is wrong for the task. Stage III consolidates: it clusters recurrent successful trajectories and distills each cluster into a reusable procedural circuit, scored by the Procedural Evolutionary Maturity Score (PEMS) in a test-score-refine cycle that repeats until the score stops improving.
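The Stage I scoring step can be sketched as a weighted blend of the three signals named above. This is a minimal illustration, not the paper's formula: the linear combination, the weights `(0.5, 0.3, 0.2)`, the max-normalization of BM25, and the definition of the structural term (the share of already-selected `anchors` a unit links to) are all assumptions, as are the function names and the toy data.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(s)
    return scores


def hybrid_relevance(query_vec, query_tokens, units, anchors, weights=(0.5, 0.3, 0.2)):
    """units: dicts with 'id', 'vec', 'tokens', 'neighbors' (set of ids).
    anchors: ids already in the working context (assumed structural signal)."""
    w_dense, w_sparse, w_struct = weights
    lexical = bm25_scores(query_tokens, [u["tokens"] for u in units])
    top = max(lexical, default=0.0) or 1.0  # scale BM25 into [0, 1]
    scored = {}
    for u, lex in zip(units, lexical):
        dense = cosine(query_vec, u["vec"])
        struct = len(u["neighbors"] & anchors) / len(anchors) if anchors else 0.0
        scored[u["id"]] = w_dense * dense + w_sparse * lex / top + w_struct * struct
    return scored


units = [
    {"id": "a", "vec": [1.0, 0.0], "tokens": ["flight", "seat", "window"], "neighbors": {"ctx"}},
    {"id": "b", "vec": [0.0, 1.0], "tokens": ["pizza", "recipe"], "neighbors": set()},
]
scores = hybrid_relevance([1.0, 0.0], ["window", "seat"], units, anchors={"ctx"})
context = sorted(scores, key=scores.get, reverse=True)[:1]  # top-k units form the step's context
print(scores, context)
```

In this toy case unit `a` wins on all three signals and unit `b` on none, so the induced context is just `a`. In practice the dense vectors would come from an embedding model, which is outside the scope of this sketch.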
[Figure: Stage II refinement. An attribution step (node label truncated in the source: "... to memory links") drives three operations: link expansion (repair under-connection), pruning (remove interfering links), and granularity alignment (fix abstraction level). All three feed the updated graph.]
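The three Stage II operations can be sketched as one update over the links used at a step. Everything here is an assumption for illustration: the `Feedback` record stands in for whatever attribution the paper computes, link strength is a single float per unit, abstraction level is an integer (higher means more abstract), and the constants `step`, `init`, and `prune_below` are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Feedback:
    """Hypothetical attribution result for one step (all fields are assumptions)."""
    helped: set[str] = field(default_factory=set)      # linked units that supported the outcome
    hurt: set[str] = field(default_factory=set)        # linked units that interfered
    missing: set[str] = field(default_factory=set)     # useful units that were not linked
    too_coarse: set[str] = field(default_factory=set)  # units that lacked execution detail
    too_fine: set[str] = field(default_factory=set)    # units that carried too much detail


def refine(links, levels, fb, step=0.3, init=0.5, prune_below=0.2):
    """links: {unit_id: weight} from the current context.
    levels: {unit_id: int}, higher = more abstract. Returns updated copies."""
    links, levels = dict(links), dict(levels)
    for u in fb.missing:                  # link expansion: repair under-connection
        links.setdefault(u, init)
    for u in links.keys() & fb.helped:    # reinforce links that helped
        links[u] = min(1.0, links[u] + step)
    for u in links.keys() & fb.hurt:      # weaken links that interfered
        links[u] -= step
    # Pruning: drop links whose weight fell below the threshold.
    links = {u: w for u, w in links.items() if w >= prune_below}
    for u in fb.too_coarse:               # granularity alignment: move toward detail
        levels[u] = max(0, levels.get(u, 0) - 1)
    for u in fb.too_fine:                 # ... or toward abstraction
        levels[u] = levels.get(u, 0) + 1
    return links, levels


links = {"a": 0.6, "b": 0.4, "c": 0.25}
levels = {"a": 1}
fb = Feedback(helped={"a"}, hurt={"c"}, missing={"d"}, too_coarse={"a"})
new_links, new_levels = refine(links, levels, fb)
print(sorted(new_links), new_levels)
```

Here the helpful link to `a` is strengthened, the interfering link to `c` drops below the threshold and is pruned, the missing unit `d` is connected, and `a` is shifted one level toward detail. The inputs are left unmodified, which keeps the update easy to test.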
Results
| Memory system | LoCoMo average accuracy |
|---|---|
| Zep | 61.60 |
| Mem0 | 66.30 |
| A-Mem | 71.43 |
| Nemori | 81.10 |
| Full Context (baseline) | 81.23 |
| FluxMem | 95.06 |
The three benchmarks are chosen to be fundamentally distinct: LoCoMo tests long-context conversational reasoning, Mind2Web tests web navigation with cross-task generalization, and GAIA tests general assistant tasks. FluxMem reaches state-of-the-art results on all three, including beating the strong MemEvolve baseline on GAIA, which is the evidence that connectivity evolution helps across memory regimes rather than in one favorable setting. Beating the full-context baseline on LoCoMo is notable: a well-connected graph outperforms simply stuffing everything into context, because reachability, not raw availability, is what helps.
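To put the LoCoMo margin in perspective, the short calculation below derives the absolute gain, the relative gain, and the share of the baseline's errors removed, using only the averages in the table. It assumes the scores are accuracies on a 0 to 100 scale; the derived percentages are arithmetic on the reported numbers, not figures from the paper.

```python
locomo_avg = {
    "Zep": 61.60,
    "Mem0": 66.30,
    "A-Mem": 71.43,
    "Nemori": 81.10,
    "Full Context": 81.23,
    "FluxMem": 95.06,
}

baseline = locomo_avg["Full Context"]
flux = locomo_avg["FluxMem"]

abs_gain = flux - baseline                    # accuracy points
rel_gain = abs_gain / baseline                # relative to the baseline score
err_reduction = abs_gain / (100 - baseline)   # share of baseline errors removed (assumes 0-100 scale)

print(f"absolute gain:   {abs_gain:.2f} points")
print(f"relative gain:   {rel_gain:.1%}")
print(f"error reduction: {err_reduction:.1%}")
```

This works out to 13.83 points, roughly a 17.0% relative gain, and about 73.7% of the full-context baseline's errors removed.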
What it changes
FluxMem reframes the memory question from "what do we store" to "how is it connected, and does that connectivity keep adapting." Where ReasoningBank changes the unit of memory (strategies instead of traces) and ALMA searches over memory designs, FluxMem keeps the memory but makes its topology a living thing that repairs and consolidates itself under feedback. The procedural-circuit idea is the bridge to self-improvement: recurring successes are not just stored, they are distilled into reusable subroutines the agent can invoke, which is memory beginning to behave like skill acquisition.
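The distillation of recurring successes into a procedural circuit, via a test-score-refine cycle that stops when the score no longer improves, can be sketched as below. This is a toy under stated assumptions: trajectories are flat action lists already grouped into one cluster, `distill` keeps actions by support threshold, and `maturity` is a placeholder mean-F1 overlap score that is NOT the paper's PEMS, whose definition is not given here. All function names and thresholds are hypothetical.

```python
from collections import Counter


def distill(action_lists, min_support):
    """Keep actions present in at least min_support of the trajectories, in first-seen order."""
    counts = Counter(a for actions in action_lists for a in set(actions))
    order = []
    for actions in action_lists:
        for a in actions:
            if a not in order:
                order.append(a)
    return [a for a in order if counts[a] / len(action_lists) >= min_support]


def maturity(circuit, action_lists):
    """Placeholder score (not PEMS): mean F1 overlap between the circuit and each trajectory."""
    if not circuit:
        return 0.0
    total = 0.0
    for actions in action_lists:
        hit = len(set(circuit) & set(actions))
        if hit:
            p, r = hit / len(circuit), hit / len(set(actions))
            total += 2 * p * r / (p + r)
    return total / len(action_lists)


def consolidate(action_lists, supports=(1.0, 0.6, 0.3)):
    """Test-score-refine: relax the support threshold until the score stops improving."""
    best, best_score = [], 0.0
    for s in supports:
        candidate = distill(action_lists, s)        # refine
        score = maturity(candidate, action_lists)   # test and score
        if score <= best_score:                     # no improvement: stop
            break
        best, best_score = candidate, score
    return best, best_score


# One cluster of successful trajectories for the same kind of task (toy data).
cluster = [
    ["search", "filter", "select", "pay"],
    ["search", "select", "pay"],
    ["search", "filter", "select", "pay", "close_popup"],
]
circuit, score = consolidate(cluster)
print(circuit, round(score, 3))
```

On this toy cluster the loop first keeps only the actions shared by every trajectory, then admits `filter` because it raises the score, and stops before admitting the one-off `close_popup`, which would lower it.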
Where it sits among prior work
| System | Structure | Topology evolves? |
|---|---|---|
| Mem0 / Zep | Store + retrieval pipeline | No |
| A-Mem | Linked notes | Partly |
| ReasoningBank | Distilled strategy items | No (content-level) |
| FluxMem | Heterogeneous graph, 3 layers | Yes, online + offline |
Limitations
The reported results use GPT-4.1-mini, so behavior with other backbones is less certain. The evaluation is scored partly by an LLM-as-judge (on LoCoMo and GAIA), which introduces judge-dependent noise. Maintaining and evolving a heterogeneous graph online adds machinery and per-step cost beyond a flat store, and the paper does not foreground how that cost scales with very long task streams. As with the other memory papers, the benchmarks are the evaluation target, so the connectivity gains are measured in-distribution rather than against a held-out novel domain.
Learnings
- Reachability beats availability. Beating a full-context baseline shows the bottleneck is whether the useful memory is connected to the current step, not whether it exists somewhere. Evolving topology targets the right thing.
- Under- and over-connection are both failures. Memory needs to add missing links and prune interfering ones; treating retrieval as fixed gets both wrong. The feedback loop is what keeps the balance.
- Consolidation should produce reusable circuits. Distilling recurring successes into procedural skills turns memory into something closer to learned skill, a natural bridge from remembering to improving.
- Feedback attribution is the learning signal. Deciding which links helped or hurt is how the graph improves, and it is the same verifier-quality dependency the rest of this study keeps surfacing.
Strengths
- State-of-the-art on three deliberately distinct benchmarks, including beating MemEvolve on GAIA.
- Beats a full-context baseline, isolating connectivity as the lever.
- Three-layer graph cleanly separates semantic, episodic, and procedural memory.
- Online refinement plus offline consolidation gives both responsiveness and durable structure.
Open questions
- Results reported on a single backbone (GPT-4.1-mini).
- LLM-as-judge scoring on two of three benchmarks adds noise.
- Graph maintenance adds per-step cost; scaling to very long streams underexplored.
- Gains measured in-distribution; no held-out novel domain.
Glossary
| Term | Meaning |
|---|---|
| Connectivity | How memory units link to each other, which decides what is reachable at a step |
| Heterogeneous graph | A graph with multiple node and edge types (semantic, episodic, procedural) |
| Under / over-connection | Missing useful links, or retrieving too many loosely related ones |
| Granularity alignment | Reshaping a memory unit to the right abstraction level for the task |
| Procedural circuit | A reusable subroutine distilled from recurring successful trajectories |
| PEMS | Procedural Evolutionary Maturity Score, guiding skill consolidation |
Source
- Fang, Xu, Wang et al., Rethinking Memory as Continuously Evolving Connectivity (FluxMem), Zhejiang University / Alibaba (2026) · arxiv.org/abs/2605.28773
- Local copy · papers/rethinking Memory as Continuously Evolving Connectivity.pdf