Wiki Log

Chronological record of all wiki actions. Append-only. Format: ## [YYYY-MM-DD] action | subject
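
As a minimal sketch of the heading format above, the parser below splits one heading line into date, action, and subject. The names `HEADING_RE` and `parse_heading` are hypothetical, and treating the leading `## ` as optional is an assumption made so that headings with a stripped markdown prefix still parse.

```python
import re
from datetime import date

# Matches "## [YYYY-MM-DD] action | subject".
# Assumption: the "## " prefix is optional, so stripped headings still match.
HEADING_RE = re.compile(
    r"^(?:##\s+)?\[(\d{4})-(\d{2})-(\d{2})\]\s+(\w+)\s+\|\s+(.+?)\s*$"
)

def parse_heading(line):
    """Return a dict with date, action, subject, or None if the line is not a heading."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    year, month, day, action, subject = match.groups()
    return {
        "date": date(int(year), int(month), int(day)),
        "action": action,
        "subject": subject,
    }
```

Body lines such as `Domain: ...` return `None`, so a checker could use this to list entries in file order without touching their content.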

[2026-05-14] create | Wiki initialized

Domain: LLM Wiki / Agent Memory / Context Compression / Knowledge Integration.
Created SCHEMA.md, index.md, log.md, raw/, entities/, concepts/, comparisons/, queries/.

[2026-05-14] ingest | Initial LLM Wiki / Agent Memory source batch

Created 20 raw source files under raw/articles, raw/papers, raw/community, raw/product-docs.
Created concepts/llm-wiki-agent-memory-research-framework.md.
Sources include Karpathy gist, MemGPT, Generative Agents, CoALA, RAG survey, Anthropic agent posts, LangChain context engineering, Simon Willison embeddings, Microsoft hybrid search, HN discussions, OpenAI/Letta memory docs, RAPTOR, Self-RAG, MemoRAG.
Reddit extraction blocked; marked low reliability / insufficient evidence.

[2026-05-14] correction | Raw source preservation policy

User corrected ingestion policy: raw sources should preserve parsed original/source text, not just source_url and key extracted claims.
Updated SCHEMA.md to require parsed original/source text whenever accessible, with explicit raw_preservation / extraction_status marking when blocked, truncated, or summarized.
Rewrote existing raw source files to include ## Parsed Source Text sections from fetched/extracted markdown where accessible.
Important caveat: web_extract often returns capped/summarized markdown for long sources; these files are now marked tool_parsed_or_summarized_text where appropriate and should be upgraded with full PDF/API/browser extraction in later passes.

[2026-05-14] plan | Hermes + Obsidian personal knowledge base

Continued research on Obsidian integration options: Obsidian URI, Web Clipper, Properties/Bases, Dataview, Local REST API, MCP-style integrations, and Hermes memory docs.
Added 9 raw source files for Obsidian/Hermes integration evidence under raw/product-docs and raw/github.
Created concepts/hermes-obsidian-personal-knowledge-base-plan.md.
Plan recommendation: keep markdown+git as canonical memory, use Obsidian as the human review/editing workspace, and use Hermes as ingestion/retrieval/synthesis/lint automation. Keep Hermes built-in memory bounded to compact steering facts; store larger personal/research/project knowledge in Obsidian.

[2026-05-16] refine | Hermes + Obsidian KB truth boundaries

Expanded concepts/hermes-obsidian-personal-knowledge-base-plan.md with a single-source-of-truth model: immutable raw sources as evidence, mutable wiki notes as synthesis, append-only logs, bounded Hermes memory as steering cache, and derived indexes as rebuildable non-canonical artifacts.
Strengthened raw source rules: raw resources must be original text/artifact records or exact local storage paths that an agent can locate and read; summaries cannot replace raw evidence.
Added strict global and per-note schemas for raw_source, concept, decision, session_summary, and procedure notes.
Added source_derivation for transformed sources such as OCR outputs, transcript cleanups, parsed PDFs, translated versions, normalized HTML extracts, and markdown cleanup passes.
Added automation/permission boundaries, folder policies, human-confirmation triggers, and MVP ingest/truth-lookup/session-to-knowledge loops.

[2026-05-14] upgrade | Raw source full-text pass

Upgraded 7 arXiv paper raw sources to raw_preservation: full_pdf_text using arXiv PDFs + PyMuPDF page text extraction: MemGPT, Generative Agents, CoALA, RAG Survey, RAPTOR, Self-RAG, MemoRAG.
Upgraded 8 web/blog/product sources to raw_preservation: full_html_article_text_candidate using readability-lxml + html2text. These are candidate full article text because site-rendered dynamic content may still omit hidden sections.
Upgraded 3 Hacker News discussions to full comment-tree text: HN Karpathy-style wiki via Algolia API (115 comments), HN MemGPT via Firebase API (106 comments), HN Letta Code via Firebase API (37 comments).
Remaining gap: Reddit LocalLLaMA memory thread remains extraction_blocked; full thread still needs browser/API/archive/manual export.

[2026-05-14] update | Git synchronization requirement

User requested that future changes be synchronized through the git repository.
Updated SCHEMA.md to require checking git status, staging relevant files, committing with a clear message, and pushing when a remote is configured/available.
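
The sequence required above (check status, stage, commit, push when a remote exists) could be scripted roughly as follows. This is a sketch, not the procedure recorded in SCHEMA.md: the function names are hypothetical, it stages every changed path rather than selecting "relevant" ones, and it ignores quoted paths in the porcelain output.

```python
import subprocess

def changed_paths(porcelain):
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # the first two columns are status codes, then a space
        if " -> " in path:  # rename entry: keep the new path
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths

def sync_commands(porcelain, message, has_remote):
    """Build the git commands to run; returns [] when the tree is clean."""
    paths = changed_paths(porcelain)
    if not paths:
        return []
    commands = [["git", "add", "--"] + paths, ["git", "commit", "-m", message]]
    if has_remote:
        commands.append(["git", "push"])
    return commands

def run_sync(message, repo="."):
    """Run the plan against a real repository (not exercised here)."""
    def git(*args):
        return subprocess.run(
            ["git", *args], cwd=repo, capture_output=True, text=True, check=True
        ).stdout
    status = git("status", "--porcelain")
    has_remote = bool(git("remote").strip())
    for command in sync_commands(status, message, has_remote):
        subprocess.run(command, cwd=repo, check=True)
```

Keeping the command planning separate from execution makes the push-only-with-remote rule easy to check without a repository.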

[2026-05-15] ingest | GitHub issue/repo evidence for agent memory systems

Added raw GitHub sources under raw/github/:
- mem0 issue #4573 production memory audit: 10,134 entries, reported 97.8% junk, feedback-loop amplification, quality-gate recommendations.
- LettaBot issue #652: per-conversation context scoping for MemFS/memory blocks.
- README snapshots for mem0, Letta Code, WUPHF, llm-wiki-compiler, LangChain context_engineering, and LangChain how_to_fix_your_context.
Updated concepts/llm-wiki-agent-memory-research-framework.md with new GitHub-backed evidence on memory failure modes, context scoping, existing projects, ingestion quality gates, and Source Map rows.
Updated index.md summary.

[2026-05-16] update | Negative memory entropy filter

Added negative memory filtering rule to concepts/hermes-obsidian-personal-knowledge-base-plan.md: entropy means low future-retrieval value, not Shannon entropy.
Default policy: reject transient state unless it becomes a durable preference, fact, decision, reusable procedure, source-backed synthesis, or tracked follow-up.
Explicitly rejects shell/tool output logs, agent chain-of-thought, repeated retrieval excerpts, conversational scaffolding, completed task traces, and speculative operational state that has no tracked follow-up.
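
A minimal sketch of the default-reject policy in this entry, assuming each memory candidate has already been labeled with a kind. The label strings and the `keep_candidate` name are hypothetical; how the plan actually classifies candidates is not recorded in this log.

```python
# Kinds that count as durable under the default policy (hypothetical labels).
DURABLE_KINDS = {
    "preference",
    "fact",
    "decision",
    "reusable_procedure",
    "source_backed_synthesis",
    "tracked_follow_up",
}

# Kinds the rule rejects explicitly (hypothetical labels).
REJECTED_KINDS = {
    "tool_output_log",
    "chain_of_thought",
    "repeated_retrieval_excerpt",
    "conversational_scaffolding",
    "completed_task_trace",
    "untracked_speculative_state",
}

def keep_candidate(kind):
    """Default-reject: keep only kinds that are explicitly durable."""
    if kind in REJECTED_KINDS:
        return False
    return kind in DURABLE_KINDS
```

An unknown kind is rejected as well, which matches the "reject unless it becomes durable" wording.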

[2026-05-16] update | Memory pipeline and extraction threshold

Added memory pipeline: interaction -> working context -> temporary scratch -> candidate extraction -> entropy filter -> durable knowledge -> retrieval index.
Defined knowledge as compressed state transition, not interaction history.
Updated session_summary structure to include Durable Outcomes, Decisions, New Knowledge, Reusable Procedures, Open Questions, Evidence Added, and Rejected / Do Not Store.
Added memory extraction threshold: create durable session notes only when architecture changes, durable preferences, reusable procedures, sources, decisions, long-term synthesis, or worthwhile unresolved questions emerge.
Explicitly excludes transcript, chronological replay, tool log, and chain-of-thought sections from stable session notes.
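
The extraction threshold and the session_summary sections above can be sketched as follows. The trigger labels and function names are hypothetical, and the `## ` section headings in the rendered output are an assumption about how the note is laid out.

```python
# Section names taken from the session_summary structure in this entry.
SECTIONS = [
    "Durable Outcomes",
    "Decisions",
    "New Knowledge",
    "Reusable Procedures",
    "Open Questions",
    "Evidence Added",
    "Rejected / Do Not Store",
]

# Hypothetical labels for the events that justify a durable session note.
TRIGGERS = {
    "architecture_change",
    "durable_preference",
    "reusable_procedure",
    "source",
    "decision",
    "long_term_synthesis",
    "open_question",
}

def should_create_note(signals):
    """True when at least one threshold trigger occurred in the session."""
    return bool(TRIGGERS & set(signals))

def render_summary(items):
    """Render non-empty sections in fixed order; refuse unknown sections."""
    unknown = set(items) - set(SECTIONS)
    if unknown:
        # Covers excluded sections such as a transcript or tool log.
        raise ValueError(f"sections not allowed: {sorted(unknown)}")
    lines = []
    for section in SECTIONS:
        entries = items.get(section, [])
        if entries:
            lines.append(f"## {section}")
            lines.extend(f"- {entry}" for entry in entries)
    return "\n".join(lines)
```

Rejecting unknown section names is one way to keep transcript, replay, tool-log, and chain-of-thought sections out of stable notes.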

[2026-05-16] update | Session half-life lifecycle scheme

Added session half-life mechanism: sessions are temporary containers for candidate knowledge, not permanent long-term knowledge.
Added memory class model: canonical, semantic, operational, and episodic, with retrieval priority based on class rather than folder.
Added adaptive retention stages: active, decaying, archive candidate, compression, and rare deletion with human confirmation.
Added reinforcement model: retrieval/citation counts can increase retrieval weight so valuable operational memory persists while noise decays.
Added canonicalization pipeline: extract durable items from session notes, promote them to concepts/decisions/procedures/project memory/user profile, then lower session importance.
Added minimal memory metrics and recommended 50-sessions/active/ plus 50-sessions/archive/ layout.
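
One way to combine half-life decay with reinforcement is sketched below. The formula, the `boost` parameter, and the stage thresholds are all assumptions for illustration; the log does not record the actual values or scoring function.

```python
import math

def retrieval_weight(base, age_days, half_life_days,
                     retrievals=0, citations=0, boost=0.5):
    """Exponential half-life decay, scaled up by a log-damped reinforcement term."""
    decay = 0.5 ** (age_days / half_life_days)
    reinforcement = 1.0 + boost * math.log1p(retrievals + citations)
    return base * decay * reinforcement

def stage(weight):
    """Map a weight to a retention stage using hypothetical thresholds.

    Compression and deletion are left out because the entry reserves
    deletion for human confirmation.
    """
    if weight >= 0.5:
        return "active"
    if weight >= 0.2:
        return "decaying"
    return "archive candidate"
```

Under these assumptions an unreferenced session loses half its weight every `half_life_days`, while a session that keeps being retrieved or cited stays above the archive threshold for longer.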

[2026-05-16] update | Working Set Assembly standard v1

Added Working Set Assembly v1 as a deterministic pipeline from scoped retrieval results to a role-separated, token-budgeted, semantically compressed execution context.
Defined structured CandidateNote, Cluster, and WorkingSetOutput schemas to prevent free-text drift in intermediate artifacts.
Specified retrieve, rank, cluster, compress, deduplicate, isolate-by-role, and assemble steps with fixed scoring weights and token budget.
Added versioning, metrics, and --debug-working-set observability requirements.
Reinforced core principles: retrieval is exploration, cluster is the meaning unit, and working set is a runtime artifact rather than a storage structure.
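
The rank, cluster, compress, deduplicate, isolate-by-role, and assemble steps named in this entry might look like the sketch below (retrieval is assumed to have already produced the candidates). The schema fields, the weights, the budget, and the whitespace token count are all hypothetical; only the three schema names and the step names come from this entry.

```python
from dataclasses import dataclass, field

# Hypothetical fixed scoring weights and token budget.
WEIGHTS = {"relevance": 0.6, "recency": 0.2, "authority": 0.2}
TOKEN_BUDGET = 50

@dataclass
class CandidateNote:
    note_id: str
    role: str
    topic: str
    text: str
    relevance: float
    recency: float
    authority: float

    def score(self):
        return (WEIGHTS["relevance"] * self.relevance
                + WEIGHTS["recency"] * self.recency
                + WEIGHTS["authority"] * self.authority)

@dataclass
class Cluster:
    topic: str
    role: str
    notes: list
    summary: str = ""

@dataclass
class WorkingSetOutput:
    version: str
    by_role: dict = field(default_factory=dict)
    tokens_used: int = 0

def count_tokens(text):
    return len(text.split())  # crude whitespace proxy for a real tokenizer

def assemble(candidates, budget=TOKEN_BUDGET):
    # Rank: highest score first (sorted() is stable, so ties keep input order).
    ranked = sorted(candidates, key=lambda note: note.score(), reverse=True)
    # Cluster: group by (role, topic); dicts keep insertion order.
    clusters = {}
    for note in ranked:
        key = (note.role, note.topic)
        clusters.setdefault(key, Cluster(note.topic, note.role, [])).notes.append(note)
    output = WorkingSetOutput(version="v1")
    seen = set()
    for cluster in clusters.values():
        # Compress: keep the top-ranked note's text as a stand-in summary.
        cluster.summary = cluster.notes[0].text
        if cluster.summary in seen:  # deduplicate identical summaries
            continue
        cost = count_tokens(cluster.summary)
        if output.tokens_used + cost > budget:  # enforce the token budget
            continue
        seen.add(cluster.summary)
        # Isolate by role: each role gets its own list in the output.
        output.by_role.setdefault(cluster.role, []).append(cluster.summary)
        output.tokens_used += cost
    return output
```

Because every step is a pure function of the candidate list and the fixed weights, the same input always yields the same working set, which is the determinism the entry asks for.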
