Built on the Discogs public catalog - monthly XML dumps, CC0 licensed, full catalog depth. Every artist, label, release, pressing, format, credit, identifier, and the relationships between all of them. Normalised, indexed, served through a REST API and a fully open MCP server.
No keys. No signup. Any agent, any workflow. Point at it.
The MCP tools are deterministic. Every response includes provenance - why a record matched, where the data came from, where confidence drops. Agents get structured facts, not vibes.
The same retrieval services power the human-facing web app, so what users see and what agents get is always the same answer.
Dig doesn't generate, recommend, or invent. It retrieves. The AI sits on top. Dig is what the AI uses when it can't afford to guess.
The open MCP layer is a deliberate choice. Dig doesn't gate access because the goal isn't to monetise queries - it's to become the default tool agents reach for.
Deliberately simple stack for a small team. Modular monolith. Postgres as canonical source of truth. Full text search with pg_trgm to start - no search cluster until the metrics justify one. Redis for jobs and caching. One codebase powering the web app, the API, and the MCP server from the same domain services.
The MCP layer serves structured database queries, not LLM inference. No expensive compute behind every agent call.
The data is open. The interface is open. The cost is manageable.
The relationships to bring it to the right people are already there.
Architecture, ingestion pipeline, and API/MCP interface details are below. This is the technical layer the product and agent workflows both sit on top of.
LLMs orchestrate. Dig retrieves. Every tool output is structured, reproducible, and explainable. MCP and REST call the same retrieval services. No hidden writes - agent tools read, rank, and explain by default.
/search (catalog entities + filters)/artists/:id/labels/:id/masters/:id/releases/:id/releases/:id/media-links/curators/:slug, /lists/:slugsearch_catalogget_release, get_master_releaseget_artist, get_labelget_related_releasesget_media_linkscreate_crate_draftexplain_relationshipsArchitecture — Modular monolith Search v1 — Postgres FTS + pg_trgm Ingest — Raw payload staging in ingest.raw_entities Auth v1 — Editorial-only LLM strategy — No proxying. Expose MCP/API, users bring models Non-goals — Marketplace, compliance stack, Discogs write-back