Why We Built It This Way
When we started WeDaita, the assumption was simple: LLMs + drug discovery databases = acceleration. We shipped a first version that worked exactly like that. Scientists hated it.
Not because it was slow. Because it hallucinated database entries. Confidently. Citing papers that didn't exist, returning ChEMBL IDs that were subtly wrong. Trust, once broken, is hard to recover in a scientific context.
The Core Problem With "LLM + Database"
A language model's training data is frozen. Drug discovery databases (ChEMBL, UniProt, NCBI) update continuously. If you let the model answer from memory, it will answer from stale snapshots with no uncertainty signal.
Our fix: the agents never answer from model weights for factual database queries. Every database lookup is a real API call or SQL query. The model's job is to decompose the scientific question into sub-queries, interpret results, and synthesise, never to recite facts.
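One way to make that rule enforceable rather than aspirational is to require provenance on every factual claim before it reaches the user. This is a minimal sketch of the idea; the names (`ToolResult`, `grounded_answer`) are hypothetical, not our actual API:

```python
from dataclasses import dataclass

@dataclass
class ToolResult:
    """Provenance record for one real database lookup (illustrative shape)."""
    source: str   # e.g. "ChEMBL"
    query: str    # the actual API call or SQL that was issued
    payload: dict # raw structured response

def grounded_answer(claims: list[dict]) -> list[dict]:
    """Reject any factual claim that lacks a tool-call provenance.

    The model may phrase and rank, but every fact it states must point
    back to a ToolResult, never to its own weights.
    """
    ungrounded = [c for c in claims if c.get("evidence") is None]
    if ungrounded:
        raise ValueError(f"{len(ungrounded)} claim(s) lack database provenance")
    return claims
```

The point of the hard failure is cultural as much as technical: an ungrounded claim is a bug, not a degraded answer.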
What "Agentic" Actually Means in Our Stack
We use three tiers:
- Planner — breaks a user query into a sequence of tool calls.
- Tools — deterministic integrations with ChEMBL, DrugBank, UniProt, PDB, OpenTargets, etc.
- Synthesiser — takes structured tool outputs and writes a ranked, cited summary.
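The three tiers compose into a simple pipeline. Here is a toy sketch of the shape, with the planner hard-coded and the tool stubbed (all identifiers and the returned IDs are placeholders, not real ChEMBL data; in production the planner is the LLM and the tools hit live APIs):

```python
def planner(question: str) -> list[dict]:
    """Tier 1: decompose the question into deterministic tool calls.
    A fixed plan stands in for the LLM here."""
    return [{"tool": "chembl_search", "args": {"target": "EGFR", "phase": 4}}]

def run_tool(call: dict) -> dict:
    """Tier 2: deterministic integration, stubbed with a canned response."""
    registry = {
        "chembl_search": lambda args: {"hits": ["CHEMBL_ID_1", "CHEMBL_ID_2"]},
    }
    return {"call": call, "result": registry[call["tool"]](call["args"])}

def synthesiser(outputs: list[dict]) -> str:
    """Tier 3: write a summary from structured tool outputs only."""
    hits = [h for o in outputs for h in o["result"]["hits"]]
    return "Found: " + ", ".join(hits)

answer = synthesiser([run_tool(c) for c in planner("approved EGFR inhibitors?")])
```

Keeping tier 2 deterministic is what makes runs replayable: given the same plan, the tool layer either returns the same data or fails loudly.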
The tools are the hard part. Not because the APIs are complex—because scientific databases are inconsistently versioned, have spotty rate limiting, and occasionally serve wrong data on their own end. We maintain a caching + validation layer that catches a surprising number of upstream errors.
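The caching + validation layer can be sketched as a wrapper around the upstream fetch: cache with a TTL, and structurally validate every payload before anything downstream sees it. This is a simplified illustration (the class and the validation rule are ours to make up here, not the real implementation):

```python
import time

def validate_chembl_id(value: str) -> bool:
    """Cheap structural check: ChEMBL IDs look like 'CHEMBL' + digits."""
    return value.startswith("CHEMBL") and value[6:].isdigit()

class CachedValidatedClient:
    """Wrap an upstream fetch with a TTL cache and response validation."""

    def __init__(self, fetch, ttl_seconds: float = 3600.0):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.cache: dict = {}  # key -> (timestamp, payload)

    def get(self, key: str) -> dict:
        hit = self.cache.get(key)
        if hit and time.time() - hit[0] < self.ttl:
            return hit[1]
        payload = self.fetch(key)
        # Reject malformed upstream data instead of passing it downstream.
        if not validate_chembl_id(payload.get("chembl_id", "")):
            raise ValueError(f"upstream returned malformed record for {key!r}")
        self.cache[key] = (time.time(), payload)
        return payload
```

Even a check this cheap catches a whole class of upstream errors (truncated responses, wrong-typed fields) before they contaminate an agent run.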
Three Things We Got Wrong Early
1. We over-indexed on UI/UX before the data pipelines were stable
A beautiful interface on top of unreliable data is worse than an ugly interface on top of reliable data. Scientists notice the data errors first. We rebuilt the pipelines before touching the UI again.
2. We underestimated domain-specific validation
Generic LLM benchmarks are useless for scientific AI. We now maintain a set of "known-answer" queries — e.g. "What are approved EGFR inhibitors?" — that we run against every agent update. If the answer changes unexpectedly, we don't ship.
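The known-answer gate amounts to a tiny regression harness. A sketch, assuming a `run_agent(query) -> list of names` interface (the expected set below is an illustrative subset, not the curated list we actually maintain):

```python
KNOWN_ANSWER_QUERIES = {
    # query -> identifiers we expect; illustrative subset, not exhaustive
    "approved EGFR inhibitors": {"gefitinib", "erlotinib", "osimertinib"},
}

def regression_gate(run_agent) -> list[str]:
    """Run every known-answer query; return those whose answers drifted.
    An empty list means the agent update is safe to ship."""
    failures = []
    for query, expected in KNOWN_ANSWER_QUERIES.items():
        got = set(run_agent(query))
        if got != expected:
            failures.append(query)
    return failures
```

An "unexpected change" includes answers getting longer: a new hit might be a legitimate database update, but a human has to confirm that before the expected set is amended.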
3. We shipped ADDA too early
Antibody design requires a tighter feedback loop between prediction and wet-lab results than target identification does. We launched ADDA before we had that loop instrumented, so we couldn't learn from failures systematically. It's better now, but the gap cost us two months.
What's Working
The audit trail feature has become our most mentioned differentiator. Every agent run produces a full log: which databases were queried, which versions, at what time, with what parameters. Scientists can reproduce a result from 6 months ago exactly. That's table stakes for any serious drug discovery workflow, but almost nobody else does it.
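The record behind each log line is mundane, which is the point. A sketch of one entry (field names are hypothetical; the hash lets us detect if the upstream payload has changed since the original run):

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(database: str, version: str, params: dict, response: dict) -> dict:
    """One line of the per-run audit log: enough to replay the query and
    to detect whether upstream data has drifted since the run."""
    return {
        "database": database,
        "db_version": version,
        "queried_at": datetime.now(timezone.utc).isoformat(),
        "params": params,
        # Canonical JSON so the same payload always hashes the same way.
        "response_sha256": hashlib.sha256(
            json.dumps(response, sort_keys=True).encode()
        ).hexdigest(),
    }
```

Reproducing a six-month-old result then means re-issuing `params` against the pinned `db_version` and comparing hashes, not trusting anyone's memory.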
What's Next
The next step is building out more robust auditing and validation, including a human-in-the-loop review process.
If you're building in this space and want to compare notes, reach out.