Execution-verified coding datasets, benchmark-targeted generation, and a before/after performance guarantee. Built for the teams fine-tuning LLMs — not the ones making slide decks about them.
$ da1a jobs create ./job.yaml
▸ job_id      job_01HY5PX3W2N4C7JQK8VAHF9D2E
▸ domain      coding
▸ format      instruction-response
▸ languages   [python, typescript, rust]
▸ volume      2,500 examples
[queued] accepted · 2,500 pending
[generating] 2,500 / 2,500 drafted
[sandbox.execute] running in 48 workers · t+41s
[filter.dedup] MinHash · removed 108
[filter.toxicity] detoxify · removed 3
[verify] execution-verified · 2,254 passed
→ 200 OK · job.complete
# response
{
  "job_id": "job_01HY5PX3W2N4C7JQK8VAHF9D2E",
  "status": "complete",
  "verified": true,
  "pass_rate": 0.942,
  "total": 2500,
  "passed": 2254,
  "filtered": 111,
  "dedup_removed": 108,
  "output": {
    "format": "jsonl",
    "size": "11.8 MB",
    "url": "r2://da1a/datasets/job_01H…/data.jsonl"
  },
  "manifest": {
    "version_hash": "sha256:9f2c…a81d",
    "dp_epsilon": 1.2,
    "signed": true
  }
}
* Rolling 30-day figures from our production generation pipeline.
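The `[filter.dedup]` step in the terminal run names MinHash but not its parameters. Here is a minimal, standard-library-only sketch of the technique (not da1a's implementation): shingle each example into token 3-grams, build a 128-value MinHash signature, and drop any example whose estimated Jaccard similarity to an already-kept example reaches 0.8. The shingle size, permutation count, threshold, and the names `signature` and `dedup` are assumptions for illustration. A production pipeline would typically add LSH banding instead of the pairwise scan used here.

```python
import hashlib
import re

_PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family
NUM_PERM = 128          # assumed signature length


def _h64(data: str) -> int:
    """Deterministic 64-bit hash (unlike built-in hash(), unaffected by PYTHONHASHSEED)."""
    return int.from_bytes(hashlib.blake2b(data.encode(), digest_size=8).digest(), "big")


def _make_coeffs(n: int) -> list[tuple[int, int]]:
    """Derive fixed (a, b) pairs for n hash functions h(x) = (a*x + b) mod p."""
    coeffs = []
    for i in range(n):
        d = hashlib.sha256(f"minhash-{i}".encode()).digest()
        a = int.from_bytes(d[:8], "big") % (_PRIME - 1) + 1  # a in [1, p-1]
        b = int.from_bytes(d[8:16], "big") % _PRIME
        coeffs.append((a, b))
    return coeffs


_COEFFS = _make_coeffs(NUM_PERM)


def shingles(text: str, k: int = 3) -> set[str]:
    """Lowercased token k-grams; identifiers and punctuation become separate tokens."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def signature(text: str) -> list[int]:
    """One minimum per hash function over all shingle hashes."""
    hashed = [_h64(s) for s in shingles(text)]
    return [min((a * h + b) % _PRIME for h in hashed) for a, b in _COEFFS]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of matching minima approximates Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def dedup(texts: list[str], threshold: float = 0.8) -> tuple[list[str], int]:
    """Keep the first of each near-duplicate group; return (kept, number removed)."""
    kept, kept_sigs, removed = [], [], 0
    for text in texts:
        sig = signature(text)
        if any(estimated_jaccard(sig, s) >= threshold for s in kept_sigs):
            removed += 1
        else:
            kept.append(text)
            kept_sigs.append(sig)
    return kept, removed


samples = [
    "def add(a, b):\n    return a + b",
    "def add(a,b):\n  return a+b",  # same tokens, different whitespace
    "def mul(x, y):\n    return x * y",
]
kept, removed = dedup(samples)
print(f"removed {removed}")
```

Because tokenization ignores whitespace and case, formatting-only variants of the same snippet produce identical shingle sets and therefore identical signatures.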
We don't sell you tokens and disappear. Every dataset is generated, filtered, and verified against the same pipeline we use internally — with the receipts to prove it.
Submit a job spec via the dashboard or REST API. Pick a domain, output format, languages, difficulty mix, and volume. Optionally provide seed examples — we strip PII on ingest with Presidio.
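The spec doesn't say how a fractional difficulty mix becomes whole example counts. Here is a minimal client-side sketch that assumes largest-remainder rounding, which is an illustrative choice and not a documented da1a rule. It checks that the shares sum to exactly 1, splits the volume, and serializes a body with the same fields as the request example that follows. `allocate` and `build_job_body` are hypothetical helper names.

```python
import json
import math
from fractions import Fraction


def allocate(volume: int, mix: dict[str, float]) -> dict[str, int]:
    """Split `volume` across difficulty levels with largest-remainder rounding."""
    # Parse via str() so 0.2 becomes exactly 1/5 rather than its binary float approximation.
    shares = {level: Fraction(str(share)) for level, share in mix.items()}
    if any(s < 0 for s in shares.values()) or sum(shares.values()) != 1:
        raise ValueError("difficulty shares must be non-negative and sum to exactly 1")
    exact = {level: s * volume for level, s in shares.items()}
    counts = {level: math.floor(x) for level, x in exact.items()}
    # Hand the examples lost to flooring to the levels with the largest fractional parts.
    leftover = volume - sum(counts.values())
    by_remainder = sorted(exact, key=lambda lvl: exact[lvl] - counts[lvl], reverse=True)
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


def build_job_body(domain: str, fmt: str, languages: list[str],
                   difficulty: dict[str, float], volume: int) -> str:
    """Serialize a job spec using the field names from the POST /v1/jobs example."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    allocate(volume, difficulty)  # raises ValueError on a malformed mix
    return json.dumps({
        "domain": domain,
        "format": fmt,
        "languages": languages,
        "difficulty": difficulty,
        "volume": volume,
    })


mix = {"beginner": 0.2, "intermediate": 0.5, "advanced": 0.3}
print(allocate(2500, mix))  # {'beginner': 500, 'intermediate': 1250, 'advanced': 750}
body = build_job_body("coding", "instruction-response", ["python", "ts", "rust"], mix, 2500)
```

Exact fractions matter here: a mix like 0.1 + 0.2 + 0.7 fails an equality check in plain floats but sums to exactly 1 as decimal fractions.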
POST /v1/jobs
{
"domain": "coding",
"format": "instruction-response",
"languages": ["python", "ts", "rust"],
"difficulty": { "beginner": 0.2,
"intermediate": 0.5,
"advanced": 0.3 },
"volume": 2500
}
Every code example is executed in a sandboxed, network-isolated container with a kill switch. No syntactic judging, no LLM-as-critic. If it doesn't run, it doesn't ship.
[sandbox.execute]
workers   48
timeout   30s · SIGKILL on overrun
network   deny-all
fs        read-only, tmpfs scratch
verdict   2254 / 2500 passed (94.2%)
retries   disabled · failures are signal
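The log describes the verdict policy: one attempt per example, a hard timeout, and a kill on overrun. The following sketch reproduces only that policy with Python's `subprocess` module. It is not a sandbox because it applies no network denial and no read-only filesystem, so in practice it would run inside an isolated container. The function names and the "pass"/"fail"/"timeout" labels are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_once(source: str, timeout_s: float = 30.0) -> str:
    """Execute a Python snippet once and return "pass", "fail", or "timeout".

    Illustrates the verdict policy only; it provides NO isolation by itself.
    """
    with tempfile.TemporaryDirectory() as scratch:
        script = Path(scratch) / "candidate.py"
        script.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: ignore env vars and user site-packages
                cwd=scratch,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child (SIGKILL on POSIX) before raising.
            return "timeout"
    return "pass" if proc.returncode == 0 else "fail"


def verify(snippets: list[str], timeout_s: float = 30.0) -> tuple[int, list[str]]:
    """No retries: each snippet gets exactly one run, and its first verdict is final."""
    verdicts = [run_once(s, timeout_s) for s in snippets]
    return sum(v == "pass" for v in verdicts), verdicts


passed, verdicts = verify(
    ["assert sorted([3, 1, 2]) == [1, 2, 3]", "raise ValueError('boom')", "while True:\n    pass"],
    timeout_s=2.0,
)
print(f"{passed} / {len(verdicts)} passed", verdicts)
```

Treating a timeout as its own verdict, rather than folding it into "fail", keeps infinite loops visible in the quality report instead of hiding them among ordinary exceptions.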
Receive a curated dataset with a full quality report: pass rate, language & difficulty breakdown, dedup stats, and a signed manifest so you can prove to auditors what your model was trained on.
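The page doesn't specify what the manifest hash covers or how the signature is produced. The sketch below assumes the `sha256:` value is a plain SHA-256 digest of the delivered `data.jsonl` bytes and shows a streaming check on that basis. Signature verification is out of scope because the signing scheme isn't stated. `digest_chunks`, `file_digest`, and `matches_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """SHA-256 over a stream of byte chunks, formatted like the manifest's "sha256:<hex>"."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return "sha256:" + h.hexdigest()


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets never load fully into memory."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""))


def matches_manifest(path: Path, expected: str) -> bool:
    # Compare against the full digest from the manifest file; the page shows it truncated.
    return file_digest(path) == expected


# Chunk boundaries don't affect the digest:
print(digest_chunks([b"hel", b"lo"]) == digest_chunks([b"hello"]))  # True
```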
# quality_report.json
{
"pass_rate": 0.942,
"dedup": 108,
"pii_stripped": 0,
"manifest": "sha256:9f2c…a81d",
"dp_epsilon": 1.2,
"signed": true
}
Public datasets are stale. Generic synthetic data is incoherent. We built da1a for teams that care about what their model does at eval time, not about what was shipped to Hugging Face six months ago.
Every code example runs in a sandboxed Docker container with a hard timeout. Pass/fail is a fact, not an LLM opinion.
A graph-first generator keeps related fields consistent: an 'easy' Python problem won't get an expert-level solution, and labels never drift away from the content they describe.
Pick HumanEval, MBPP, or a BIG-Bench subset. We run gap analysis and generate data aimed at closing your specific failure modes.
Every job ships with a differential-privacy epsilon value and a signed, immutable manifest. EU AI Act ready out of the box.
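The benchmark card doesn't say how gap analysis scores failure modes. A common, documented metric for HumanEval and MBPP-style suites is the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021): pass@k = 1 − C(n−c, k) / C(n, k) for n samples per problem with c passing. The sketch below computes it per problem and ranks your own problem categories weakest-first. The grouping, the category names, and the use of mean pass@k as the gap signal are illustrative assumptions, not da1a's documented method.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): 1 - C(n-c, k) / C(n, k).

    n = samples generated for a problem, c = samples that passed its tests.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def rank_gaps(results: dict[str, list[tuple[int, int]]], k: int = 1) -> list[tuple[str, float]]:
    """Mean pass@k per category, weakest first. Categories are user-defined labels."""
    scores = {
        category: sum(pass_at_k(n, c, k) for n, c in problems) / len(problems)
        for category, problems in results.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical per-problem (n, c) counts from a HumanEval-style run, grouped by topic.
results = {
    "string_parsing": [(10, 9), (10, 8)],
    "recursion": [(10, 2), (10, 3)],
    "dynamic_programming": [(10, 5), (10, 4)],
}
for category, score in rank_gaps(results):
    print(f"{category:20s} pass@1 = {score:.2f}")
```

For k = 1 the estimator reduces to c / n, so the ranking above is simply the per-category pass rate. Larger k rewards categories where the model sometimes succeeds.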
“Our internal coding eval jumped 6.1 points after one week of da1a data. Our previous synthetic vendor shipped us 200K duplicate examples.”
“The signed manifest alone paid for the subscription. Our compliance team stopped asking us where our training data came from.”
“We replaced four internal data pipelines with one da1a job spec. Execution verification caught failure modes our LLM judge was greenlighting.”
All plans include execution verification, signed lineage, and PII stripping on seed uploads. No hidden overage charges: we give you 72 hours' notice before you reach your plan's soft cap.
For solo engineers and small teams shipping their first fine-tune.
For ML teams fine-tuning weekly and running against benchmarks.
For teams treating data quality as a first-class engineering lever.
Request early access — we'll get you onboarded this week. Tell us what you're fine-tuning and we'll help you scope a first job.