Paper ingestion
Structured parsing of PDFs, appendices, and repos to capture experimental intent alongside executable code.
Autonomous Reproducibility
VerityLab AI builds agents that replicate and verify machine learning research, starting with LLM quantization. We read each paper, execute the code, benchmark the results, and publish what truly holds up—turning peer review into measurable verification.
Why it matters
Autonomous verification closes the gap between ambitious research claims and repeatable results. By validating every step—from data to metrics—we make AI progress auditable.
Our mission
We believe the next generation of AI breakthroughs must be built on verifiable foundations. Our system ingests research artifacts, reconstructs experiments, and tracks reproducibility metrics so the community can rely on trustworthy benchmarks.
Containerized runs mirror author environments to reproduce results without manual babysitting.
Detailed diffs, alerts, and confidence scores help teams trust what they ship and cite.
Demo walkthrough
We run our autonomous verifier against "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models" to check whether the reported W4A4 LLaMA-7B metrics hold up. The agent ingests the paper, clones the repo, and executes the quantization scripts end to end before publishing a reproducibility verdict.
01 · Ingest & plan
Parse PDF + README, capture training recipe, dependency matrix, expected checkpoints, and evaluation harness.
Artifacts normalized in less than 2 minutes.
02 · Environment build
Provision CUDA 12.1 container, install OmniQuant requirements, pull LLaMA weights, and seed datasets for the C4 validation split.
Hash-locked images guarantee replayability.
03 · Execution trace
Execute scripts/run_llama.sh --bits 4 --act-bits 4, capture logits, and log intermediate perplexity curves.
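As a rough illustration of how a perplexity curve can be derived from captured logits, the sketch below assumes the logits have already been reduced to per-token log-probabilities of the reference tokens. The function names `perplexity` and `running_perplexity` are hypothetical and are not part of OmniQuant or of our agent's published interface.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean log-probability) over the evaluated tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def running_perplexity(token_log_probs: Sequence[float], every: int = 2) -> list[float]:
    """Perplexity over the first k tokens, sampled every `every` tokens.

    This is one simple way to obtain an intermediate perplexity curve.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    curve: list[float] = []
    total = 0.0
    for i, lp in enumerate(token_log_probs, start=1):
        total += lp
        if i % every == 0:
            curve.append(math.exp(-total / i))
    return curve


if __name__ == "__main__":
    # Toy input: every token assigned probability 0.5 by the model.
    log_probs = [math.log(0.5)] * 4
    print(perplexity(log_probs))
    print(running_perplexity(log_probs, every=2))
```

A model that assigns probability 0.5 to every reference token has a perplexity of about 2, which makes the toy input an easy sanity check.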
04 · Verification
Reported ppl 2.58 vs. reproduced 2.63 (Δ +1.9%). The delta falls within tolerance, so the claim is marked verified.
Delta + explanation logged to evidence vault.
OmniQuant verification verdict
Reported ppl: 2.58
Measured ppl: 2.63
Δ %: +1.9%
Evidence bundle includes container hash, logs, and reproducibility script for downstream audits.
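The arithmetic behind the verdict above can be sketched in a few lines. The helper names are hypothetical, and the 5% tolerance is a placeholder chosen for illustration only; this page does not state the tolerance the agent actually applies.

```python
def relative_delta_pct(reported: float, measured: float) -> float:
    """Signed difference of measured vs. reported, as a percentage of reported."""
    if reported == 0:
        raise ValueError("reported value must be non-zero")
    return (measured - reported) / reported * 100.0


def verdict(reported: float, measured: float, tolerance_pct: float) -> str:
    """Return 'verified' when the absolute delta is within the tolerance."""
    delta = relative_delta_pct(reported, measured)
    return "verified" if abs(delta) <= tolerance_pct else "not verified"


if __name__ == "__main__":
    delta = relative_delta_pct(reported=2.58, measured=2.63)
    print(f"{delta:+.1f}%")  # prints "+1.9%"
    # tolerance_pct=5.0 is an assumed placeholder, not a documented setting.
    print(verdict(2.58, 2.63, tolerance_pct=5.0))  # prints "verified"
```

For perplexity, lower is better, so a positive delta means the reproduced run is slightly worse than the reported one. A stricter check could treat positive and negative deltas differently.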
How it works
The agent ingests each paper, repository, and dataset to create a reproducibility blueprint.
We execute the code, enforce dependencies, and compare metrics against the author’s claims.
Results are packaged into a shareable report with pass/fail status, deltas, and insights.
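To make the last step concrete, here is a minimal sketch of what a shareable report could look like when serialized. The `VerificationReport` class, its field names, and the sample log text are all assumptions made for illustration; they do not describe the actual report schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VerificationReport:
    """Hypothetical report record; field names are illustrative only."""
    paper: str
    metric: str
    reported: float
    measured: float
    delta_pct: float
    status: str        # "pass" or "fail", decided upstream
    log_sha256: str    # digest tying the report to its run log

    def to_json(self) -> str:
        # sort_keys keeps the output stable so identical reports hash identically.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def sha256_hex(data: bytes) -> str:
    """Content digest used to reference an evidence artifact."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    run_log = b"example run log contents"  # stand-in for a real log file
    report = VerificationReport(
        paper="OmniQuant",
        metric="perplexity",
        reported=2.58,
        measured=2.63,
        delta_pct=1.9,
        status="pass",
        log_sha256=sha256_hex(run_log),
    )
    print(report.to_json())
```

Storing a digest of the log rather than the log itself keeps the report small while still letting an auditor confirm that the evidence they download matches what the report was built from.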
Early partners
Whether you ship models or evaluate them, VerityLab AI is your co-pilot for reproducibility.
Stay in the loop
Receive launch updates, verification reports, and early access to the autonomous agent. We send meaningful updates—nothing else.