Know what's worth reading.

Nabu shows you which papers hold up to rigorous scrutiny. A structured, blind review of any paper's strengths and weaknesses, with the journal and citation count left out of it.

Evaluate a paper · See real evaluations

A complete evaluation

Every evaluation reports three things.

Study Quality, Trust Signals, and real-world Impact Potential — scored separately, each traceable to the text.

Highly accurate protein structure prediction with AlphaFold

John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, et al. · 2021 · 10.1038/s41586-021-03819-2

In Brief

AlphaFold delivers a benchmark-validated leap to atomic-scale protein prediction, with shallow alignments as its main constraint.

Study Quality

Exemplary · 4.7/5.0 · High

Trust Signals

No concerns

4 of 4 checks complete

Impact Potential

Very High · 4.7/5.0 · High

Key Claims

Primary finding · Well-supported

first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known

+4 more

Strengths

#methodological rigour

Blind benchmark validates central claim

CASP14 is described as the gold-standard assessment for structure prediction and was unavailable to developers during model training. AlphaFold’s median backbone accuracy of 0.96 Å r.m.s.d.₉₅, compared with 2.8 Å for the next-best method, directly supports the headline performance claim.

↳ Results p. 584; Fig. 1a

+2 more

Limitations

#impact potential

Accuracy depends on alignment depth

The paper reports that accuracy decreases substantially when median alignment depth falls below around 30 sequences. This limits transferability for targets with sparse evolutionary information.

↳ MSA depth and cross-chain contacts; Fig. 5a

+2 more

The signs were in the paper

Prestige said one thing, the paper said another.

The warning signs are in the paper itself: methods that don’t match the claims, conclusions the design can’t support. Nabu reads for exactly those, blind. Three real cases:

Growth in a Time of Debt

Carmen Reinhart, Kenneth Rogoff · 2010

Article · Economics, Econometrics and Finance

What citations said

Foundational — 6,300+ citations, top-tier journal.

What the paper shows

A descriptive correlation read as a hard policy threshold. Study Quality 3.2 (Moderate). Later contested in replication.

Study Quality

Moderate · 3.2/5.0

Trust Signals

Impact Potential

High · 3.8/5.0


DNA tensiometer reveals catch-bond detachment kinetics of kinesin-1, -2 and -3

Noell et al. · 2024

Preprint · Biochemistry, Genetics and Molecular Biology

What citations said

Skip it — a preprint with 5 citations.

What the paper shows

Rigorous work that doesn’t wait for a journal to be good. Study Quality 4.2 (Strong).

Study Quality

Strong · 4.2/5.0

Trust Signals

Impact Potential

Medium · 3.3/5.0


Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial

Gautret et al. · 2020

Article · Medicine

What citations said

Trust it — 4,900+ citations, top-quartile journal.

What the paper shows

Methodological failures legible from day one. Study Quality 1.8 (Very Limited). Retracted.

Study Quality

Very Limited · 1.8/5.0

Trust Signals

Impact Potential

Low · 2.5/5.0


The evidence

We don’t trust AI either. So we tested it.

6.1

Benchmarked to peer review

Measured against the 5.0 best-human-reviewer baseline (H-Max) defined in Google’s ScholarPeer study. Graded blind on sample papers and their open reviews.

ScholarPeer — Goyal et al. (2026) →

0.81

Inter-rater reliability

More than double the human peer-review benchmark of 0.34. Nabu’s reviewers independently reach an ICC₂ of 0.81 (absolute agreement) across all ten scoring dimensions.

Bornmann, Mutz & Daniel (2010) →
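For readers who want to see what an absolute-agreement ICC measures, here is a minimal sketch. It assumes "ICC₂" refers to the single-rater, two-way random-effects form usually written ICC(2,1); the page does not say which variant Nabu reports, and the function name and the example scores are hypothetical, not Nabu's code or data.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows: one row per paper, one column per reviewer.
    Assumption: this is the variant meant by "ICC2"; the source does not specify.
    """
    n = len(scores)          # number of papers (subjects)
    k = len(scores[0])       # number of reviewers (raters)
    if n < 2 or k < 2:
        raise ValueError("need at least two papers and two reviewers")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-paper mean square
    msc = ss_cols / (k - 1)               # between-reviewer mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: reviewer 2 is always exactly one point higher.
offset_scores = [[1, 2], [2, 3], [3, 4]]
print(round(icc2_1(offset_scores), 3))
```

The two hypothetical reviewers rank the papers identically, yet the result is about 0.667 rather than 1.0. That is what "absolute agreement" means: a consistent offset between reviewers counts against the score, not only disagreement about ordering.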

85%+

Potential retractions flagged

Concerns are flagged blind, by a weak or poor craft score or by a material reliability flag. When a Nabu assessment is appended to a paper’s metadata, these weak articles don’t surface in downstream AI answers.

Gu et al. →

See our methodology and validation

Why it works

We don’t ask AI for its opinion.

We force it through a rubric calibrated to the paper’s field and methodology — explicit criteria, scored component by component.

The rubric is the evaluator. The AI is the instrument.

Traceable to the text.

Every score points to the line in the paper that earned it. When your expertise says otherwise, you can see exactly what Nabu saw and overrule it in seconds.

Not one model’s opinion.

Several independent reviewer models score every paper blind to author, journal, and institution, and an adjudicator resolves disagreement on the strength of evidence.

Built and stress-tested by experts.

The rubric was calibrated with practising methodologists across fields. Its performance is monitored continuously, and an evaluation escalates to a human reviewer when a paper falls outside what the AI can reliably assess.

No incentive to inflate.

Nabu has no relationship with any publisher, journal, or institution it evaluates. There’s no version of a score that pays us more. The independence is structural, not a promise.

A QUALITY-ASSURANCE LAYER FOR YOUR ORGANIZATION

Stand behind every funding call, R&D decision, and strategy built on published research.

The literature you’re deciding on is degrading. Retractions hit record highs, paper-mill output is doubling faster than legitimate research, and a growing share of papers cite studies that were never published. The AI tools your teams increasingly rely on can’t tell the difference. The checks that should catch this are inconsistent, and invisible from where you sit.

The warning signs are almost always in the paper itself: methods that don’t match the claims, conclusions the design can’t support, references to evidence that doesn’t exist. Nabu reads every paper blind for exactly those signals, so the calls you fund, reward, and build on are ones you can defend.

Talk to us about Nabu for your institution

10,000+ papers retracted in 2023 · Nature

12× rise in fabricated citations in two years · Topaz et al., The Lancet 2026

1.5 yr vs 15 yr doubling time, paper-mill vs legitimate output · Richardson et al., PNAS 2025
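To make the doubling-time gap concrete, the arithmetic can be sketched as below. It assumes steady exponential growth at the quoted doubling times of 1.5 and 15 years, which is a simplification for illustration and not a claim made by the cited study; the function names are hypothetical.

```python
def growth_factor(years, doubling_time):
    """Multiplicative growth over `years`, assuming a constant doubling time."""
    return 2 ** (years / doubling_time)


def annual_growth_rate(doubling_time):
    """Equivalent year-on-year growth rate for a given doubling time."""
    return 2 ** (1 / doubling_time) - 1


# Doubling times quoted above (in years)
PAPER_MILL = 1.5
LEGITIMATE = 15

print(growth_factor(15, PAPER_MILL))   # 1024.0
print(growth_factor(15, LEGITIMATE))   # 2.0
print(f"{annual_growth_rate(PAPER_MILL):.1%} vs {annual_growth_rate(LEGITIMATE):.1%} per year")
```

Under that assumption, in the 15 years legitimate output takes to double, paper-mill output doubles ten times, which is a factor of 1,024.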

The careful second read, on the paper in front of you.

Currently collecting feedback in Research Preview