Know what's worth reading.

Evidence-based quality evaluation for any research paper.

Nabu doesn't summarize or replace your judgement. It surfaces the evidence that shows whether the work is sound, so your reading time goes to the papers that earn it.

WHAT NABU TELLS YOU

For any paper, Nabu tells you whether it:

  • has a methodology that matches its aims
  • is internally consistent, end to end
  • has been retracted, corrected, or flagged
  • holds up to post-publication scrutiny
  • addresses a problem that matters
  • shows a credible path to implementation

WHEN THE PRESTIGE SIGNAL GETS IT WRONG

Three cases where Nabu changed the answer.

Top-quartile journal · 1,200+ citations

Weak
Craft concerns

Conclusions extend beyond what the experimental design can support. Sample selection introduces a confounder the discussion doesn’t address.

Mid-tier journal · 18 citations

Exceptional
No concerns

Methodologically sound replication of an earlier high-impact result, with a meaningful extension to a new population. Cited downstream by 6 papers in the past year.

High-impact journal · later retracted

Poor
Red flag

Flagged Red on initial evaluation: the methodology issues it raised are consistent with the correction published later. Retracted 14 months after publication.

VALIDATED, NOT ASSERTED

Four ways we’ve tested whether the signal holds.

Critique quality vs. best human expert reviews

Nabu’s review critiques are scored on H-Max, a metric that calibrates critique quality against the full set of human expert reviews (best human review = 5.0), and are benchmarked against ScholarPeer, a leading multi-agent peer-review framework.

Retracted papers caught blind, before retraction

We assembled a curated corpus of confirmed-retracted papers and a control set of non-retracted papers from the same sources, then evaluated both blind, with no knowledge of retraction status. The rubric placed the large majority of the retracted papers in the bottom two quality tiers and flagged them Red, identifying what post-publication scrutiny later confirmed.

Surgisphere / Lancet HCQ paper: scored 1.4 / 5.0, Red flag, rejected blind by all three reviewers.

70%

Decision divergence from the prestige default

In a replication of the Gu et al. decision-divergence study, evaluators shown Nabu’s assessment changed their quality judgment in 70% of cases, moving the decision away from the journal-prestige default.

0.81

More than double the human peer-review benchmark

Nabu’s primary reviewers reach an ICC₂ of 0.81 (absolute agreement) on the composite Quality score, against a published meta-analytic benchmark of 0.34 for human peer review. With adjudication, reliability rises to 0.89.
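For readers who want to see what an absolute-agreement ICC measures, here is a minimal sketch. It assumes the figure above is the standard two-way random-effects, single-rater form, often written ICC(2,1); the source does not spell out the exact variant, and the function name and toy ratings are illustrative only, not Nabu's data or code.

```python
from typing import List


def icc2_absolute(ratings: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave to paper i.
    Assumes a complete table (every rater scored every paper).
    """
    n = len(ratings)       # number of papers
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between papers
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Rater bias (msc) counts against agreement in the absolute form.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy example: three raters who agree exactly on three papers.
perfect = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [5.0, 5.0, 5.0]]
print(icc2_absolute(perfect))
```

Because the rater term sits in the denominator, a reviewer who scores consistently higher or lower than the others reduces this coefficient even when the papers are ranked identically. That is the difference between absolute agreement and mere consistency.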

HOW IT WORKS

01

Blind.

No author, affiliation, journal, retraction status, or citation count is visible during evaluation. Reviewers see only the work and its publication year.
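The blinding step can be pictured as a filter applied to each paper record before any reviewer sees it. This is a sketch under stated assumptions: the field names and record shape are hypothetical, since the source lists what is hidden but not how papers are stored.

```python
from typing import Any, Dict

# Hypothetical field names for the signals the text says are hidden.
HIDDEN_FIELDS = {
    "authors",
    "affiliations",
    "journal",
    "retraction_status",
    "citation_count",
}


def blind(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without identity or prestige fields."""
    return {key: value for key, value in record.items() if key not in HIDDEN_FIELDS}


# Illustrative record only; not a real paper.
paper = {
    "text": "Full text of the manuscript...",
    "publication_year": 2020,
    "authors": ["A. Example"],
    "affiliations": ["Example University"],
    "journal": "Example Journal",
    "retraction_status": "unknown",
    "citation_count": 1200,
}

blinded = blind(paper)
print(sorted(blinded))
```

Only the work itself and its publication year survive the filter, which matches the two things reviewers are said to see.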

02

Multi-model.

Multiple frontier models review each paper independently against the same rubric, so no score rests on one model’s opinion.

03

Adjudicated.

Where the models diverge, an editorial layer resolves the score on strength of evidence, with documented rationale you can read.
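One way to picture the hand-off from independent reviews to adjudication is sketched below. The divergence threshold, the use of a simple mean, and all names are assumptions made for illustration; the source says only that divergent scores are resolved by an editorial layer on strength of evidence, not how divergence is detected.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional

# Hypothetical cut-off: how far apart scores may be before adjudication.
DIVERGENCE_THRESHOLD = 1.0


@dataclass
class Outcome:
    score: Optional[float]      # None until adjudication settles it
    needs_adjudication: bool
    spread: float               # highest score minus lowest score


def combine(scores: Dict[str, float],
            threshold: float = DIVERGENCE_THRESHOLD) -> Outcome:
    """Average scores that agree; route divergent ones to adjudication."""
    spread = max(scores.values()) - min(scores.values())
    if spread > threshold:
        # The sketch does not pick a winner; that is the editorial layer's job.
        return Outcome(score=None, needs_adjudication=True, spread=spread)
    return Outcome(score=round(mean(scores.values()), 2),
                   needs_adjudication=False, spread=spread)


# Illustrative scores on the 0 to 5 scale from three hypothetical reviewers.
agreeing = combine({"model_a": 4.0, "model_b": 4.2, "model_c": 3.8})
diverging = combine({"model_a": 1.5, "model_b": 4.0, "model_c": 4.5})
print(agreeing.needs_adjudication, diverging.needs_adjudication)
```

Leaving the score empty for the divergent case is deliberate: averaging reviewers who disagree would hide the disagreement that adjudication is meant to examine.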

See the full methodology →

A QUALITY-ASSURANCE LAYER FOR YOUR ORGANIZATION

Confidently stand behind every R&D decision, funding call, and strategy memo built on published research.

That input stream is degrading: retractions are at record highs, paper-mill output is growing faster than legitimate research, and AI-generated submissions are seeding the literature with citations to studies that do not exist. The checks that should catch this are inconsistent and invisible from where you sit.

But the warning signs are usually in the paper itself: a methodology that doesn’t match its claims, conclusions the design can’t support, references to evidence that doesn’t exist. Nabu reads every paper, blind, for exactly those signals, so the calls you make on which science to fund, reward, and build on are ones you can defend and de-risk.

Evaluate a paper you're citing.

Free during research preview.