Knowledge or Reasoning? What This Research Reveals About Training AI
Author: Chengchang Yu (@chengchangyu)
The Core Question
What's the problem?
We've been evaluating AI models the wrong way: we check whether they get the final answer right, but we ignore how they get there.
More importantly: Do AI models succeed because they know facts, or because they reason well? And does the answer change depending on whether you're doing math or medicine?
The Big Insight
Researchers discovered something fascinating: Every step in AI reasoning has TWO separate components:
- Knowledge - Is the fact it's using correct?
- Reasoning - Does this step actually move us closer to the answer?
Here's the kicker: A reasoning step can use wrong knowledge but still be logically useful. Or it can state correct facts but be totally redundant.
Think of it like GPS navigation:
- Bad knowledge, good reasoning: "Turn left at the blue building" (building is actually red, but the turn is correct)
- Good knowledge, bad reasoning: "There's a Starbucks on Main Street" (true, but irrelevant to your route)
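The two-axis idea above can be put into code. This is an illustrative toy, not the paper's actual annotation scheme; the `Step` fields and quadrant labels are assumptions made for this example:

```python
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    knowledge_correct: bool    # is the fact stated in this step accurate?
    reduces_uncertainty: bool  # does the step move us closer to the answer?

def classify(step: Step) -> str:
    """Place a reasoning step in one of four quadrants."""
    if step.knowledge_correct and step.reduces_uncertainty:
        return "good knowledge, good reasoning"
    if step.knowledge_correct:
        return "good knowledge, bad reasoning"   # true but redundant
    if step.reduces_uncertainty:
        return "bad knowledge, good reasoning"   # wrong fact, useful move
    return "bad knowledge, bad reasoning"

# The GPS examples from above:
blue_building = Step("Turn left at the blue building", False, True)
starbucks = Step("There's a Starbucks on Main Street", True, False)
print(classify(blue_building))  # bad knowledge, good reasoning
print(classify(starbucks))      # good knowledge, bad reasoning
```

The point of separating the two flags is that averaging them into one "correctness" score would hide exactly the trade-offs the research measured.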
How They Tested This
The researchers trained AI models using two common methods:
- SFT (Supervised Fine-Tuning): Teaching by example - "Here's the question, here's the answer, memorize the pattern"
- RL (Reinforcement Learning): Learning by trial - "Try different approaches, get rewarded for correct answers"
They tested these models on:
- Medical questions (knowledge-heavy)
- Math problems (reasoning-heavy)
Three Game-Changing Discoveries
Discovery #1: Math Skills Don't Transfer to Medicine
Models trained on mathematical reasoning performed worse in medicine than general-purpose models.
Why? Math is about logic. Medicine requires knowing thousands of specific facts. Being good at one doesn't make you good at the other.
Real-world parallel: A brilliant chess player isn't automatically a great lawyer.
Discovery #2: SFT Teaches Facts But Creates Verbose Thinkers
When models learned through examples (SFT):
- Knowledge accuracy improved by 6.2%
- Reasoning efficiency dropped by 38.9%
Translation: The AI learned more medical facts but became wordy and redundant - like a student who memorized the textbook but rambles during the exam.
Discovery #3: RL Prunes Bad Knowledge and Sharpens Reasoning
When models learned through rewards (RL):
- Knowledge accuracy improved by 12.4%
- Reasoning efficiency stayed strong or improved
How? RL reinforces paths that lead to correct answers and suppresses incorrect knowledge branches. It's like evolution - bad reasoning dies off, good reasoning survives.
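That "evolution" dynamic can be demonstrated with a toy simulation. Everything here (the canned reasoning paths, the weights, the update rule) is a hypothetical sketch of outcome-based reinforcement, not the training setup used in the paper:

```python
import random

# Toy model: a distribution over three canned reasoning paths.
# Two rest on a wrong fact; one is correct. Only the outcome is rewarded.
paths = {
    "wrong-fact-A": {"answer": 7, "weight": 1.0},
    "wrong-fact-B": {"answer": 3, "weight": 1.0},
    "correct-fact": {"answer": 5, "weight": 1.0},
}
GOLD = 5  # the correct final answer

def sample_path(rng):
    """Sample a path with probability proportional to its weight."""
    total = sum(p["weight"] for p in paths.values())
    r = rng.uniform(0, total)
    for name, p in paths.items():
        r -= p["weight"]
        if r <= 0:
            return name
    return name  # float-rounding fallback: last path

def rl_step(rng, lr=0.5):
    """Outcome-reward update: boost paths whose final answer matches
    GOLD, gently decay the ones that miss."""
    name = sample_path(rng)
    reward = 1.0 if paths[name]["answer"] == GOLD else 0.0
    paths[name]["weight"] *= (1 + lr) if reward else (1 - lr * 0.2)

rng = random.Random(0)
for _ in range(200):
    rl_step(rng)
# After training, most of the probability mass sits on the correct-fact
# path: the wrong-knowledge branches have been suppressed.
```

Notice that the update never inspects the reasoning text itself; paths built on wrong facts die off simply because they keep landing on wrong answers.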
The Formula (Simplified)
AI Performance = Knowledge Correctness + Reasoning Efficiency
Where:
- Knowledge Index (KI) = % of facts that are accurate
- Information Gain = How much each step reduces uncertainty
Domain Differences:
- Medicine: Knowledge matters more (you need to know the facts)
- Math: Reasoning matters more (you need to think logically)
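The two quantities above can be sketched with simple stand-in definitions: KI as the fraction of accurate factual claims, and information gain as the drop in entropy of a (binary, for simplicity) answer distribution after a step. The function names and exact formulas are illustrative assumptions, not the paper's definitions:

```python
import math

def knowledge_index(facts_correct: list) -> float:
    """KI: fraction of factual claims across the trace that are accurate."""
    return sum(facts_correct) / len(facts_correct)

def information_gain(p_before: float, p_after: float) -> float:
    """Uncertainty reduction: drop in entropy of the model's confidence
    in the answer, before vs. after a reasoning step."""
    def entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return entropy(p_before) - entropy(p_after)

ki = knowledge_index([True, True, False, True])  # 3 of 4 facts accurate: 0.75
gain = information_gain(0.5, 0.9)  # 50/50 -> 90% confident: ~0.53 bits
```

A redundant step leaves the confidence (and hence the entropy) unchanged, so its information gain is zero even when every fact it states is true, which is exactly the SFT failure mode described above.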
The Bottom Line
For Technical Readers:
This research shows that SFT excels at injecting domain knowledge but compromises reasoning conciseness, while RL improves knowledge accuracy and reasoning quality together by reinforcing correct knowledge pathways. It also shows that domains differ fundamentally in how much they rely on knowledge versus reasoning.
For Everyone Else:
Teaching AI is like teaching kids:
- Memorization (SFT) helps them learn facts but makes them wordy
- Practice with feedback (RL) makes them both accurate and concise
- Math needs logic, medicine needs facts - one size doesn't fit all
Why This Matters for Business
If you're building AI solutions (or working with consultants like aiasme.com):
1. Choose Your Training Strategy Based on Your Domain
| Your Need | Best Approach |
|---|---|
| Legal/Medical AI (fact-heavy) | Start with SFT for knowledge |
| Strategic/Analytical AI (logic-heavy) | Prioritize RL for reasoning |
| Complex domains | Hybrid: SFT first, then RL |
2. Don't Assume Transfer Learning Works
A model that's brilliant at financial analysis won't automatically excel at HR decisions. Domain knowledge matters.
3. Evaluate Beyond Accuracy
Ask your AI vendor:
- How accurate is the knowledge it uses?
- How efficient is its reasoning process?
- Can it explain its thinking without rambling?
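Those three vendor questions map naturally onto three metrics. The sketch below assumes a hypothetical per-sample record format (`correct`, `facts_correct`, `n_steps`); it is one possible scorecard, not a standard benchmark:

```python
def evaluate(samples: list) -> dict:
    """Score a model beyond final-answer accuracy.

    Each sample: {'correct': bool (final answer right?),
                  'facts_correct': list[bool] (per-claim accuracy),
                  'n_steps': int (length of the reasoning trace)}.
    """
    n = len(samples)
    accuracy = sum(s["correct"] for s in samples) / n
    knowledge_index = sum(
        sum(s["facts_correct"]) / len(s["facts_correct"]) for s in samples
    ) / n
    # Efficiency proxy: shorter traces that still land on the answer.
    avg_steps = sum(s["n_steps"] for s in samples) / n
    return {
        "accuracy": accuracy,
        "knowledge_index": knowledge_index,
        "avg_steps": avg_steps,
    }
```

Two vendors with identical accuracy can differ sharply on the other two numbers, which is the whole argument for evaluating the thinking process, not just the output.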
Questions to Consider
For AI developers:
- Are you measuring knowledge and reasoning separately in your evaluations?
- Could your training data be creating verbose, inefficient reasoners?
For business leaders:
- Does your AI solution match your domain's needs (knowledge vs. reasoning)?
- Are you testing the thinking process or just the final output?
For consultants:
- When building AI personas, how do you balance knowledge injection vs. reasoning style?
- Have you observed SFT vs. RL trade-offs in your client work?
Research Source: "KNOWLEDGE or REASONING? A Close Look at How LLMs Think Across Domains" - Wu et al., 2025