Claude Hit the Limit Again?

I was in the middle of something important.

Not "important" like saving the world. Important like — I was finally in a flow. Claude was helping me think, write, plan. And then it just... stopped.

"You've reached your usage limit."

I stared at the screen. I had two options. Pay extra. Or figure out what the hell was actually happening.

I chose the second one. Not because I'm cheap — okay, maybe a little — but because it felt wrong to pay more without understanding why I was running out in the first place.

So I did what I do. I went looking.

I read Anthropic's official help docs. I watched reels. I commented on threads and actually got some really useful answers back. I posted my own question in my community. I went down a GitHub rabbit hole at 11pm when I should have been sleeping.

And what I found genuinely surprised me.

It wasn't my prompts. It wasn't that I was asking too much. The limit was running out because of things I didn't even know were happening in the background — silently, every single message, before I even typed a word.

This blog is everything I found. All in one place. Because I don't want you to have to do what I did.

🤯 What's Actually Happening Behind the Scenes?

Most people think they hit the limit because they asked too many questions or wrote too much. That's not really it.

Here's the thing nobody tells you: with every single message you send, Claude silently reloads your full conversation history, your memory profile, your connected tools, and your style settings in the background. That's thousands of tokens gone before you've even typed your question.

A creator named _prem.io posted a viral thread breaking down exactly what was eating his tokens in a 20-message chat:

What's loading	Tokens used
Memory profile	~60,000
Tool schemas (15+ MCP tools)	~40,000
Web search schema	~10,000
Conversation history	~80,000
Claude's verbose responses	~8,000
Total	~198,000 tokens

Every. Single. Chat.

After fixing his settings, he went from ~198,000 tokens to ~10,000 tokens per chat. That's a reduction of roughly 95%, with the same output and the same quality.
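If you want to check that arithmetic yourself, here is a minimal Python sketch. It uses only the figures quoted from the thread; the variable names are mine, and the ~10,000 "after" number is the creator's reported result, not something I measured.

```python
# Approximate token breakdown quoted from the thread (one 20-message chat).
breakdown = {
    "Memory profile": 60_000,
    "Tool schemas (15+ MCP tools)": 40_000,
    "Web search schema": 10_000,
    "Conversation history": 80_000,
    "Claude's verbose responses": 8_000,
}

before = sum(breakdown.values())
after = 10_000  # reported total after the settings changes

# Fraction of the original token spend that disappeared.
reduction = (before - after) / before

print(f"Total before: {before:,} tokens")
print(f"Total after:  {after:,} tokens")
print(f"Reduction:    {reduction:.1%}")

# Share of the original total taken by each source, largest first.
for name, tokens in sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<30} {tokens / before:.0%}")
```

The per-source shares are the useful part: in this breakdown, conversation history and the memory profile together account for most of the total, which is why the settings and habits below focus on them.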

💡 The limit isn't running out because you're doing too much. It's running out because the settings were silently working against you.

📖 Official Sources I Read (So You Can Too)

📄 Anthropic Help: How do usage and length limits work? 👉 https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work
📄 Anthropic Help: Usage limit best practices 👉 https://support.claude.ai (search "Usage limit best practices")
📄 Anthropic Help: What is Claude's memory? 👉 https://support.claude.ai (search "Claude memory")
📄 Anthropic Help: Searching past chats with Claude 👉 https://support.claude.ai (search "searching past chats")
🐙 GitHub: Caveman Claude Skill (cuts output tokens ~75%) 👉 https://github.com/amanattar/caveman-claude-skill
🐙 GitHub: Caveman CLI Tool (#1 trending on GitHub, 5K+ stars) 👉 https://github.com/JuliusBrussee/caveman
🐙 GitHub: Code Review Graph (for developers, 12K+ stars) 👉 https://github.com/tirth8205/code-review-graph

⚙️ Part 1: Settings to Change Right Now (Do This Once)

1. Turn Memory OFF

Settings → Memory → toggle off

Every message loads your full memory profile silently. Turning this off saves ~60,000 tokens per 20-message chat instantly.

Still need context? Use Notion. Create 3 lean pages (your identity, platform rules, active project). Start each chat with: "Load my profile from Notion." Claude fetches it once — not every message.

💰 Saves ~60,000 tokens per 20-message chat

2. Switch Style to "Concise"

Tap + icon before sending → Use style → Concise

Claude's default style is verbose. Switching to Concise cuts output tokens by roughly half — automatically, no prompting needed.

💰 Cuts output tokens by ~50%

3. Use Sonnet, Not Opus, for Everyday Work

Opus is powerful but expensive on your token budget. For captions, content drafts, brainstorming — Sonnet does the same job. Save Opus for when you genuinely need deep reasoning.

💰 Sonnet is up to 5× more token-efficient than Opus

4. Turn Web Search OFF When You Don't Need Live Data

Tap + icon → toggle web search off

Web search is ON by default. For writing, planning, or creative tasks — it's adding overhead to every single message for no reason.

5. Set Connected Tools to "On Demand"

Each connected tool (Notion, Google Drive, Slack) loads its full schema into every message — even when you're not using it. That's 2,000–5,000 extra tokens per message.

Fix: Go to tool settings → change from "Always loaded" to "On demand"
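To get a feel for what that adds up to, here is a rough sketch. I'm reading the 2,000–5,000 figure as the per-message cost of a single connected tool, and I'm assuming "On demand" means the schema is only sent in messages that actually use the tool. Both readings are my assumptions, and the function and constant names are made up for illustration.

```python
def schema_overhead(schema_tokens: int, messages_loaded: int) -> int:
    """Tokens spent sending one tool's schema across a chat."""
    return schema_tokens * messages_loaded

CHAT_LENGTH = 20               # messages in the chat, same as the earlier example
SCHEMA_RANGE = (2_000, 5_000)  # per-message cost quoted above (low, high)
MESSAGES_USING_TOOL = 2        # hypothetical: only two messages need the tool

# Schema sent with every message vs. only when the tool is used.
always_loaded = tuple(schema_overhead(t, CHAT_LENGTH) for t in SCHEMA_RANGE)
on_demand = tuple(schema_overhead(t, MESSAGES_USING_TOOL) for t in SCHEMA_RANGE)

print(f"Always loaded: {always_loaded[0]:,} to {always_loaded[1]:,} tokens")
print(f"On demand:     {on_demand[0]:,} to {on_demand[1]:,} tokens")
```

Under those assumptions, the saving scales with how rarely you actually touch the tool, so it matters most for integrations you connected once and forgot about.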

💬 Part 2: Conversation Habits to Change

6. Load Your Context in the First Message

Anthropic officially recommends this. Before you start — plan what you need, what background you're providing, what you're asking. Put it all in the first message. Reduces back-and-forth significantly.

7. Edit Your Prompt — Don't Send a New One

Made a mistake? Click the edit icon on your last message instead of sending a follow-up. A new message = Claude reloads the full conversation. Editing = much cheaper. Almost nobody knows this one.

8. Batch Your Questions — One Message, Not Three

Instead of three separate messages, write it all in one:

"Summarise this article, list the key points, and suggest a headline."

Anthropic confirms this in their usage guide. Every separate message reloads your full context.

9. Start a Fresh Chat After ~10 Messages

Long conversations drag full history into every message. After ~10 exchanges, open a new chat. Use Projects to carry forward only what you actually need.
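Why does chat length matter so much? A simple model, sketched below, assumes each message carries every earlier exchange along with it, so the total grows much faster than the message count. The 500-token average per exchange is a hypothetical number I picked for illustration, and the sketch ignores whatever small summary you carry into the new chat.

```python
def history_tokens(messages: int, tokens_per_exchange: int) -> int:
    """Total history tokens re-sent over one chat, assuming message k
    carries the k earlier exchanges with it (k = 0 for the first message)."""
    return sum(k * tokens_per_exchange for k in range(messages))

TOKENS_PER_EXCHANGE = 500  # hypothetical average for one prompt plus its reply

one_long_chat = history_tokens(30, TOKENS_PER_EXCHANGE)
three_short_chats = 3 * history_tokens(10, TOKENS_PER_EXCHANGE)

print(f"One 30-message chat:    {one_long_chat:,} history tokens")
print(f"Three 10-message chats: {three_short_chats:,} history tokens")
print(f"Ratio: {one_long_chat / three_short_chats:.1f}x")
```

The exact numbers depend on how long your messages are, but the shape holds: in this model, the same 30 messages cost roughly three times as much history in one long chat as they do split across three short ones.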

10. Put Your Rules in a Project — Once

Anthropic confirmed: content in Projects is cached and doesn't count against limits the same way when reused.

Add your instructions, constraints, and background context once. Claude follows them every chat without you repeating yourself.

💰 Cached project content = reused without full token cost

11. Don't Upload Heavy Files — Copy-Paste Instead

A single PDF page costs 1,500–3,000 tokens per Anthropic's own docs. A full screenshot costs 1,300+ tokens. Extract just the text you need and paste it plain. Crop screenshots tight.
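Here is a quick way to compare the two options before you upload. The per-page range is the one quoted above; the 4-characters-per-token figure is only a rough rule of thumb for English text that I'm assuming for the estimate, and the function names are mine.

```python
PDF_TOKENS_PER_PAGE = (1_500, 3_000)  # range quoted above (low, high)
CHARS_PER_TOKEN = 4                   # rough rule of thumb, an assumption

def pdf_upload_cost(pages: int) -> tuple[int, int]:
    """Estimated (low, high) token cost of uploading a PDF with this many pages."""
    low, high = PDF_TOKENS_PER_PAGE
    return pages * low, pages * high

def pasted_text_cost(text: str) -> int:
    """Estimated token cost of pasting plain text, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

# Stand-in for the excerpt you would actually copy out of the document.
excerpt = "Only the two paragraphs I actually need. " * 25

low, high = pdf_upload_cost(20)
print(f"20-page PDF upload: {low:,} to {high:,} tokens")
print(f"Pasted excerpt:     about {pasted_text_cost(excerpt):,} tokens")
```

Treat the pasted-text number as a ballpark only. The point is the order of magnitude: a short excerpt costs a tiny fraction of the full file.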

12. Use Claude During Off-Peak Hours

Anthropic officially confirmed: during weekday peak hours (5am–11am Pacific, which is roughly 6:30pm–12:30am IST, or an hour earlier while the US is on daylight saving time), your session limit burns faster. Outside that window, and on weekends, you get significantly more room. Schedule heavy Claude work accordingly.
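If you're not in either of those time zones, you can convert the peak window yourself. The sketch below uses fixed UTC offsets from Python's standard library, so it has no dependencies. The 5am–11am Pacific window is the one stated above; the date is arbitrary, and you would swap in your own offset where the sketch uses IST.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: Pacific standard and daylight time, and India Standard Time.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")
IST = timezone(timedelta(hours=5, minutes=30), "IST")

def peak_window(pacific: timezone, local: timezone) -> tuple[str, str]:
    """Convert the 5am-11am Pacific peak window into local 24-hour times."""
    start = datetime(2025, 1, 6, 5, 0, tzinfo=pacific)  # arbitrary weekday
    end = datetime(2025, 1, 6, 11, 0, tzinfo=pacific)
    fmt = "%H:%M"
    return start.astimezone(local).strftime(fmt), end.astimezone(local).strftime(fmt)

for label, pacific in (("US standard time", PST), ("US daylight time", PDT)):
    start, end = peak_window(pacific, IST)
    print(f"{label}: peak is {start} to {end} IST")
```

Fixed offsets keep the example simple, but they mean you have to pick standard or daylight time yourself depending on the time of year.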

🐙 Part 3: GitHub Tools Worth Knowing

🪨 Caveman Mode — Make Claude Talk Less

👉 https://github.com/JuliusBrussee/caveman (5K+ stars, was #1 trending on GitHub)
👉 https://github.com/amanattar/caveman-claude-skill

Make Claude respond in compressed, caveman-style language. No filler. No pleasantries. Just the answer. Technical substance stays completely intact.

Normal Claude:

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle..."

Caveman Claude:

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same information. Roughly 75% fewer tokens. There are intensity levels (lite, full, ultra) and even classical Chinese compression modes if you're feeling adventurous 😂

The SKILL.md file can be added to any Claude Project as an instruction — so this isn't just for developers.

📊 Code Review Graph — For Developers

👉 https://github.com/tirth8205/code-review-graph (12,000+ stars)

Without help, Claude can end up re-reading large parts of your codebase on every task. This tool builds a persistent map so Claude only reads what's relevant. Reported results: up to 8x fewer tokens on code reviews, up to 49x on daily coding tasks. Not for beginners, but worth knowing it exists.

🧠 My Honest Take

Most people blame their prompts when they hit the limit.

The real culprit is the invisible overhead — memory loading, tool schemas, web search running silently in the background, verbose responses adding up.

Fix the settings first. Change your conversation habits second. Prompts are the last thing you need to worry about.

I went from hitting limits mid-week to finishing the week with usage left — just by changing settings. No better prompts. No upgrade. Just the right switches flipped.

✅ Quick Checklist — Save This

Settings (do once):

Memory OFF → Settings → Memory
Style → Concise (+ icon)
Sonnet for everyday tasks
Web search OFF when not needed
Tools → "On Demand"

Habits:

Heavy context in your first message
Edit prompts, don't send new ones
Batch questions in one message
New chat after ~10 messages
Rules inside a Project — once
Paste text instead of uploading heavy files
Heavy work outside peak hours

For developers:

Caveman Mode → https://github.com/amanattar/caveman-claude-skill
Code Review Graph → https://github.com/tirth8205/code-review-graph

📎 All Links

Resource	Link
Anthropic: Usage & Length Limits	https://support.claude.com/en/articles/11647753-how-do-usage-and-length-limits-work
Anthropic: Usage Best Practices	https://support.claude.ai → search "Usage limit best practices"
Anthropic: Claude Memory	https://support.claude.ai → search "Claude memory"
Anthropic: Searching Past Chats	https://support.claude.ai → search "searching past chats"
Caveman Claude Skill	https://github.com/amanattar/caveman-claude-skill
Caveman CLI Tool	https://github.com/JuliusBrussee/caveman
Code Review Graph	https://github.com/tirth8205/code-review-graph

Written by Ranjani Shetty — AI Educator helping content creators, students, and brand owners learn AI practically. Follow me on Instagram: ranjani shetty

If this helped you, share it with one person who's also hitting the Claude limit. They'll thank you. 🙏
