Read time: 3 Minutes
Next-generation AI models are getting more advanced and more expensive.
If you don’t use these models effectively, wasteful habits will burn a hole in your pocket.
AI companies are tightening usage limits and introducing Peak vs. Off-Peak hours, and this is affecting users’ ability to rely on these tools throughout their day-to-day work.
Ultimately, your current habits are costing you money.
And if you don’t change anything, it will cost you…a lot of money. It could even stop you in your tracks on projects, proposal writing, and anything else you rely on AI for, unless you pay for more usage.
This means effective token management is becoming a critical skill for anyone who uses AI tools.
NVIDIA CEO Jensen Huang mentioned that he expects his engineers to spend at least $250k on AI tokens each year to get their job done.
Unfortunately, many of us don’t have the luxury of an unlimited budget for tokens or for upgrading our plans from Pro to Max to Enterprise, so we need to make do with what we have right now.
If you use a software tool that sits on top of an LLM, this applies to you too. You could see a dramatic price increase, especially if you upload PDFs, large documents, or other attachments into the tool, and depending on which AI models it uses.
This will affect how you use the tool, whether it still serves the purpose you originally bought it for, and how you use it on a regular, monthly basis going forward.
So to put it bluntly, if you don’t know what you are doing, it will cost you.
So what can you do? How can you best optimize your current systems right now? That’s exactly what we’re going to cover today.
Let’s dance.
Usage
I’m taking a hot minute to focus on this because it’s important to make sure that:
- You aren’t turning a blind eye to your usage limits
- You can get the most out of the different AI models and still accomplish what you need
- You know which habits to start implementing
If you run a team or a business, you should also know:
- What to expect from usage limits moving forward
- How to plan for changes in your contracts with any tools you use
- How to pivot if you hit a hurdle when you’re in a pinch
Imagine you are writing a proposal, preparing a slide deck, building a pitch, or preparing a report for your senior leadership or a client. You (or someone on your team) are using an AI tool, and all of a sudden, you hit your usage limit for the current session, or even your weekly limit.
What do…you do?
You have a few options to consider (there are others, but just to name a few):
- Pay for additional usage
- Wait until your limits reset later in the day or week
- Use someone else’s AI tool
- Actually do all the work yourself, without AI support
We are going to break this into 5 areas where you can reduce your overall token usage and operational costs:
- Stop Uploading PDFs
- Prevent “Conversation Sprawl”
- Audit Your “Silent Taxes” (Plugins & Connectors)
- Adopt “Two-Mode” Thinking: Research vs. Execution
- Match the Model to the Mission
What to Do – 5 Recommendations
1. Stop Uploading PDFs
One of the most common mistakes beginners make is uploading raw PDFs or images into a chat. These files carry hidden formatting overhead, such as headers, footers, embedded fonts, and layout metadata, all of which can inflate the token count. A document with only 4,500 words can balloon into 100,000+ tokens when processed as a raw PDF.
The Fix:
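- Convert the file to a Markdown (.md) file before uploading it to your AI tool
- This can cut your token usage, and therefore your costs, by up to 20x
- Use a free tool such as Pandoc for the conversion. Pandoc can’t read PDFs directly, so convert from the original Word or HTML version, or extract the PDF’s text first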
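To see where a figure like “up to 20x” can come from, here is a rough back-of-the-envelope sketch in Python. It assumes about 1.33 tokens per English word, which is a common rule of thumb rather than the exact ratio for any particular model, and it reuses the example above of a 4,500-word document that balloons to 100,000 tokens as a raw PDF. Real token counts depend on each model’s tokenizer and on how your tool processes the file.

```python
# Back-of-the-envelope comparison of Markdown vs. raw-PDF token counts.
# ASSUMPTION: ~1.33 tokens per English word is a rule of thumb, not the
# exact ratio for any specific model's tokenizer.
TOKENS_PER_WORD = 1.33


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens for plain text/Markdown from its word count."""
    return round(word_count * TOKENS_PER_WORD)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens for a Markdown string by counting whitespace-separated words."""
    return estimate_tokens_from_words(len(text.split()))


def savings_factor(pdf_tokens: int, markdown_tokens: int) -> float:
    """How many times more tokens the raw PDF uses than the Markdown version."""
    return pdf_tokens / markdown_tokens


if __name__ == "__main__":
    markdown_tokens = estimate_tokens_from_words(4_500)  # the 4,500-word example document
    pdf_tokens = 100_000  # the example figure for the same file as a raw PDF
    print(f"Markdown: ~{markdown_tokens:,} tokens")
    print(f"Raw PDF is ~{savings_factor(pdf_tokens, markdown_tokens):.1f}x larger")
```

Under these assumptions the Markdown version comes to roughly 6,000 tokens, about 17x smaller than the raw PDF, which is in line with the “up to 20x” estimate.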
2. Prevent “Conversation Sprawl”
AI models aren’t built to stay sharp through extended conversations of 30 or 40 turns. Because LLMs reread the entire conversation history every time you send a new message, long threads quickly fill the context window with clutter and can lead to what some call “LLM psychosis,” where the model starts to drift or lose track of your original instructions.
The Fix:
- Start a fresh conversation every 10 to 15 turns.
- If you have an evolving project, ask the AI to summarize the progress, copy that summary, and paste it into a new chat to keep the context window lean and focused.
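Because every message resends the whole history, the input tokens you pay for grow much faster than the conversation itself. The Python sketch below makes that concrete under simplifying assumptions: each turn (your message plus the reply) adds a fixed number of tokens to the history, and the summary you carry into each new chat has a fixed size. It ignores the tokens spent generating the summary and any caching a provider may apply, and the workload numbers are hypothetical.

```python
def thread_input_tokens(turns: int, tokens_per_turn: int, carried_context: int = 0) -> int:
    """Total input tokens for one thread where every turn resends the full history.

    Turn k (1-based) sends the carried-over context plus k turns' worth of history.
    """
    return sum(carried_context + k * tokens_per_turn for k in range(1, turns + 1))


def restarted_input_tokens(total_turns: int, tokens_per_turn: int,
                           turns_per_chat: int, summary_tokens: int) -> int:
    """Total input tokens when you start a fresh chat every `turns_per_chat` turns,
    pasting a summary of `summary_tokens` into each new chat after the first."""
    total = 0
    remaining = total_turns
    carried = 0  # the first chat starts with no summary
    while remaining > 0:
        turns = min(turns_per_chat, remaining)
        total += thread_input_tokens(turns, tokens_per_turn, carried)
        remaining -= turns
        carried = summary_tokens
    return total


if __name__ == "__main__":
    # Hypothetical workload: 40 turns, ~1,000 tokens added to the history per turn.
    one_long_thread = thread_input_tokens(40, 1_000)
    fresh_every_10 = restarted_input_tokens(40, 1_000, 10, summary_tokens=1_500)
    print(f"One 40-turn thread:        {one_long_thread:,} input tokens")
    print(f"Fresh chat every 10 turns: {fresh_every_10:,} input tokens")
```

In this hypothetical, restarting every 10 turns drops total input tokens from 820,000 to 265,000, roughly a third, because the cost of a single long thread grows with the square of its length.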
3. Audit Your “Silent Taxes” (Plugins & Connectors)
Many users “hoard” plugins and connectors (like Google Drive, Slack, Notion, or web search tools) without realizing that these often load data into the context window before you even type your first word. Some users unknowingly pay a “tax” of over 50,000 tokens per session just to keep these tools active, often for tasks that don’t need any of them.
The Fix:
- Regularly audit your active plugins
- If you aren’t using a specific connector for a task, disable it so it doesn’t sit there as dead weight, slowing down the model and eating into your token limit
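An audit can be as simple as listing what is connected, estimating what each connector costs you per session, and switching off anything the current task doesn’t need. The Python sketch below does that bookkeeping. The connector names come from the examples above, but the per-connector token overheads are hypothetical placeholders chosen to total just over the 50,000-token “tax” mentioned earlier; check your own tool for real numbers where it exposes them.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    name: str
    overhead_tokens: int  # HYPOTHETICAL per-session token overhead
    enabled: bool = True


def audit(connectors: list[Connector], needed: set[str]) -> tuple[list[str], int]:
    """Return the enabled connectors the task doesn't need and the tokens saved by disabling them."""
    unneeded = [c for c in connectors if c.enabled and c.name not in needed]
    return [c.name for c in unneeded], sum(c.overhead_tokens for c in unneeded)


if __name__ == "__main__":
    connectors = [
        Connector("Google Drive", 12_000),
        Connector("Slack", 15_000),
        Connector("Notion", 14_000),
        Connector("Web search", 10_000),
    ]
    total = sum(c.overhead_tokens for c in connectors if c.enabled)
    to_disable, saved = audit(connectors, needed={"Web search"})
    print(f"Current overhead: {total:,} tokens per session")
    print(f"Disable {', '.join(to_disable)} to save {saved:,} tokens")
```

For a research task that only needs web search, this flags Google Drive, Slack, and Notion, freeing 41,000 of the 51,000 hypothetical overhead tokens.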
4. Adopt “Two-Mode” Thinking: Research vs. Execution
Token burn often happens when users mix information gathering and actual work in the same thread. This confuses the AI and bloats the history with search results you may no longer need for the final output.
The Fix:
- Separate your workflow into two distinct modes.
- Use the research mode (and potentially cheaper, search-optimized tools like Perplexity or Grok) to gather and refine information.
  - I use Perplexity for search and Claude for other tasks.
- Once you have the context you need, move to a clean session in your primary reasoning model and focus solely on getting the work done.
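The hand-off between the two modes is where the savings come from: only a short brief crosses into the execution session, not the raw search results. The Python sketch below shows one way to assemble that brief and estimates how much carried-over context it avoids. The function names, the brief format, and the token sizes are illustrative assumptions, not features of Perplexity, Grok, or Claude.

```python
def build_execution_brief(task: str, findings: list[str], max_findings: int = 8) -> str:
    """Compose the opening message for a clean execution session.

    Only the distilled findings carry over; the raw search results stay behind
    in the research session.
    """
    bullets = "\n".join(f"- {finding}" for finding in findings[:max_findings])
    return f"Task: {task}\n\nKey findings from research:\n{bullets}"


def carried_context_tokens(context_tokens: int, execution_turns: int) -> int:
    """Extra input tokens from resending the same context on every execution turn."""
    return context_tokens * execution_turns


if __name__ == "__main__":
    brief = build_execution_brief(
        "Draft a one-page proposal summary",
        ["Example finding 1", "Example finding 2"],  # placeholder findings
    )
    print(brief)
    # Hypothetical sizes: 30,000 tokens of search results vs. a 2,000-token brief,
    # each carried through 8 execution turns.
    print(carried_context_tokens(30_000, 8), "vs.", carried_context_tokens(2_000, 8))
```

With these hypothetical sizes, the mixed thread resends 240,000 tokens of research over eight execution turns, while the clean session resends 16,000.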
5. Match the Model to the Mission
Using the most expensive, high-reasoning model (like Claude Opus) for every single task could be described as “overdressed for the occasion.”
The Fix:
- Use a tiered approach to model selection.
- Use top-tier models for reasoning and complex thinking, mid-tier models for execution, and smaller, faster models for tasks like proofreading or formatting.
- This strategy can significantly reduce compute costs while delivering comparable results.
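To see why a tiered approach can cut costs, compare sending every task to the top model with splitting the same work across tiers. The Python sketch below uses hypothetical relative prices (top tier = 5 units per task, mid tier = 3, small = 1) and a hypothetical task mix; substitute your provider’s actual pricing, which varies by model and changes over time.

```python
# HYPOTHETICAL relative cost per task for each tier; not real pricing.
COST_PER_TASK = {"top": 5.0, "mid": 3.0, "small": 1.0}


def blended_cost(task_mix: dict[str, int]) -> float:
    """Total cost when each tier's task count is billed at that tier's rate."""
    return sum(COST_PER_TASK[tier] * count for tier, count in task_mix.items())


if __name__ == "__main__":
    # Hypothetical month: 100 tasks, mostly execution and polishing work.
    all_top = blended_cost({"top": 100})
    tiered = blended_cost({"top": 10, "mid": 60, "small": 30})
    print(f"All top tier: {all_top:.0f} units")
    print(f"Tiered:       {tiered:.0f} units ({1 - tiered / all_top:.0%} less)")
```

In this example the tiered mix costs 260 units instead of 500, roughly half, because only the tasks that genuinely need deep reasoning are billed at the top rate.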
Your home screen should look something like this (if you are using Claude):

| Model | What It’s For (In Basic English) | Cost Considerations |
|---|---|---|
| Opus 4.6 | Use this model for high-stakes reasoning, multi-step logic, and complex synthesis. It is designed for deep analytical tasks, such as auditing high-value contracts, preparing proposals, synthesizing disparate market intelligence reports, or architecting long-term growth strategies. | The most powerful model, but it carries the highest “token tax” (cost and latency). Reserve it for creative, analytical, and research work where precision and deep thinking are the primary requirements. |
| Sonnet 4.6 | Your engine for execution and production workflows. It provides a near-perfect balance of intelligence and speed, making it the ideal choice for roughly 80% of your business development needs, such as drafting personalized outreach at scale, managing CRM integrations, or executing standard operational tasks. | Far more cost-efficient than Opus while remaining smart enough to handle the vast majority of professional business interactions. It can serve as your primary tool for most of your needs. |
| Haiku 4.5 | Purpose-built for speed, polish, and high-volume tactical tasks. It excels at summarizing sales call transcripts, classifying inbound leads, routing inquiries, and basic formatting or proofreading. | Use it for low-complexity tasks to prevent wasteful spending. Haiku keeps your “ambient compute” costs low while keeping your projects and opportunities moving forward. |
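If you route work programmatically (for example, through an API or an automation tool), the model table translates naturally into a simple lookup. The Python sketch below maps the kinds of tasks listed in the table to the three tiers and falls back to the mid tier for anything unlisted. The task labels are hypothetical, and the model names are the display names from the table, not API model identifiers.

```python
# Task categories (HYPOTHETICAL labels) mapped to tiers, following the model table.
ROUTES = {
    "contract_audit": "Opus 4.6",
    "proposal_prep": "Opus 4.6",
    "market_synthesis": "Opus 4.6",
    "growth_strategy": "Opus 4.6",
    "outreach_draft": "Sonnet 4.6",
    "crm_update": "Sonnet 4.6",
    "standard_ops": "Sonnet 4.6",
    "call_summary": "Haiku 4.5",
    "lead_classification": "Haiku 4.5",
    "inquiry_routing": "Haiku 4.5",
    "formatting": "Haiku 4.5",
    "proofreading": "Haiku 4.5",
}

DEFAULT_MODEL = "Sonnet 4.6"  # the everyday workhorse for unlisted tasks


def pick_model(task_type: str) -> str:
    """Return the model for a task type, defaulting to the mid tier when unlisted."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("contract_audit", "outreach_draft", "call_summary", "something_new"):
        print(f"{task:>16} -> {pick_model(task)}")
```

Defaulting to the mid tier is a deliberate choice in this sketch: an unknown task is more likely to be routine execution than high-stakes reasoning, and you can always escalate by hand.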
Recommendations
Here are a few additional steps to consider on top of the 5 above.
- Claude (Pro) usage limits aren’t fixed; they fluctuate based on user demand. If you’re working on a complex project with attachments, you’ll burn through your “allowance” very quickly.
- Check out what “Peak” vs. “Off-Peak” hours are here (Claude specific): https://pforret.github.io/PeakClaude/schedule/
- Wondering how costly tokens might be for you based on what you are looking to achieve? Here is a free token calculator:
- Start planning for additional token usage and costs as part of your overall marketing, operations, and proposal budgets as demand increases. Decide on the spending thresholds you and your company are willing to accept if you burn through your usage.
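A token budget boils down to simple arithmetic: projected tokens times price, compared against a threshold you agreed on in advance. The Python sketch below projects a month’s spend and flags when it crosses a warning level or the hard limit. The price per million tokens, the usage figures, and the thresholds are all hypothetical; plug in your own plan or API pricing and the output of whatever calculator you use.

```python
def monthly_cost(tokens_per_day: int, working_days: int, usd_per_million_tokens: float) -> float:
    """Projected monthly spend for a steady daily token volume."""
    return tokens_per_day * working_days * usd_per_million_tokens / 1_000_000


def budget_status(cost: float, monthly_limit: float, warn_at: float = 0.8) -> str:
    """Classify projected spend against a monthly limit agreed on in advance."""
    if cost > monthly_limit:
        return "over limit"
    if cost >= warn_at * monthly_limit:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical team usage: 2M tokens per working day, 22 days, $5 per million tokens.
    cost = monthly_cost(2_000_000, 22, 5.0)
    print(f"Projected: ${cost:,.2f} -> {budget_status(cost, monthly_limit=250.0)}")
```

With these numbers the projection is $220 against a $250 limit, which trips the 80% warning early enough to adjust habits before you hit the wall.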
AI is only as affordable as your habits are disciplined. The ability to manage tokens efficiently and effectively will become a very valuable job skill.
The “silent taxes” of raw PDFs, bloated conversation histories, and unnecessary plugins are no longer just minor inconveniences; they are scaling costs that can be the difference between a $250 monthly operation and a $2,000+ one.
You can shift to a “clean” workflow using the 5 recommendations above. You don’t want to hit a wall in the middle of a proposal and be forced to change how you finish it, disrupting your workflow and your overall output. Refine your process.
Stop looking at your AI interactions as a “guessing game” and start treating token management as essential infrastructure. Refine your habits now so that when the next wave of models arrives, you are prepared to leverage it to its fullest potential without “burning” your budget on the basics.
See you next week.
Whenever You’re Ready, Here are 4 Ways I Can Help You:
- Unlocking Hidden Potential – Reconnecting with Past Clients for Explosive Growth – Check out my free eBook on how to find hidden gems among your past clients and crush your sales goals.
- AI for Business Development – Download our free eBook on how to leverage AI prompts to your advantage, from properly setting up your preferred AI tool to shaping your prompts, saving time, and getting the outputs you are looking for.
- Sales Resources at Your Fingertips – From tools and tips to demos and how-tos, check out our Pages and content for additional support, whether it’s social selling, account management, or something else.
- Cribworks Advisor Program – Want more than just resources? Reach out to me to see if our Advisor Program can help you scale your business.
