Paolo Perrone — Shipping Production AI: Agents, Inference, GPU. Read by 1M+ AI engineers.
Paolo Perrone ranks #160 of 14,983 LinkedIn creators in Computer Software and is a standout voice in the United States. He has 131.2K followers and published 50 posts in the last 30 days at a 0.1% average engagement rate.
- 131.2K followers
- 50 posts / 30d
- 0.1% avg engagement
The roast
Paolo claims he writes about how production AI actually works, yet he’s spent 50 posts in the last month explaining the industry to 131,000 people who are clearly only following him to see how much faster a career can be vaporized by an engagement rate of 0.15%.
About Paolo
Get your AI product in front of 1M+ engineers buying AI infrastructure, agent tooling, and inference stacks. NVIDIA, Google, LangChain, CodeRabbit already have. 📨 https://tally.so/r/wk69VJ (or shoot me a DM) ✌️
I write about how production AI actually works.
Agents, inference, GPUs, retrieval, evals, the parts of shipping AI systems that don't make it into the paper. The latency you hit at batch size 32. The agent loop that breaks at turn 7. The retrieval that's 90% recall on synthetic data and 40% in production.
What I cover:
→ Agents: orchestration, memory, planning loops, where they break
→ Inference: latency, throughput, batching, serving (vLLM, TGI, TensorRT)
→ GPUs: CUDA, kernels, model parallelism, what's fast on H100/B200
→ Retrieval: vector DBs, hybrid search, chunking strategies
→ Evals: measuring model performance when there's no ground truth
Distribution:
→ LinkedIn: 130K+ AI/ML engineers
→ Medium: 922K+ via Data Science Collective (Founding Editor)
→ The AI Engineer newsletter: 25K+ AI engineers getting dangerously good at AI
→ The Tech Audience Accelerator: 14K+ AI/ML founders building audiences and distribution
Background:
→ 8+ years shipping ML systems in production.
→ Previously Head of Data Science at Prediktiva.
→ Founder of The Tech Content Agency since 2023.
→ 100+ sponsors, solo, 100% running on the agent infrastructure I built.
For AI/ML founders: No one funds the founder no one's heard of. I build the LinkedIn presence that pulls in investors, customers, and senior hires. 3-4 founders at a time → https://tally.so/r/wk69VJ (or shoot me a DM) ✌️
For AI/ML companies: 1M+ engineers read my work. NVIDIA, Google, LangChain, and 100+ AI/ML companies sponsor to reach them. Sponsored content, newsletter placements, multi-channel bundles. → https://tally.so/r/wk69VJ (or shoot me a DM) ✌️
Highlights
- Big Audience — 131,229 followers · top 1%
- Top 1% in Computer Software — Ranked #26 of 4267 creators
- Top 5% in United States — Ranked #56 of 5205 creators
- Consistent Creator — 50 posts in 30d · top 5%
Recent posts
I looked at AI spending across dozens of companies. The pattern was stark. The more they spent, the worse their results.
𝗖𝗼𝗺𝗽𝗮𝗻𝗶𝗲𝘀 𝗯𝘂𝗿𝗻𝗶𝗻𝗴 $𝟱𝟬𝗞-$𝟱𝟬𝟬𝗞/𝗺𝗼𝗻𝘁𝗵:
- "AI platform" licenses nobody logs into
- Consultants building things ops never wanted
- Infrastructure for scale that never comes
- Beautiful strategy decks. Zero problems solved.
𝗖𝗼𝗺𝗽𝗮𝗻𝗶𝗲𝘀 𝘄𝗶𝗻𝗻𝗶𝗻𝗴 𝗮𝘁 $𝟮𝗞-$𝟱𝗞/𝗺𝗼𝗻𝘁𝗵:
- Just API calls: Claude, GPT, Gemini.
- One person who knows where the pain is
- Simple tools like Zapier, Make, n8n.
- Off-the-shelf tools, proprietary workflows
11 reactions · 3 comments · 0 reposts
CUDA takes years to learn. Triton takes an afternoon. OpenAI open-sourced it. Here's what you should know:
The problem: CUDA requires you to manage:
→ Memory coalescing (manual)
→ Shared memory allocation (manual)
→ Thread synchronization (manual)
→ Bank conflict avoidance (manual)
Most ML engineers never touch it. Too hard. Too slow.
The fix: Triton handles it for you.
→ Memory coalescing: Automatic
→ Shared memory: Automatic
→ SM scheduling: Automatic
No threadIdx. No barriers. No bank conflicts. Python in. GPU code out.
The numbers:
→ 25 lines matching cuBLAS performance
→ Fused
50 reactions · 3 comments · 0 reposts
Codex is 2x cheaper per token. It also uses 4x more tool calls.
I ran three tasks through both. Same DAG, same prompts, same starting branch.
Task 1: Claude 31 calls. Codex 112.
Task 2: Claude 58 calls. Codex 211.
Task 3: Claude 64 calls. Codex 248.
Same green diffs. Same test suites passing.
I assumed it was a context gap. Codex wasn't reading AGENTS.md. Fixed the mirror. Reran. 203 calls vs 211. Not a setup issue.
Just how each model works: Codex reads-greps-reads-writes-reads. Claude reads less, writes less, ships cleaner.
Neither is wrong. But one costs more per tool call. On
23 reactions · 2 comments · 0 reposts
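The "4x more tool calls" figure in the Codex vs. Claude post above can be checked against the three call counts it quotes. The sketch below is only that arithmetic, not the author's method. The `relative_cost` helper is hypothetical and rests on an assumption the post does not state: that both models spend the same number of tokens per tool call, so cost scales as calls times per-token price.

```python
# Tool-call counts quoted in the post: (Claude, Codex) per task.
CALLS = {
    "Task 1": (31, 112),
    "Task 2": (58, 211),
    "Task 3": (64, 248),
}


def call_ratio(claude_calls: int, codex_calls: int) -> float:
    """Codex tool calls made per Claude tool call."""
    return codex_calls / claude_calls


def relative_cost(ratio: float, price_ratio: float = 0.5) -> float:
    """Codex cost relative to Claude (hypothetical helper).

    ASSUMPTION, not from the post: equal tokens per tool call for both
    models, so cost scales as calls x per-token price.
    price_ratio=0.5 encodes the post's "2x cheaper per token".
    """
    return ratio * price_ratio


ratios = {task: call_ratio(*counts) for task, counts in CALLS.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

for task, ratio in ratios.items():
    print(f"{task}: {ratio:.2f}x calls, {relative_cost(ratio):.2f}x cost")
print(f"Mean: {mean_ratio:.2f}x calls")
```

On these numbers the per-task ratios fall between roughly 3.6x and 3.9x, so "4x" is a rounded-up figure. Under the equal-tokens-per-call assumption, a 2x per-token discount would not offset that many extra calls; if Codex calls are in fact shorter, the gap narrows.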