The Startup Breakdown
Hamming: The Guardian of AI - Policing LLMs for Reliable Outputs

AI-Powered Evaluation Platform Ensures Accuracy and Reliability in LLM Outputs

Trey Layton
May 14, 2024

This is The Startup Breakdown, the only newsletter looking into the new Y Combinator companies that you need to know. By joining this growing community of hundreds of founders, investors, and other startup aficionados (think I spelled that right?), you're getting a firsthand look at the companies that may be the next Airbnb, Coinbase, or Reddit.

If you'd like to receive these newsletters directly in your inbox once a week, hit subscribe and never miss an email!

Upgrade to a premium membership to read Thursday’s newsletter!

Premium subscribers include investors and YC-backed founders looking to stay ahead of the latest startup news.

Hamming: The AI Evaluation Platform Keeping LLMs in Check

BACKGROUND:

We have our first member of the summer cohort!

There are still a couple of weeks of interviews and invitations, but YC has gone ahead and wired the $125K SAFE to Hamming, an LLM to police other LLMs. About as meta as it gets.

Side note: I'm thinking about creating a resource further explaining terminology, the YC structure, how to think about the math, etc. Would you be interested? Anyone who replies will get it for sure, even if I don’t end up making it an official subscriber resource.

If you’ve used ChatGPT, Claude, Gemini, or any other AI tool, you’ve probably seen the “our model might hallucinate or provide incorrect information. always fact check” disclaimers.

If you’ve used these tools enough, you’ve probably even experienced this hallucination yourself, like the time that Google’s Gemini told my friend that he should drink 2 quarts of his own urine to quickly pass kidney stones.

As more companies build their own LLMs, whether to compete with OpenAI or to use internally with proprietary data, evaluating and monitoring these outputs will only become more important.

Hamming is building the playground for AI evaluation, allowing developers and engineers to experiment with prompts, models, and architecture in a fraction of the time and cost.
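For the engineers in the room, here's roughly what "experimenting with prompts" under an automated grader can look like. This is a minimal sketch, not Hamming's actual product or API: `toy_model`, `toy_judge`, the two prompts, and the two-question dataset are all made-up stand-ins, and a real setup would swap in LLM calls for both the model and the judge.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call. It only answers cleanly
# when the prompt tells it to be concise.
def toy_model(prompt: str, question: str) -> str:
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    answer = answers.get(question, "unknown")
    if "concise" in prompt:
        return answer
    return f"I think the answer might be {answer}, but I am not sure."

# Hypothetical stand-in for the grader. A real judge would be another
# model; this one just checks for an exact match.
def toy_judge(output: str, expected: str) -> bool:
    return output.strip() == expected

def pass_rate(
    prompt: str,
    dataset: list[tuple[str, str]],
    model: Callable[[str, str], str],
    judge: Callable[[str, str], bool],
) -> float:
    """Share of dataset questions whose output the judge accepts."""
    passed = sum(judge(model(prompt, q), expected) for q, expected in dataset)
    return passed / len(dataset)

DATASET = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
PROMPTS = ["Answer the question.", "Answer the question. Be concise."]

def compare_prompts(
    prompts: list[str], dataset: list[tuple[str, str]]
) -> dict[str, float]:
    """Score every prompt variant against the same dataset."""
    return {p: pass_rate(p, dataset, toy_model, toy_judge) for p in prompts}

if __name__ == "__main__":
    for prompt, rate in compare_prompts(PROMPTS, DATASET).items():
        print(f"{rate:.0%}  {prompt}")
```

The point of the pattern is that every prompt variant gets scored against the same dataset by the same grader, so you can compare changes without a human reading every output.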

NUMBERS:

Traction:

Customers from industries like legal, medical, financial services, and more
Names include Fora, Inkly, and Intuitive Systems

Market Size:

TAM: $100 billion, growing at a nearly 40% CAGR
SAM: $20 billion
SOM: $100 million
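
For anyone who wants to sanity-check the math, here's a quick sketch of what those numbers imply. It assumes the growth rate is exactly 40% (the figure above says "nearly") and that it holds steady, which is my simplification and not a forecast; the `project` helper is just for illustration.

```python
# All figures in billions of dollars, taken from the list above.
TAM, SAM, SOM = 100.0, 20.0, 0.1
CAGR = 0.40  # assumption: "nearly 40%" rounded to exactly 40%

def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward: value * (1 + cagr) ** years."""
    return value * (1 + cagr) ** years

if __name__ == "__main__":
    for year in range(4):
        print(f"Year {year}: TAM of about ${project(TAM, CAGR, year):.1f}B")
    print(f"SAM is {SAM / TAM:.0%} of TAM")
    print(f"SOM is {SOM / SAM:.1%} of SAM")
```

Under those assumptions, the TAM compounds from $100 billion to roughly $274 billion in three years, and the SOM works out to half a percent of the SAM.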

Competition:

Hugging Face
Vellum AI, Scale AI, Mosaic ML, Comet ML, Arthur AI

Team:

Marius Buleandra, CTO: Engineering Manager at Anduril, founding engineer at Spell (acquired by Reddit)
Sumanyu Sharma, CEO: Head of Data at Citizen, Senior Staff Data Scientist at Tesla

Risks:

LLMs to grade LLMs: There are genuine cost and time benefits to using AI to police AI, and it might further reduce an already small error rate. Still, Hamming's own LLM is susceptible to the same hallucinations it's trying to prevent for others, and no matter how successful the model is, it will always have to overcome the association between LLMs and hallucinations
Evaluation model: Though there are existing benchmarks and tests meant to evaluate models, there have been critiques of the methods as they’re often arbitrary and don’t necessarily test the true “intelligence” of models
Regulation: The government has been increasingly skeptical of AI and has already explored ways to monitor and evaluate companies’ LLMs itself, and there’s a chance Uncle Sam decides to get his greasy paws all over a company like Hamming that is doing this work

What I like:

Cross-role: Hamming's tools are beneficial to just about every position in the firm, not just engineers, making the user experience great and the appeal to enterprises even more attractive as they have so many potential beneficiaries of a contract for AI evaluation services
Industry-agnostic: Similarly, the company already counts industries across the spectrum as customers, making the market for their services even greater
Massive market: The market is huge, and should the cost and time benefits continue to hold (or even become more prominent) and the error rate prove to be lower, this company could genuinely count every business in the country (and beyond) as a customer

Opportunities:

Become the standard evaluation benchmark: As mentioned, there is no universal AI evaluation system, as each of the evaluation services and research teams has its own benchmarks and criteria. Even if Hamming just develops the state-of-the-art grader, they will stay around. Perhaps launch a free version to get the name out there?
Consulting: Arguably the quickest way to expand their potential customer pool is through helping more and more companies to learn to implement AI and the various use cases they can find for it. Becoming a consultant for these companies, and publishing case studies for inspiration, is another area for growth, both directly and indirectly
Partnerships with LLM providers: Though it’s likely to take a bit more traction and proof of product, shaking hands with Sam Altman or Sundar Pichai to agree on letting these companies integrate Hamming’s evaluation tools into the companies’ LLMs would be a massive stamp of approval and one of those oversized checks that you get when you win a raffle

Hamming has the opportunity to be one of the most important infrastructure providers in the entire AI space. If it can continue to be cheaper, faster, and, most importantly, more reliable than human grading, its only ceiling is the number of businesses using AI, and that number should only continue to skyrocket.

You heard it here first, folks. Hamming is here to stay.

Links of the week:

Share this newsletter with one friend to get access to the Links of the Week section in every newsletter, including today’s! 👇️

Right after I said that OpenAI might be releasing a search engine, Sam Altman took to Twitter to tell people that OpenAI is not releasing a search engine… However, their newly announced model is still worth getting the sweats for 😃
- OpenAI introduced the GPT-4o model
- Twice as fast, and twice as cheap as GPT-4
- Improves image, video, and audio capabilities (image immediately available, others rolling out over the next few months)
- Interestingly, the model appears to have massively upgraded coding capabilities
- Everyone wanted GPT-5, but it appears that instead, we’re more likely to see releases categorized more as incremental improvements as OpenAI chooses the “personal assistant” route

🚨OPENAI: NEW GPT-4o CRUSHES COMPLEX CODING PROBLEMS
GPT-4o reportedly significantly outperformed previous models in tackling difficult coding and debugging tasks.
William Fedus, OpenAI:
"GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the… x.com/i/web/status/1…
— Mario Nawfal (@MarioNawfal)
6:08 PM • May 13, 2024

As has been anticipated, the US is imposing a 100% tariff on Chinese EVs
- The champion of the free market is once again anything but when it potentially interferes with reelection bids…
- In short, here’s what would happen if these vehicles were allowed in the states:
  - They’re much cheaper than American cars, even gas ones, and would quickly dominate the market.
  - American car manufacturers would be crushed.
  - Those in this industry would vote against Biden because he was in office when it happened.
  - Biden would lose the election because these voters are mostly in the swing states in the Rust Belt.
  - American companies would finally have to focus on building things instead of sHaReHoLdEr VaLuE because they’d finally be incentivized to invest in R&D instead of stock buybacks
- I will say that there’s a legitimate gripe about the fact that many of these cars come with Chinese software, and it’s been proven time and time again that the CCP loves to weaponize this to spy on Americans, but at the end of the day, this is just another example of the esteemed American democracy ensuring that nobody really wins with this flawed architecture of incentivization
Netflix has entered live sports despite former CEO Reed Hastings continually stating that they weren’t interested. If only someone had called bullshit on his public comments more than a year ago…
- After organizing this summer’s Mike Tyson v. Jake Paul fight, the streamer is looking to acquire exclusive rights to two NFL Christmas games
- Amazon caught heat when they acquired exclusive rights to Thursday Night Football, and Max has been flexing their TNT rights through live basketball, hockey, and soccer, but it’s becoming increasingly clear: streaming and live sports will go hand in hand
- What makes the Netflix deal interesting is that the company is not affiliated with any existing sports networks, making it more cumbersome to obtain these rights… would they consider shelling out for permanent rights?
Was having some challenges with Facebook Ads, and unfortunately, they have quite possibly the worst customer service of any company I have ever come across. This subreddit has been particularly valuable for me lately, and this post in particular is worth a read if you’re actively running campaigns
An open-source video editor to keep an eye on if you use video tools and are tired of the pricing

Last word 👋

How am I doing?

I love hearing from readers, and I’m always looking for feedback. How am I doing with The Startup Breakdown? Is there anything you’d like to see more or less of? Which aspects of the newsletter do you like most?

Cheers to another day,
