"ChatGPT Wrote That" - Hedging Against AI Failures
- Caleb Robey
- Apr 24, 2023
- 3 min read
Updated: Mar 12

A thesis about LLMs pushing good innovators toward excellence and mediocre ones toward risk-mongering.
A "new" Ubiquitous Issue
Not too long ago, I was doing a code review and a small chunk of code caught my eye.
The code repeated an extremely memory- and time-intensive computation hundreds of times, doing the exact same thing on every run, only to keep the last result and throw away all the others.
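In rough Python (the function name and numbers here are stand-ins I've invented to illustrate the shape of the problem, not the actual code under review), the pattern looked something like this:

    # Stand-in for a heavy, memory- and time-intensive computation.
    def expensive_simulation(temperature, pressure):
        return sum(i * temperature * pressure for i in range(100_000))

    # What the generated code effectively did: identical inputs every
    # iteration, keeping only the final result and discarding the rest.
    result = None
    for _ in range(500):
        result = expensive_simulation(300.0, 1.0)

    # What was actually needed: a single call.
    result = expensive_simulation(300.0, 1.0)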
When I asked the engineer about the code chunk, I got a small glimpse into what I expect will be a ubiquitous code-review conversation from here on out:
"Oh, Chat GPT did that," he responded. "I didn't really know what it was doing, so I left it in there."
I don't mean to be hard on this particular situation or person. It's a tendency we all share - quick answers, automated for us, seem (and often are) wonderful. Why inspect the answer if it seems to work? I face this temptation constantly and will likely fall into this trap more than once in the coming years.
Root Cause
In response to programming questions, LLMs are right a lot. I continue to be blown away by the accuracy and thoroughness with which they provide answers.
LLMs are also wrong quite a bit, and in weird ways - ways a human would virtually never be wrong (like repeating the same answer over and over even when told directly to give a new one). That makes the errors profoundly difficult to detect, and even harder to debug.
Hedge #1: System Safeguards
The first step in mitigating these failures has to do with vetting AI systems and carefully employing them in your product development cycle.
First, employ an adversarial form of retrieval-augmented generation (RAG). You can read more about that here, but the basic idea is that you give the LLM a curated set of reference information that acts as guardrails around what it is allowed to say. That dataset could be research literature on a specific topic, publications, and reputable databases.
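A minimal sketch of that idea in Python, assuming a hypothetical embed() function supplied by the caller and a pre-vetted list of passages (this is illustrative, not any particular library's API), might look like:

    # Sketch: retrieval-grounded prompting over a vetted corpus.
    # `embed` and `vetted_passages` are hypothetical stand-ins, not a real API.
    from typing import Callable, Sequence

    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    def build_guarded_prompt(question: str,
                             vetted_passages: list[str],
                             embed: Callable[[str], Sequence[float]],
                             top_k: int = 3) -> str:
        # Rank pre-vetted passages (papers, publications, reputable databases)
        # by similarity to the question, and use only those as context.
        q_vec = embed(question)
        ranked = sorted(vetted_passages,
                        key=lambda p: cosine(embed(p), q_vec),
                        reverse=True)
        context = "\n\n".join(ranked[:top_k])
        return ("Answer using ONLY the reference material below. "
                "If it does not contain the answer, say you do not know.\n\n"
                f"Reference material:\n{context}\n\nQuestion: {question}")

The point is not this particular retrieval scheme; it is that the model's allowable answers are anchored to material you have already vetted.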
Furthermore, there are companies like Prediction Guard that help prevent hallucinations, institute governance, and ensure compliance using RAG, prompt engineering, and various other safeguards.
Finally, for non-LLM AI systems, make sure there are controls users can put on model output. Are you generating optimal temperature conditions for a reaction? Be sure your system lets your team set reasonable bounds on those predictions. A reaction at 1000 kelvins may seem like a good experiment to try from an AI model's perspective, but in practice it could leave your lab an ash heap.
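As an illustrative sketch of such a control (the bounds and names here are invented for the example, not recommendations for any real reaction):

    # Sketch: reject model predictions that fall outside team-set bounds.
    ALLOWED_TEMP_RANGE_K = (250.0, 450.0)  # example bounds, in kelvins

    def validate_predicted_temperature(pred_k: float) -> float:
        lo, hi = ALLOWED_TEMP_RANGE_K
        if not lo <= pred_k <= hi:
            raise ValueError(
                f"Predicted temperature {pred_k} K is outside [{lo}, {hi}] K; "
                "route this prediction to a human reviewer instead of the lab.")
        return pred_k

    # A 1000 K suggestion from the model is rejected, never run in the lab.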
Hedge #2: Continue to Love Expertise
As LLMs get better, knowing the difference between good and bad, accurate and inaccurate output, will require deeper expertise. In the world of generative AI, experts and novices alike will produce products with incredible speed. However, one set of those products will break regularly, break strangely, and take unending hours to fix.
Whether you are an engineer coming up in the industry right now or the director of R&D at a biotech company, beware of skipping expertise. Generative AI promises the world, but it is just a tool. If you treat it as a substitute for learning, iterating, and growing, you will find yourself building products that are at best deeply suboptimal and at worst catastrophic failures.
Pair AI with careful expertise, and you may find yourself building products that are more reliable and capable than anything you could have built before.