AI, Hazmat, and the Confident Answer That Can Get You Hurt
We have a bad habit in hazmat of trusting anything that sounds organized. A clean answer feels useful. A fast answer feels even better. Put it in a neat sentence with the right technical words, and suddenly it starts to feel like guidance.
That is where artificial intelligence becomes both impressive and dangerous.
AI does not have to be malicious to create a problem. It only has to be confidently incomplete. It only has to give an answer that sounds good enough for someone under stress to believe. In a hazmat scene, that is a serious issue because we rarely ask simple questions under simple conditions. We are asking complicated questions while standing in bad weather, around unknown products, with limited time, incomplete information, and people waiting for us to make a decision.
That was the real value of this discussion. It was not just about whether AI can help responders. It was about understanding where it fails, why it fails, and why the human using it has to remain smarter than the tool.
Garbage In, Garbage Out
The first problem is the oldest problem in technical reference: bad input creates bad output.
Take a box truck rollover involving retail pool supplies. There is a strong chlorine smell. The truck is leaking. Fuel and antifreeze may also be involved. Somebody asks AI what PPE and isolation distance to use for “leaking pool chemicals and chlorine smell.”
The answer comes back sounding useful: Level B protection with SCBA, isolate to 120 feet, and avoid water contact due to reactivity.
That sounds official enough to be tempting.
The problem is that “pool chemicals” is not a chemical identity. It is a shopping aisle. That phrase could mean calcium hypochlorite, sodium dichloroisocyanurate, trichlor, muriatic acid, sodium bisulfate, algaecides, oxidizers, acids, bases, or some ugly combination of all of them. Some are solids. Some are liquids. Some are oxidizers. Some can generate chlorine when mixed with acids. Some can react badly with water. Some may not be the primary problem at all.
So when the input is vague, the output has no solid ground to stand on. AI may recognize the words “pool chemicals” and “chlorine smell,” but it does not truly know what is leaking, what is mixing, what is reacting, or what the vapor hazard actually is.
That matters because PPE and isolation are not guesses. Under OSHA 29 CFR 1910.120, responders are expected to base protective actions on hazard assessment, product identification, exposure potential, and site conditions. The same logic runs through NFPA 470. We are not supposed to pick a suit level because a sentence sounded plausible.
The better question would include the actual product names, container types, weather, terrain, visible reactions, meter readings, wind direction, runoff conditions, and whether the chlorine odor appears to be from a product release or from a chemical reaction.
That is not overexplaining. That is incident context.
And AI needs context badly.
Asking Too Soon Gets You Almost Nothing
There is also a timing problem. Responders may be reaching for AI before they have enough information to ask a meaningful question.
A leaking 55-gallon drum in a warehouse is a good example. No label. Slight vapor. Workers reporting dizziness. The question becomes, “Unknown chemical leaking with vapor causing dizziness. What should I do?”
AI gives the kind of answer you would expect: evacuate the area, use air monitoring, and wear appropriate PPE.
That is not wrong. It is just not enough.
It is the kind of answer that lives at the awareness level. It tells you to be careful, back people out, and gather more information. Fine. But if you are operating as a hazmat technician, that answer does not provide a metering strategy, toxicological considerations, vapor behavior, likely chemical families, entry considerations, or a decision-making framework.
That is where the user has to force the tool to work harder. Instead of asking, “What should I do?” the better approach is to ask for a metering plan, likely hazard classes, possible routes of exposure, PPE considerations, isolation logic, and the assumptions behind each recommendation.
That last part matters most. AI should not just give an answer. It should explain why it is giving that answer.
If it cannot explain the reasoning, or if the reasoning is built on weak assumptions, the answer should not survive contact with the incident action plan.
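For teams that want to practice this kind of questioning in training, the approach above can be sketched as a small Python helper. This is a minimal illustration built on stated assumptions. `IncidentContext`, `missing_context`, and `build_prompt` are hypothetical names and are not part of any AI platform's API. The fields simply mirror the incident details this piece says a good question should include. The helper does not query any AI tool. It only assembles the question, marks blanks as UNKNOWN, and asks the tool to show its assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical incident-context record. Field names are illustrative and
# mirror the details a good hazmat question should carry.
@dataclass
class IncidentContext:
    product_names: Optional[str] = None
    container_types: Optional[str] = None
    weather: Optional[str] = None
    wind_direction: Optional[str] = None
    terrain: Optional[str] = None
    visible_reactions: Optional[str] = None
    meter_readings: Optional[str] = None
    runoff_conditions: Optional[str] = None


def missing_context(ctx: IncidentContext) -> list[str]:
    """Return the names of fields left blank (None or whitespace only)."""
    return [f.name for f in fields(ctx)
            if not (getattr(ctx, f.name) or "").strip()]


def build_prompt(ctx: IncidentContext) -> str:
    """Assemble a structured question that asks the tool to show its work."""
    lines = ["Incident context (known facts only):"]
    for f in fields(ctx):
        value = (getattr(ctx, f.name) or "").strip()
        label = f.name.replace("_", " ")
        # Blank fields are stated as UNKNOWN so the tool cannot quietly fill them.
        lines.append(f"- {label}: {value if value else 'UNKNOWN'}")
    lines += [
        "",
        "Provide: a metering plan, likely hazard classes, possible routes of exposure,",
        "PPE considerations, and isolation logic.",
        "For each recommendation, state the assumptions behind it,",
        "separate confirmed facts from possibilities,",
        "and list what information is still missing.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    drum = IncidentContext(
        container_types="55-gallon drum, no label",
        visible_reactions="slight vapor; workers report dizziness",
    )
    print("Still missing:", ", ".join(missing_context(drum)))
    print(build_prompt(drum))
```

The unlabeled drum example leaves most fields blank. `missing_context` then gives the team a quick list of what to go find before any answer is trusted, and the UNKNOWN lines keep those gaps visible in the question itself.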
The Railcar Problem
Scale is another place where AI can fall apart.
A gasoline railcar derailment is not the same thing as a small gasoline spill. That sounds obvious to every hazmat technician, but it is not always obvious in an AI response.
If the situation is a railcar derailment with a UN1203 placard, strong odor, and a vapor cloud forming, asking “What is the safe distance for a gasoline rail car leak?” may produce a dangerously narrow answer. In one example, AI responded with 150 feet in all directions.
That is the kind of answer that should make your stomach tighten.
Gasoline is flammable, volatile, and produces vapors that travel and collect in low areas. A railcar can carry a massive amount of product. Weather, temperature, wind, terrain, drainage, ignition sources, and vapor cloud behavior all matter. A visible vapor cloud is not a minor detail. It suggests an active atmospheric hazard that warrants respect.
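The likely failure here is that AI grabbed a simple reference point and didn’t scale it to the incident. It saw gasoline. It may have found small-spill guidance. For perspective, 150 feet is the immediate precautionary isolation distance in ERG Guide 128, the guide for gasoline. That same guide calls for considering an initial downwind evacuation of at least 1,000 feet for a large spill, and isolating a half mile in all directions if a rail tank car is involved in a fire. The AI gave a small-scale answer to a large-scale rail problem.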
That is not a small mistake. That is the difference between reading a guide and understanding an incident.
This is why the Emergency Response Guidebook is a starting point, not the end of the conversation. It provides initial guidance on isolation and protective action, but the responder still has to evaluate the actual conditions. A railcar, a roadway spill, a fixed facility release, and a leaking five-gallon can may share a product name, but they do not share the same risk profile.
AI often recognizes the product faster than it recognizes the problem. That is a serious limitation.
The Memory Problem Nobody Sees
One of the more unsettling issues is that AI may bring old information into a new conversation.
That sounds helpful when you are writing emails, planning a trip, or organizing a project. It becomes a different matter when you are using the tool for emergency response thinking.
If an AI system remembers prior conversations, it may accidentally drag details from an old incident into the current one. Maybe a previous gasoline release happened in the rain. Maybe a prior discussion involved a specific facility, a dog, a roadway, a product, or a weather condition. If that old context gets blended into a new incident, the response may start making assumptions that were never true for the current scene.
That is dangerous because responders already fight cognitive bias. We anchor on early information. We look for details that confirm what we already believe. We sometimes carry the last similar run into the next one without realizing it.
AI can reinforce that mistake. Instead of challenging our assumptions, it may quietly add its own.
That means teams using AI for technical reference should understand the settings behind the platform. If memory or historical context is active, it may need to be disabled for incident-specific work. Every incident should be treated as a clean event unless the user deliberately adds previous information because it is relevant.
The machine should not be allowed to decide that on its own.
Lithium-Ion Fires and the Danger of Internet Consensus
Lithium-ion battery fires show another major weakness: AI can confuse popularity with truth. Ask how to extinguish a lithium-ion battery fire, and AI may answer with “large amounts of water or Class D extinguishing agent.”
Part of that answer is useful. Water is commonly used in lithium-ion battery events for cooling, for exposure protection, and to slow or limit propagation. The goal is not just flame knockdown. It is heat removal. It is stopping the battery pack from continuing to drive thermal runaway from cell to cell.
The Class D part is where the problem begins. Class D agents are for combustible metal fires. Elemental lithium is a metal. Lithium-ion batteries, however, are not simply blocks of elemental lithium burning like a classic metal fire. They contain lithium salts, electrolytes, separators, anodes, cathodes, and complex cell construction. Thermal runaway is a chemical and thermal failure process, not just a metal fire problem.
That distinction matters operationally. If responders hear “lithium” and think “Class D,” they may be applying the wrong mental model to the wrong hazard. A battery pack fire involves propagation, stranded energy, reignition potential, toxic and flammable off-gassing, and long-duration cooling concerns. A short, generic AI answer can flatten all of that into something dangerously simple.
The reason is partly the information environment AI learns from. Online discussions are full of confident people repeating bad tactics. If enough people say the same wrong thing, AI may treat that pattern as reliable. It does not necessarily know that experienced battery instructors, researchers, and field operators would reject the advice.
That is the problem with consensus scraped from the internet. Consensus is not competence.
AI Is Not a Hazmat Technician
AI can still be useful. That is important to say clearly. It can help organize thoughts. It can compare references. It can generate checklists. It can help frame a metering strategy. It can remind a technician of product families, likely incompatibilities, or regulatory considerations. It can be a useful second look when the user knows enough to challenge the answer.
But it is not a hazmat technician. It cannot smell chlorine. It cannot see vapor moving across the pavement. It cannot hear a pressure relief device. It cannot feel heat coming off a battery pack. It cannot watch a crew getting overloaded and realize the operation is drifting. It cannot be held responsible for the decision.
The user is. That is why AI should be treated like a junior technical aide with access to a huge library and no field judgment. Make it explain itself. Make it list its assumptions. Make it identify what information is missing. Make it separate confirmed facts from possibilities. Then verify the answer against meters, references, SOPs, and trained human judgment.
The danger is not that responders will use AI. They already are. The danger is that they will use it casually, quickly, and without understanding how easily it can be wrong.
So train with it before you trust it. Feed it bad prompts and watch it fail. Ask vague questions and see how generic the answers become. Ask specific questions and compare the differences. Run the same incident through multiple tools. Challenge the outputs. Teach your team where the machine helps and where it starts making things up with a straight face.
Because on a hazmat scene, the worst answer is not always the one that sounds ridiculous. Sometimes the worst answer is the one that sounds just good enough to follow.
