Playbook: Triaging an Escalation-Error Spike (New Relic + Hex)
Audience: L1 / L2 support agents — comfortable in New Relic and Hex, not developers.
Use when: you see (or get a report of) an escalation error and need to answer four questions fast: Is this new? How bad is it? Who's affected? Is it a software problem or a menu-setup problem?
This is a repeatable method, not a fix for one specific error. The error you're chasing will change; the steps don't.
You'll need: access to New Relic Logs and Hex, and one example session's escalation reason to start from.
Step 1 — Get the error "signature"
Every escalation reason is part fixed wording (the same in every session) and part details that change (item names, store, numbers). Search on only the fixed wording. If you include a detail that changes, your search will miss most of the matching sessions.
Where to find the escalation reason: the session viewer or a session export — it's the RAW_ESCALATION_REASON field (the "AI Escalation (...)" text).
Worked example — a readback error reason looks like this:
AI Escalation (Readback render failed for non-empty cart (2 items ['fries',
'4 for $4 wednesday chili dogs special']): Readback text contains
non-speech-renderable character '$' at position 0: '$4 wednesday special...')

| Ignore (changes every session) | Use (same in every session) |
|---|---|
| the item count and item list | Readback render failed for non-empty cart |
| the specific character ('$') | Readback text contains non-speech-renderable character |
| at position 0 | |
| the spoken text ('$4 wednesday special...') | |
→ Search on: Readback text contains non-speech-renderable character — one search catches every version ($, -, /). Separate them out later in Hex. Search broad, break down narrow.
How to tell the two apart: the details that change are usually inside quotes, brackets, or parentheses, or are numbers (item names, quantities, store codes, characters, positions, prices). The plain-English sentence between them is the fixed wording — that's what you search on.
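The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not part of any New Relic or Hex tooling: the function name `fixed_wording` and the three-word cutoff are assumptions made for the example. It blanks out bracketed lists, single-quoted text, and numbers, then keeps the plain-English phrases that remain.

```python
import re

# Example reason copied from the worked example (joined onto one line).
REASON = (
    "AI Escalation (Readback render failed for non-empty cart (2 items ['fries', "
    "'4 for $4 wednesday chili dogs special']): Readback text contains "
    "non-speech-renderable character '$' at position 0: '$4 wednesday special...')"
)

def fixed_wording(reason, min_words=3):
    """Return the phrases left after the changing details are removed.

    Hypothetical helper: min_words=3 is an assumed cutoff that drops short
    leftovers such as 'items' or 'at position'.
    """
    marker = "\x00"  # stands in for every removed detail
    text = re.sub(r"\[[^\]]*\]", marker, reason)  # bracketed lists
    text = re.sub(r"'[^']*'", marker, text)       # single-quoted text
    text = re.sub(r"\d+", marker, text)           # numbers
    # Split on the markers and on the punctuation that separates phrases.
    parts = re.split(r"[\x00():]+", text)
    phrases = [" ".join(part.split()) for part in parts]
    return [p for p in phrases if len(p.split()) >= min_words]

for phrase in fixed_wording(REASON):
    print(phrase)
```

For the worked example this leaves the same two phrases listed in the "Use" column. A reason that contains an apostrophe inside ordinary words (for example "didn't") would confuse the quote matching, so treat the output as a starting point and still confirm the wording against a real session.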
Common errors and what to search for:
| Error | Search for this text |
|---|---|
| Readback rejected a character | Readback text contains non-speech-renderable character |
| Readback came out empty | but no readback generated |
| Item not on the menu | NOT_ON_MENU |
| Order didn't reach the POS | ENOPOSRESPONSE |
| POS timed out | POS + timeout (or the exact POSTIMEOUT token) |
⚠️ Always copy the wording from a real session, not from memory — the search only matches if the spelling, spacing, and punctuation are exact.
Step 2 — New Relic Logs: is it new, and is it getting worse?
Search the fixed wording and look at the count over time at a few different ranges (today vs. the last 7 days vs. 30 days).
Use NR here for timing only — "did this just start / is it spiking right now." The count is log lines, not sessions (one failure is logged many times), so don't treat it as a real volume. For real counts and the breakdown, use Hex (Step 3).
- A line that sat flat and then spikes today → most likely something a recent software update broke.
- A line that's been slowly climbing → most likely a menu-setup problem building up over time.
Paste this into the NR query bar (swap in your search text):
SELECT count(*) FROM Log
WHERE message LIKE '%Readback text contains non-speech-renderable character%'
TIMESERIES SINCE 7 days ago

- When did it start? Find where the line lifts off zero; that usually lines up with a software release.
- Is it still climbing today, or has it leveled off?
- Add COMPARE WITH 1 week ago to check today against the same day last week.
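If you export the daily counts from the chart, the "flat then spike" versus "slowly climbing" reading can be sketched as below. This is a rough illustration under stated assumptions: the function names and the `spike_ratio=3.0` threshold are invented for the example, and because the counts are log lines rather than sessions, only the shape matters, not the numbers themselves.

```python
from statistics import median

def first_liftoff(counts):
    """Index of the first bucket above zero, or None if the line never lifts."""
    for index, count in enumerate(counts):
        if count > 0:
            return index
    return None

def trend_shape(counts, spike_ratio=3.0):
    """Label a series of daily counts (oldest first).

    spike_ratio is an assumed threshold: the latest day counts as a spike
    when it is at least this many times the median of the earlier days.
    """
    baseline = median(counts[:-1])
    latest = counts[-1]
    if latest > 0 and latest >= spike_ratio * max(baseline, 1):
        return "sudden spike"
    if latest > counts[0]:
        return "gradual climb"
    return "flat or falling"

print(trend_shape([2, 3, 2, 3, 2, 2, 40]))   # sudden spike
print(trend_shape([2, 3, 5, 6, 8, 9, 11]))   # gradual climb
print(first_liftoff([0, 0, 0, 4, 9]))        # 3
```

The labels are a prompt for where to look next, not a verdict; confirm the start time against the chart before quoting it.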
Step 3 — Hex: how many different versions, and who's affected?
Use Hex for the breakdown — not New Relic. In NR the store, brand, and offending character are buried inside the free-text log message (there are no clean fields to group on), so this has to come from Hex, where there's one structured row per session.
Pull every session with the error for the spike window and break it down two ways:
- The different versions of the message (e.g. the $, -, and / cases) — shows how many separate problems are hiding under one error.
- By brand and store — shows how widespread it is.
Example ask to Hex: "Find all sessions today where the escalation reason contains 'Readback render failed'. Give me the total count, the different error messages, and a breakdown by brand and store. Compare today's count to the prior 7 days."
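For anyone curious what that breakdown amounts to, here is a minimal Python sketch of the same two groupings. It assumes one row per session; the `brand` and `store` column names and the sample rows are made up for illustration, and only RAW_ESCALATION_REASON is a field named in this playbook.

```python
import re
from collections import Counter

# Pulls the offending character out of the readback reason text.
CHAR_PATTERN = re.compile(r"non-speech-renderable character '(.)'")

def break_down(rows):
    """Count sessions per message version and per (brand, store)."""
    versions = Counter()
    by_brand_store = Counter()
    for row in rows:
        match = CHAR_PATTERN.search(row["RAW_ESCALATION_REASON"])
        versions[match.group(1) if match else "unknown"] += 1
        by_brand_store[(row["brand"], row["store"])] += 1
    return {
        "total": len(rows),
        "versions": versions,
        "by_brand_store": by_brand_store,
        "brand_count": len({brand for brand, _ in by_brand_store}),
        "store_count": len(by_brand_store),
    }

# Made-up sample rows, for illustration only.
TEXT = "Readback text contains non-speech-renderable character '{}' at position 0"
sample = [
    {"brand": "brand_a", "store": "001", "RAW_ESCALATION_REASON": TEXT.format("$")},
    {"brand": "brand_a", "store": "001", "RAW_ESCALATION_REASON": TEXT.format("$")},
    {"brand": "brand_a", "store": "002", "RAW_ESCALATION_REASON": TEXT.format("-")},
    {"brand": "brand_b", "store": "101", "RAW_ESCALATION_REASON": TEXT.format("/")},
]

summary = break_down(sample)
print(summary["total"], dict(summary["versions"]))
print(summary["brand_count"], "brands,", summary["store_count"], "stores")
```

The version counts show how many separate problems sit under one error, and the brand and store counts feed directly into the routing decision in Step 4.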
Step 4 — Decide: software problem or menu-setup problem?
Use how widespread it is, plus when it started:
| What you see | Most likely cause | Who to route to |
|---|---|---|
| Started suddenly, many stores, more than 1 brand | A recent software change broke it | Engineering / incident |
| Builds gradually, one store or one item | A menu-setup (MOT) problem | MOT ticket |
| Started suddenly, one brand only | A brand menu change or brand-specific release | Engineering + MOT |
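The table can also be read as a simple set of rules. The sketch below is one hedged way to write them down; the function name `route` is hypothetical, and treating "many stores" as "more than one store" is an assumption, since the table gives no exact number. Any combination the table does not cover falls through to a judgment call.

```python
def route(started_suddenly, brand_count, store_count):
    """Map what you see to a routing suggestion, following the Step 4 table.

    Assumption: 'many stores' is read as store_count > 1.
    """
    if started_suddenly and brand_count > 1 and store_count > 1:
        return "Engineering / incident"
    if started_suddenly and brand_count == 1:
        return "Engineering + MOT"
    if not started_suddenly and store_count <= 1:
        return "MOT ticket"
    # The table does not cover this combination.
    return "No clear match: use judgment and include the numbers"

print(route(started_suddenly=True, brand_count=3, store_count=40))
print(route(started_suddenly=False, brand_count=1, store_count=1))
print(route(started_suddenly=True, brand_count=1, store_count=12))
```

The result is a suggestion based on the "most likely cause" column, so the ticket should still carry the counts and start time that led to it.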
Step 5 — Write it down
Put the numbers in the incident ticket: total count, number of stores, when it started, the different versions, and the trend.