Jan 01, 2026 | 14 min read

Building Chatbots with Intent Detection: Guide

Chatbots rely on intent detection to classify user queries and respond effectively. By identifying user goals (intents) and extracting specific details (entities), businesses can automate tasks like order tracking, account updates, and ticket routing. Here’s what you need to know:

  • Intent Detection Basics: Classify user queries into intents (e.g., "Order Status") and identify entities (e.g., order IDs). This improves chatbot accuracy beyond basic keyword matching.

  • Organizing Intents: Start with 30–40 intents, expand to 60–80, and group them into categories like "Account Management" or "Billing." Use a fallback intent for unmatched queries.

  • Training Data: Use real user interactions, provide 80–100 diverse examples per intent, and ensure consistent phrasing. Avoid splitting similar intents unnecessarily.

  • Model Setup: Use clear prompts, few-shot learning, and structured outputs (e.g., JSON) for better automation. Limit defined intents in prompts to fewer than 20.

  • Performance Metrics: Measure accuracy, precision, recall, and F1-score. Aim for 90%+ accuracy and set confidence thresholds to handle ambiguous queries.

  • Tools: Platforms like IrisAgent simplify intent detection by automating ticket tagging, routing, and sentiment analysis while integrating with tools like Zendesk.

Building effective chatbots starts with a clear intent framework, quality training data, and ongoing refinement. This ensures faster resolutions and improved customer experiences.

Building Chatbots with Intent Detection: 5-Step Implementation Guide


Defining Intents for Your Chatbot

How to Identify Key Intents

To pinpoint the most common customer inquiries, analyze data from support tickets, live chats, emails, and website search queries. Frontline staff can also provide valuable insights to help you compile a list of the top 10 customer questions.

Use Content Coverage Analysis (CCA) to identify recurring topics. Any query that accounts for 2% or more of overall content is worth noting. Differentiate between "meaningful" business-related queries (like "Order Status") and "structural" conversational fillers (such as "Greeting").

"A good intent structure is the foundation of a strong AI model." - Zendesk Team

When building your chatbot, start with 30 to 40 intents for an initial model. Advanced models typically expand to 60 to 80 intents. It's uncommon for a high-performing AI agent to have fewer than 30 or more than 100 intents. Always include a fallback intent (e.g., "Other" or "Unresolved") to handle queries that don’t match any predefined categories. Additionally, consider setting up "negative" intents to manage inappropriate or out-of-scope inputs effectively.

Once you’ve identified your intents, organize them into a structured hierarchy to improve clarity and scalability for your model.

Organizing Intents into Categories

Avoid overwhelming your system with a flat list of 50+ intents. Instead, group related intents into hierarchical categories. For example, "Reset Password" and "Update Email" can fall under the broader "Account Management" category. This approach not only simplifies the structure but also helps the NLU (Natural Language Understanding) engine perform better and makes debugging more manageable as your chatbot grows.

"Grouping around nouns will lead to much higher performance." - Benjamin Aronov, Developer Advocate, Vonage

When organizing intents, prioritize nouns (e.g., "Loans", "Insurance") over verbs (e.g., "Check", "Request"). Noun-based categories are less ambiguous and reduce overlap in training data. While verb-based grouping is possible, noun-based grouping ensures clearer, more distinct categories for your model. If multiple queries follow the same conversational flow, create a single "catch-all" intent, such as "Change Account Info", rather than separate intents for each action like updating a name, email, or address.

Stick to the "One Intent, One Job" rule - each intent should focus on a single, specific task. For instance, instead of a broad "Billing" intent, break it down into smaller, more precise intents like "Download Invoice" or "Update Payment Method." This approach minimizes ambiguity and boosts classification accuracy.
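The category structure described above can be sketched as a simple mapping from noun-based categories to specific intents. This is a toy illustration, not a prescribed taxonomy; the category and intent names are examples only.

```python
# Toy sketch of a noun-based intent hierarchy (names are illustrative).
INTENT_TAXONOMY = {
    "Account Management": ["Reset Password", "Update Email", "Change Address"],
    "Billing & Payments": ["Download Invoice", "Update Payment Method", "Query Charge"],
    "Order Management": ["Order Status", "Order Tracking", "Cancel Order"],
    "Fallback": ["Unresolved", "Out-of-scope"],
}

def category_of(intent):
    """Return the parent category for a specific intent, or None if unknown."""
    for category, intents in INTENT_TAXONOMY.items():
        if intent in intents:
            return category
    return None
```

Keeping the lookup in one place like this also makes debugging simpler: a misrouted query can be traced to exactly one category.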

A well-organized intent structure lays the foundation for smoother training and better intent recognition.

Common Customer Support Intent Examples

| Intent Category | Specific Intent Examples | Typical User Query |
| --- | --- | --- |
| Order Management | Order Status, Order Tracking, Cancel Order | "Where is my stuff?" |
| Account Management | Reset Password, Update Email, Change Address | "I can't log in to my account." |
| Billing & Payments | Pay Bill, Update Payment Method, Query Charge | "Why was I charged twice?" |
| Product Support | Technical Support, Product Inquiry, Pricing | "Does this model come in blue?" |
| Returns & Refunds | Return Policy, Start Return, Refund Status | "How do I send this back?" |
| Structural | Greeting, Goodbye, Human Handoff | "I want to speak to a real person." |
| Fallback | Unresolved, Out-of-scope | "What is the meaning of life?" |

For each intent, ensure you provide 80–100 diverse training utterances. Include a variety of sentence structures, such as questions and commands, along with synonyms, slang, and even common typos like "pasword reset". Be consistent with "filler" words - either include phrases like "I want to..." across all related intents or leave them out entirely. This prevents your model from overemphasizing these words during training.

Setting Up Intent Detection with Language Models

Writing Effective Prompts for Intent Detection

When creating prompts for intent detection, start by clearly defining the model's role and listing possible intent categories along with brief descriptions. Always include a fallback option like "Other" or "Unclear" to handle ambiguous queries. For example, an AI-powered assistant might classify user messages into predefined intents, assigning "Unclear" to messages it can't confidently interpret.

To boost accuracy, consider using few-shot learning. This involves providing 3–5 example pairs of "User Query → Intent" in your prompt, helping the model recognize patterns. For instance, Vellum developed a system prompt for an e-commerce chatbot that successfully routed messages like "I would like to check my last order" to the "Order Status" intent after testing it on 200 cases. Using clear structural markers - such as Markdown headers (# Instructions) or XML tags (<examples></examples>) - can also help the model understand the organization of your prompt.
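The few-shot pattern above can be sketched as a small prompt builder. The intent names and example pairs below are illustrative placeholders, not taken from a real deployment.

```python
# Sketch: assemble a few-shot intent-classification prompt.
# Intent names and example pairs are illustrative only.
INTENTS = ["Order Status", "Reset Password", "Pricing", "Unclear"]

FEW_SHOT_EXAMPLES = [
    ("I would like to check my last order", "Order Status"),
    ("I can't log in to my account", "Reset Password"),
    ("How much does the pro plan cost?", "Pricing"),
]

def build_prompt(user_message):
    lines = [
        "# Instructions",
        "Classify the user message into exactly one intent from this list:",
        ", ".join(INTENTS),
        'If no intent clearly applies, answer "Unclear".',
        "",
        "<examples>",
    ]
    for query, intent in FEW_SHOT_EXAMPLES:
        lines.append(f"User Query: {query} -> Intent: {intent}")
    lines.append("</examples>")
    lines.append("")
    lines.append(f"User Query: {user_message} -> Intent:")
    return "\n".join(lines)
```

Note the Markdown header and XML tags acting as structural markers, and the explicit fallback instruction baked into the prompt itself.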

"A reasoning model is like a senior co-worker. You can give them a goal to achieve and trust them to work out the details. A GPT model is like a junior coworker. They'll perform best with explicit instructions." – OpenAI Documentation

For best results, limit the number of defined intents to fewer than 20 in a single prompt. Additionally, set the temperature parameter close to 0 to ensure consistent and reliable predictions.

Once your prompts are well-designed, the next step is to structure the outputs effectively.

Using Function Calling for Structured Outputs

Function calling (sometimes referred to as tool calling) allows the model to produce structured JSON outputs instead of plain text, making the results immediately actionable. For example, the model might output something like this: {"intent": "Order_Status", "order_id": "12345"}. This approach not only identifies the user's intent but also extracts specific entities (like order IDs, dates, or locations) in one step.

To ensure valid outputs, define your intents as an enum within the function parameters. For example:

"intent": { "type": "string", "enum": ["Order_Status", "Product_Info", "Pricing", "Support", "Other"] }

When using function calling, enable Strict Mode by setting strict: true in your function definition. This ensures the output strictly adheres to the defined JSON schema. Providing detailed descriptions for each parameter can further guide the model toward the correct intent. In one experiment, this method eliminated the need for complex regex parsing, allowing the chatbot to automatically trigger specific API calls based on the detected intent.
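Putting these pieces together, a full tool definition might look like the sketch below. The shape follows the OpenAI-style "tools" format; the function name classify_intent, the order_id field, and the intent list are illustrative assumptions.

```python
# Sketch of an OpenAI-style tool definition for intent classification.
# "classify_intent", "order_id", and the enum values are illustrative.
CLASSIFY_INTENT_TOOL = {
    "type": "function",
    "function": {
        "name": "classify_intent",
        "description": "Classify the user's message into one support intent "
                       "and extract any order ID mentioned.",
        "strict": True,  # enforce the JSON schema exactly
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {
                    "type": "string",
                    "description": "The single best-matching intent.",
                    "enum": ["Order_Status", "Product_Info", "Pricing",
                             "Support", "Other"],
                },
                "order_id": {
                    "type": ["string", "null"],
                    "description": "Order ID if present in the message, else null.",
                },
            },
            "required": ["intent", "order_id"],
            "additionalProperties": False,
        },
    },
}
```

With strict mode, every field must be listed in `required` and `additionalProperties` must be false, which is why optional values like `order_id` are modeled as nullable rather than omitted.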

"Function calling (also known as tool calling) provides a powerful and flexible way for OpenAI models to interface with external systems and access data outside their training data." – OpenAI

Keep in mind that function definitions are included in the system message, which counts toward the model's context limit and token usage. Additionally, if a user expresses multiple intents in a single message (e.g., "Check my order and update my email"), the model may trigger multiple functions in one turn.

With structured outputs established, the next step is refining the prompts to further improve accuracy.

Improving Accuracy with Prompt Engineering

Refining prompts can significantly enhance the model's accuracy. Use the developer/system role to provide overarching guidelines and the user role for specific queries, keeping high-level instructions separate from user input.

Set a confidence threshold to ensure the model returns "Unclear" for ambiguous queries. This avoids incorrect assumptions and allows uncertain messages to be escalated to human agents. To minimize security risks, always map the model's output to a predefined list of intents in your code rather than relying on raw responses.
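Both safeguards can be combined in a few lines: validate the raw model string against the predefined intent list and apply the confidence threshold before acting. The threshold value and intent names here are illustrative.

```python
# Sketch: never act on the raw model string. Map it onto a predefined
# intent list and fall back to "Unclear" below a confidence threshold.
ALLOWED_INTENTS = {"Order_Status", "Product_Info", "Pricing", "Support"}
CONFIDENCE_THRESHOLD = 0.7  # illustrative value; tune on your own data

def resolve_intent(raw_intent, confidence):
    intent = raw_intent.strip()
    if intent not in ALLOWED_INTENTS or confidence < CONFIDENCE_THRESHOLD:
        return "Unclear"  # escalate to a human agent instead of guessing
    return intent
```

Because unrecognized strings are rejected outright, a prompt-injected or hallucinated intent name can never trigger a workflow.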

Evaluate the effectiveness of your prompts using metrics like accuracy, precision, recall, and F1-score. For instance, in one experiment with a small labeled dataset and a Logistic Regression pipeline, the model achieved 80% overall accuracy. Models like GPT-4 perform better with precise and explicit instructions, so clarity is key.

If you’re dealing with a large number of intents, consider adopting a Retrieval Augmented Generation (RAG) approach. This method dynamically embeds user messages and injects similar examples in real time, scaling more effectively than hardcoding examples. Additionally, newer models like GPT-4.1 offer expanded context windows - ranging from 100,000 to one million tokens - enabling them to handle extensive context and multiple few-shot examples with ease.
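The retrieval step of that RAG approach can be sketched in a few lines. A production system would embed messages with a real embedding model; here a bag-of-words cosine similarity stands in as a toy substitute, and the example data is invented.

```python
import math
from collections import Counter

# Sketch of RAG-style example selection: retrieve the labeled examples
# most similar to the incoming message and inject only those into the
# prompt. A bag-of-words cosine stands in for real embedding vectors.
def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_examples(message, labeled_examples, k=3):
    """labeled_examples: list of (utterance, intent) pairs."""
    mv = _vec(message)
    ranked = sorted(labeled_examples,
                    key=lambda ex: _cosine(mv, _vec(ex[0])),
                    reverse=True)
    return ranked[:k]
```

This keeps the prompt short no matter how many intents you support, since only the k most relevant examples are injected per request.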

Training, Testing, and Improving Your Intent Detection Model

Collecting and Labeling Training Data

The best training data comes from real user interactions, not artificial examples or templates created by developers. As Rasa explains:

"The best training data doesn't come from autogeneration tools or an off-the-shelf solution, it comes from real conversations that are specific to your users".

Testing your bot with external users is a great way to gather authentic messages. This includes capturing typos, slang, and unexpected phrasing that reflect how real users communicate. This data is also valuable for understanding customer sentiment to identify frustration or satisfaction trends.

When labeling your data, avoid splitting intents with similar goals. For instance, use a single "order" intent and rely on entities to capture specific details. To handle queries outside your bot’s scope, include an out_of_scope or unresolvedIntent category. This prevents the model from forcing incorrect classifications.

Divide your data into an 80/20 split for training and testing. Randomizing the data before splitting ensures patterns are evenly distributed. Training with just 2–3 example sentences per intent leads to poor results, but adding automatically generated variations can improve accuracy to at least 90%. Treat your training data like source code - use version control tools like GitHub to track changes and roll back if needed.
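The randomized 80/20 split can be sketched with the standard library alone. A fixed seed keeps the split reproducible across training runs.

```python
import random

# Sketch: shuffle before splitting so intents end up evenly distributed
# across the 80% training set and the 20% test set.
def train_test_split(examples, train_fraction=0.8, seed=42):
    shuffled = examples[:]  # copy so the caller's list is not mutated
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

For heavily imbalanced intent sets, a stratified split (shuffling within each intent) preserves the class proportions on both sides; the simple version above is usually enough when every intent has a healthy number of examples.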

Measuring Model Performance

Once your data is labeled and the model is trained, it’s time to evaluate its performance. Use clear metrics to understand how well your intent detection model is working.

Here are four key metrics to focus on:

  • Accuracy: The percentage of test cases where the model correctly predicted the intent above your confidence threshold.

  • Precision: How many of the model’s predictions for a specific intent were actually correct.

  • Recall: The proportion of actual instances of an intent that the model successfully identified.

  • F1-Score: A balanced measure combining precision and recall, particularly helpful for datasets with uneven intent distributions.

| Metric | What It Measures | When to Use It |
| --- | --- | --- |
| Accuracy | Overall success rate of predictions | General health check of the model |
| Precision | Quality of positive predictions | Reducing false positives (incorrect triggers) |
| Recall | Ability to detect all relevant intents | Ensuring no user queries are missed |
| F1-Score | Balance between precision and recall | Evaluating models with unbalanced intent data |

High-performing commercial models often achieve 90% or higher accuracy. Set a confidence threshold (usually 0.1 or above) to filter out low-certainty predictions. Pay attention to "unreliable" test cases - those where the model predicted correctly but with marginal confidence. These cases highlight areas that may need more training data.
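The four metrics above follow directly from counts of true positives, false positives, and false negatives on a labeled test set. A minimal stdlib-only sketch:

```python
# Sketch: accuracy plus per-intent precision, recall, and F1 from a list
# of (true_intent, predicted_intent) pairs on a labeled test set.
def accuracy(pairs):
    return sum(1 for t, p in pairs if t == p) / len(pairs)

def intent_metrics(pairs, intent):
    tp = sum(1 for t, p in pairs if t == intent and p == intent)
    fp = sum(1 for t, p in pairs if t != intent and p == intent)
    fn = sum(1 for t, p in pairs if t == intent and p != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Computing precision, recall, and F1 per intent (rather than only overall accuracy) is what surfaces the weak intents hiding behind a healthy-looking aggregate number.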

Refining Your Model Over Time

Once you’ve measured performance, it’s important to keep refining your model to adapt to new patterns in user data.

Treat your training data like code - apply the same level of discipline and rigor. Automate regression testing as part of your CI/CD pipeline whenever you update training data or tweak hyperparameters. Tools like "Utterance Testers" can help you review real user messages, correct misclassifications, and add them to your training set.

Regularly review your intent schema. Address class imbalances with balanced batching and merge intents that are often confused. Keep a log of misclassified utterances to guide future improvements. For common entities like names, dates, and locations, pre-trained extractors like Spacy or Duckling can save you time on manual labeling.

Misspellings can be tricky. Instead of relying solely on spellcheckers, consider adding character-level featurizers (like character n-grams) to your NLU pipeline. Log failed utterances as test cases to improve future iterations. Keep in mind that intent classification F1 scores can vary slightly (by about 0.0042) when training on GPUs due to non-deterministic operations. Running multiple training iterations can help you establish a reliable baseline.
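The intuition behind character-level featurizers is easy to demonstrate: a misspelling like "pasword" still shares most of its character trigrams with "password", so trigram-based features degrade gracefully where exact word matching fails. The sketch below is a toy illustration, not a full NLU pipeline component.

```python
# Sketch: character trigrams tolerate misspellings, since "pasword" and
# "password" still share most of their trigrams.
def char_ngrams(text, n=3):
    padded = f" {text.lower()} "  # pad so word boundaries become features
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def trigram_overlap(a, b):
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)  # Jaccard similarity of trigram sets
```

In a real pipeline these trigram sets would feed a classifier as features alongside word-level ones, rather than being compared pairwise.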

Using IrisAgent for No-Code Intent Detection


IrisAgent Features for Chatbot Intent Detection

IrisAgent connects seamlessly with major ticketing systems to streamline intent detection. Powered by IrisGPT, it processes inquiries, automates responses, and conducts intelligent searches to enhance support efficiency.

The platform takes care of tasks like ticket tagging, routing, and sentiment analysis automatically. By learning from historical support tickets, IrisAgent's AI can suggest resolutions and speed up ticket closures. It also provides conversation summaries and AI-recommended macros, making it easier for agents to resolve issues. The system is designed to work across self-service channels, such as help centers, and integrates with collaboration tools like Slack and Microsoft Teams.

Benefits of Using IrisAgent in Customer Support

IrisAgent's features translate into tangible advantages for customer support teams. With response accuracy exceeding 90%, it helps reduce resolution times and minimizes escalations to higher support levels. The platform's real-time sentiment analysis gives agents immediate insight into customer emotions, allowing them to prioritize cases involving frustrated customers. By automating repetitive tasks, IrisAgent frees up support teams to focus on more complex and meaningful interactions.

Getting Started with IrisAgent

To get started, connect IrisAgent to your primary ticketing system - whether it's Zendesk, Salesforce, Freshworks, Intercom, or Jira - so the AI can begin learning from your historical data. You can install it through the Zendesk Marketplace, though subscription fees may apply for full access to its features.

Deploy IrisGPT on self-service channels and set up workflows for tagging, routing, and deflecting requests based on detected intent. You can also integrate your knowledge base articles via API or direct connection to further train the AI's intent detection capabilities. Enabling additional features like sentiment analysis and AI-recommended macros can further support your agents.

IrisAgent boasts a perfect 5.0/5 star rating on the Zendesk Marketplace, reflecting its user-friendly setup and the efficiency improvements it brings.

Conclusion

Steps to Build Chatbots with Intent Detection

Creating a chatbot with strong intent detection involves a series of clear steps. Start by reviewing your existing support data, such as tickets, chat logs, and FAQs, to pinpoint the most frequent customer requests. Use this data to define specific intents like "Order Status" or "Reset Password", ensuring each intent focuses on a single task. Organize these intents in a way that minimizes overlap or confusion.

Next, train your model with a variety of sample phrases. Begin with 20–30 high-quality examples per intent, then expand to 80–100 for production-level performance. Set a confidence threshold of around 0.7 to strike a balance between accuracy and flexibility. Additionally, include fallback strategies and human handoffs to handle queries that the AI cannot confidently resolve.

If coding isn't your team's strong suit, tools like IrisAgent can simplify the process. It integrates automated ticket analysis and built-in intent detection directly into your existing support systems, making it easier to get started.

These foundational steps help ensure your chatbot not only responds quickly but also delivers a better overall customer experience.

How AI Improves Customer Support

AI is transforming customer support by automating tasks like routing and reducing response times. For instance, when a chatbot can accurately distinguish between requests like "Billing Help" and "Technical Support", it triggers the right workflow - whether that’s pulling data from an API, offering a relevant knowledge base article, or connecting the customer to a specialized agent.

"A well-designed intent framework is the single most important factor for AI chatbot performance".

AI also boosts support quality by enabling more personalized interactions. Modern systems can handle repetitive questions automatically, allowing human agents to focus on complex cases that require empathy and creative problem-solving. This division of labor ensures customers get efficient, yet thoughtful, support.


FAQs

What are the best ways to improve my chatbot's intent detection accuracy?

To improve your chatbot's ability to accurately detect user intent, start by gathering a wide range of real user inputs. This helps the model recognize the different ways people might phrase the same intent. When defining intents, make sure they have clear and distinct boundaries to avoid confusion or overlap. You might also find it helpful to use hierarchical structures to keep things well-organized and manageable.

Another useful approach is leveraging few-shot learning techniques with large language models (LLMs). This method can enhance your chatbot's understanding of more subtle or complex user queries.

Don't forget to retrain and evaluate your model regularly with updated data. This ensures your chatbot adapts to shifts in user behavior, keeping its performance sharp and relevant. By applying these strategies, you’ll create a chatbot that’s not only smarter but also more reliable for users.

What are the best practices for organizing chatbot intents?

To set up chatbot intents effectively, begin by crafting a clear and logical structure. Start by grouping related intents into broad categories such as Account Management, Order Status, or Technical Support. From there, break these down into more specific intents like reset password or track order. Stick to consistent naming conventions - like lowercase with underscores or camelCase - and steer clear of synonyms that might confuse the model.

Populate each intent with a variety of real-world examples that reflect how users might actually phrase their requests. Initially, aim for around 20 sample utterances per intent, but as your chatbot evolves, expand this to 50 or more for better accuracy. Include variations in phrasing, grammar, and tone to ensure the chatbot can handle natural language patterns effectively.

Make it a habit to review and refine your intents regularly. Add new utterances, merge or split intents that overlap, and remove those that are rarely used. Tools like IrisAgent can simplify this process by automating tasks like intent management, ticket tagging, and performance tracking. This ensures your chatbot remains accurate and consistently meets customer needs.

How does function calling improve chatbot performance?

Function calling allows a chatbot to perform specific tasks or access real-time information by triggering predefined APIs or scripts during a conversation. For instance, if a user asks to check an order status, create a support ticket, or look up pricing, the chatbot creates a structured "function call" containing all the necessary details. Your system processes this request, runs the required function, and sends the results back to the chatbot. This enables the bot to deliver accurate, actionable responses in real time.

When integrated into an IrisAgent-powered chatbot, function calling can automate tasks such as ticket tagging, routing inquiries, or conducting sentiment analysis. This eliminates the need for manual intervention, making the chatbot more efficient and dependable. It’s especially useful for managing real-world support scenarios, as it ensures quick answers and immediate actions, ultimately improving the customer experience.
