Nov 4, 2024 - CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

Description:

Customer Relationship Management (CRM)
systems are vital for modern enterprises,
providing a foundation for managing customer
interactions and data. Integrating AI agents
into CRM systems can automate routine
processes and enhance personalized service.
However, deploying and evaluating these
agents is challenging due to the lack of realistic
benchmarks that reflect the complexity of
real-world CRM tasks. To address this issue,
we introduce CRMArena, a novel benchmark
designed to evaluate AI agents on realistic tasks
grounded in professional work environments.
We worked with CRM experts to design nine
customer service tasks distributed across three
personas: service agent, analyst, and manager.
We synthesize a large-scale simulated organization, populating 16 commonly used industrial
objects (e.g., account, order, knowledge
article, case) with high interconnectivity,
and uploading it into a real Salesforce CRM
organization. UI and API access to the CRM
is provided to systems that attempt to complete
the tasks in CRMArena. Experimental results
reveal that state-of-the-art LLM agents succeed
in less than 40% of the tasks with ReAct
prompting, and less than 55% even when
provided manually-crafted function-calling
tools. Our findings highlight the need for
enhanced agent capabilities in function-calling
and rule-following to be deployed in real-world
work environments. CRMArena is an open
challenge to the community: systems that
can reliably complete tasks showcase direct
business value in a popular work environment.

https://arxiv.org/pdf/2411.02305?

Added to timeline:

Articles

By cecilia

5 months ago

Date:

Nov 4, 2024