August 1, 2025
Artigos
Category: Science
Updated: 5 months ago
Created by cecilia
Events
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (cited: 9)
The Agent Company (cited: 1)
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use (cited: 63)
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents (cited: 11)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (cited: 56)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? (cited: 2)
TravelPlanner: A Benchmark for Real-World Planning with Language Agents (cited: 70)
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (cited: 1)
The BrowserGym Ecosystem for Web Agent Research (cited: 0)
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
The Code That Binds Us: Navigating the Appropriateness of Human-AI Assistant Relationships
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
BLADE: Benchmarking Language Model Agents for Data-Driven Science
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation