August 1, 2025
Articles
Category: Science
Updated: 5 months ago
Created by cecilia
Events
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (cited: 9)
The Agent Company (cited: 1)
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use (cited: 63)
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents (cited: 11)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (cited: 56)
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? (cited: 2)
TravelPlanner: A Benchmark for Real-World Planning with Language Agents (cited: 70)
HAICosystem: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions (cited: 1)
The BrowserGym Ecosystem for Web Agent Research (cited: 0)
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
The Code That Binds Us: Navigating the Appropriateness of Human-AI Assistant Relationships
WorkBench: A Benchmark Dataset for Agents in a Realistic Workplace Setting
BLADE: Benchmarking Language Model Agents for Data-Driven Science
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation