📝 Summary
✍️ Editor's Summary
The headline of this item is "not much happened today".
Based on the current aggregated summary, the points most worth attention are: **Z.ai's GLM-5.2** leads coding and agent benchmarks, with top scores such as **1595** on Code Arena: Frontend and **34.29%** reasoning accuracy with zero failures. Databricks raised GLM-5.2 throughput to **392 tok/s** using hardware and optimizations. **Ornith-1.0**, a new MIT-licensed coding model family, spans **9B to 397B parameters** with strong benchmark results and a self-improving RL training method. **Liquid AI** released a small model for low-latency robotics and e-commerce use. **Google** integrated computer use into **Gemini 3.5 Flash**, with safety controls and developer tools for device control. Startups such as **Sail** and **Hyperagent** focus on long-running agents with persistent execution and cost efficiency. **OpenAI** reports growing internal Codex use for complex, cross-functional tasks, highlighting agent skill concurrency.
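If you read this only once, the point most relevant to later analysis is the set of models involved (glm-5.2, glm-5.2-max, opus-4.8), which makes this item useful for tracking changes in model capability, pricing, or product strategy.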
📌 Key Information
- **Z.ai's GLM-5.2** leads coding and agent benchmarks, with top scores such as **1595** on Code Arena: Frontend and **34.29%** reasoning accuracy with zero failures.
- Databricks raised GLM-5.2 throughput to **392 tok/s** using hardware and optimizations.
- **Ornith-1.0**, a new MIT-licensed coding model family, spans **9B to 397B parameters** with strong benchmark results and a self-improving RL training method.
- **Liquid AI** released a small model for low-latency robotics and e-commerce use.
- **Google** integrated computer use into **Gemini 3.5 Flash**, with safety controls and developer tools for device control.
- Startups such as **Sail** and **Hyperagent** focus on long-running agents with persistent execution and cost efficiency.
- **OpenAI** reports growing internal Codex use for complex, cross-functional tasks, highlighting agent skill concurrency.
🧭 Why It Matters
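- Models involved: glm-5.2, glm-5.2-max, opus-4.8. Useful for tracking changes in model capability, pricing, or product strategy.
- Companies involved: z.ai, databricks, liquid-ai. This usually signals competitive, partnership, or commercialization moves worth continued attention.
- Related tags: coding-benchmarks, agentic-ai, reinforcement-learning, model-optimization. These can be used to follow later coverage of the same topics.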