not much happened today

📝 Summary

**Inference optimization** is increasingly architectural: **EAGLE 3.1** improves speculative decoding and long-context handling, in collaboration with **vLLM** and **TorchSpec**. **Perplexity** open-sourced a rebuilt **unigram tokenizer** that cuts CPU use by **5–6×** and reaches **63 µs at 514 tokens**. **Qwen3.5** hits **580 tokens/s** through joint work by **Alibaba**, **LightSeek**, **NVIDIA**, **Mooncake**, and **FlashAttention-4** contributors. API price cuts from Chinese labs are sustainable because they rest on structural KV-cache and attention improvements, exemplified by **DeepSeek V4-Pro** and **Xiaomi MiMo** significantly reducing caching costs. Agent engineering is shifting its focus from model quality to model-harness-memory fit, with **LangChain** releasing **Deep Agents v0.6** and tools like **LangSmith Engine** automating evaluation loops. **Trajectory** launched a continual learning platform with **$15M in funding** and partners like **Clay** and **Harvey**, supporting large models including a **397B-parameter model** deployed on autoscaled **H100** infrastructure. Open-source memory-centric agents and minimal training harnesses also gained attention.

✍️ Editor's summary

The core topic of this item is "not much happened today".

From the current aggregated summary, the points to watch first are: inference optimization becoming architectural (**EAGLE 3.1**, the **Perplexity** unigram tokenizer, **Qwen3.5** at **580 tokens/s**), API price cuts from Chinese labs backed by structural KV-cache and attention improvements, the shift in agent engineering toward model-harness-memory fit, and the launch of **Trajectory**'s continual learning platform.

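If you only read this once, the point most relevant to later judgment is the set of models involved: eagle-3.1, unigram-tokenizer, and qwen-3.5. They are worth tracking for changes in model capability, pricing, or product strategy.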

📌 Key information

Speculative decoding: **EAGLE 3.1** improves speculative decoding and long-context handling, in collaboration with **vLLM** and **TorchSpec**.
Tokenization: **Perplexity** open-sourced a rebuilt **unigram tokenizer** that cuts CPU use by **5–6×** and reaches **63 µs at 514 tokens**.
Throughput: **Qwen3.5** hits **580 tokens/s** through joint work by **Alibaba**, **LightSeek**, **NVIDIA**, **Mooncake**, and **FlashAttention-4** contributors.
Pricing: API price cuts from Chinese labs are sustainable because of structural KV-cache and attention improvements; **DeepSeek V4-Pro** and **Xiaomi MiMo** significantly reduce caching costs.
Agents: the focus shifts from model quality to model-harness-memory fit, with **LangChain** releasing **Deep Agents v0.6** and tools like **LangSmith Engine** automating evaluation loops.
Continual learning: **Trajectory** launched a platform with **$15M in funding** and partners like **Clay** and **Harvey**, supporting large models including a **397B-parameter model** deployed on autoscaled **H100** infrastructure.
Open source: memory-centric agents and minimal training harnesses also gained attention.

🧭 Why it matters

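Models involved: eagle-3.1, unigram-tokenizer, qwen-3.5. These are worth tracking for changes in model capability, pricing, or product strategy.
Companies involved: eaglecorp, vllm_project, perplexity_ai. This usually signals competitive, partnership, or commercialization moves worth continued observation.
Related tags: inference-optimization, long-context, speculative-decoding, tokenization. Use these to follow later coverage of the same topics.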

🗂 Topic cards

Models involved

eagle-3.1 unigram-tokenizer qwen-3.5 deepseek-v4-pro mimo deep-agents-v0.6 397b-parameter-model

Companies involved

eaglecorp vllm_project perplexity_ai alibaba lightseek nvidia mooncake flashattention kimmonismus deepseek xiaomi langchain baseten trajectory clay harvey decagon mercor rogo rlm

Related tags

inference-optimization long-context speculative-decoding tokenization attention-mechanisms kv-cache cache-hierarchy agent-engineering model-harness-memory-fit continual-learning quantization autoscaling memory-centric-agents evaluation-automation
