DevOps / MLOps / AIOps
MLOps & Model Management
LMCache: KV-Cache Acceleration Layer for LLM Inference
LMCache is an open-source KV-cache acceleration layer for LLM serving that stores and reuses chunks of transformer KV cache across GPU memory, CPU memory, disk, and Redis. By skipping prefill for text whose cache is already stored, it reports 3-10x faster response times in long-context and multi-turn scenarios. Source: GitHub
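The core idea is chunked KV-cache reuse across storage tiers: look a chunk up in the fastest tier first, fall back to slower ones, and only recompute what is missing. Below is a minimal conceptual sketch of that lookup pattern in Python. It is not LMCache's actual API; the TieredKVCache class, tier names, chunk size, and hashing scheme are illustrative assumptions.

```python
# Conceptual sketch of tiered KV-cache chunk reuse, in the spirit of what the
# blurb describes. This is NOT LMCache's real API; class names, tier names,
# and the chunking/hashing scheme are illustrative assumptions.
import hashlib
from collections import OrderedDict

CHUNK_TOKENS = 256  # assumed chunk granularity


def chunk_key(token_ids):
    """Hash a chunk of token ids so identical text maps to the same cache entry."""
    return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()


class TieredKVCache:
    """Look up KV chunks in fast tiers first, then fall back to slower ones."""

    def __init__(self):
        # Fast -> slow: GPU memory, CPU memory, then a remote/disk store (e.g. Redis).
        self.tiers = OrderedDict(gpu={}, cpu={}, remote={})

    def get(self, key):
        for name, store in self.tiers.items():
            if key in store:
                return name, store[key]
        return None, None

    def put(self, key, kv_chunk, tier="cpu"):
        self.tiers[tier][key] = kv_chunk

    def kv_for_prompt(self, token_ids, compute_kv):
        """Reuse cached KV chunks where possible; compute and store the rest."""
        chunks = [token_ids[i:i + CHUNK_TOKENS]
                  for i in range(0, len(token_ids), CHUNK_TOKENS)]
        kv = []
        for chunk in chunks:
            key = chunk_key(tuple(chunk))
            tier, cached = self.get(key)
            if cached is None:
                cached = compute_kv(chunk)  # prefill only the missing chunk
                self.put(key, cached)
            kv.append(cached)
        return kv
```

The speedups come from the cache hits: chunks whose KV tensors already live in any tier never go through prefill again, which is exactly the repeated-context pattern of multi-turn chat and shared long documents.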
AIOps & Monitoring
Deploy AI Agents on Amazon Bedrock with GitHub Actions
AWS published guidance on deploying AI agents to Amazon Bedrock AgentCore with GitHub Actions, showing how to wire agentic AI systems into automated CI/CD deployment pipelines. Source: AWS Machine Learning Blog
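A pipeline like the one AWS describes typically ends with a post-deployment check of the freshly deployed agent. Here is a minimal sketch of such a check in Python with boto3, meant to run as a step in a GitHub Actions job; the agent and alias IDs are placeholders, and it uses the bedrock-agent-runtime API rather than the AgentCore runtime covered in the AWS post, so treat it as an adjacent example rather than the post's method.

```python
# Hypothetical post-deployment smoke test for a Bedrock agent, intended to run
# as a CI step after the deploy job. Agent IDs are placeholders supplied via
# secrets; the AWS post targets AgentCore, so adapt the client to your setup.
import os
import uuid
import boto3

AGENT_ID = os.environ["BEDROCK_AGENT_ID"]
AGENT_ALIAS_ID = os.environ["BEDROCK_AGENT_ALIAS_ID"]


def smoke_test(prompt: str = "ping") -> str:
    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=str(uuid.uuid4()),
        inputText=prompt,
    )
    # invoke_agent streams the completion as an event stream of chunks.
    text = b"".join(
        event["chunk"]["bytes"] for event in response["completion"] if "chunk" in event
    )
    return text.decode("utf-8")


if __name__ == "__main__":
    reply = smoke_test()
    assert reply, "agent returned an empty completion"
    print("agent responded:", reply[:200])
```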
AI Infrastructure & Compute
NVIDIA Publishes TTT-E2E: Test-Time Training for LLMs
NVIDIA released End-to-End Test-Time Training (TTT-E2E), which lets LLMs keep learning during inference by treating the context itself as training data. Inference latency stays roughly constant as context grows, with reported speedups of 2.7x at 128K tokens and 35x at 2M tokens on H100 GPUs. Source: NVIDIA Developer Blog
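Test-time training in general means taking a few gradient steps on the incoming context before answering, so long inputs are absorbed into weights rather than attended over in full. The Python sketch below illustrates that general idea, not NVIDIA's TTT-E2E implementation; the fast-weight accessor, window size, and optimizer settings are assumptions.

```python
# Conceptual sketch of test-time training: before generating, adapt the model to
# the user's context with a few next-token-prediction gradient steps over fixed
# windows, so cost per step does not grow with total context length.
# This illustrates the general TTT idea, not NVIDIA's TTT-E2E implementation.
import torch


def test_time_train(model, context_ids, lr=1e-4, steps=4, window=512):
    """Adapt `model` to `context_ids` (a 1D LongTensor) with a few SGD steps."""
    # In practice only a small set of fast weights would be trainable; here we
    # assume the model exposes them via `model.fast_parameters()` (an assumption).
    params = list(model.fast_parameters())
    opt = torch.optim.SGD(params, lr=lr)

    model.train()
    for _ in range(steps):
        # Slide over the context in fixed windows so per-window cost stays flat.
        for start in range(0, context_ids.numel() - 1, window):
            chunk = context_ids[start:start + window + 1].unsqueeze(0)
            inputs, targets = chunk[:, :-1], chunk[:, 1:]
            logits = model(inputs)  # (1, seq, vocab)
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
    model.eval()
    return model
```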