👋 Everything about EKS & AI Infrastructure Newsletter "#64" ☁️❤👨‍💻

The Managed Layer Keeps Getting Lower


I’m a Solution Architect at Lauren, AWS UG Vadodara Co-Organizer and HashiCorp Ambassador

Dear EKS & AI Infrastructure enthusiasts,
Welcome to Everything about EKS & AI Infrastructure #64.

Big week across the board. S3 Files landed — you can now mount an S3 bucket as a native file system directly from EKS, ECS, EC2, and Lambda. AWS Interconnect went GA with direct Layer 3 connectivity to Google Cloud, cutting weeks of multicloud networking work down to minutes. Karpenter v1.11.0 shipped node count limits, which is the blast radius control multi-tenant clusters have been missing. And on the AI infra side: Claude Opus 4.7 on Bedrock, the AWS Agent Registry for governed agent discovery, and a solid production walkthrough for running MCP servers on ECS.

Strong community content this week too — EKS Auto Mode networking deep dive, the SNAT flag that quietly changes your egress behavior, OpenClaw deployment patterns for enterprise scale, and two sessions at AWS Summit Bengaluru worth showing up for.

And something a little different at the end. Not about infrastructure.

Let’s get into it. 👇

Performance Engineering in Modern AI Systems 🌩️

🌩️ Karpenter v1.11.0 — NodePool node count limits and cloud provider registration hooks

Karpenter’s existing NodePool limits were resource-based — CPU and memory caps only. A misconfigured workload with no node-count ceiling could provision hundreds of nodes before anyone noticed. v1.11.0 adds a hard node count limit per NodePool, which changes the blast radius calculus in multi-tenant clusters. Pair a dedicated NodePool per team or namespace with a node count limit and you get budget boundaries baked into cluster topology rather than enforced by external policy or post-incident cost review. In environments where fair allocation across teams is a requirement, this is the primitive that makes it enforceable at the scheduler level rather than operationally.
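To make the blast radius argument concrete, here is a minimal toy model of how a node count ceiling changes the outcome of a runaway scale-up. It is a sketch under stated assumptions, not Karpenter's implementation: `PoolLimits`, `PoolUsage`, `can_launch`, and `simulate` are hypothetical names, and Karpenter's real limit accounting may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolLimits:
    """Hypothetical per-NodePool ceilings; None means no limit is set."""
    cpu: Optional[float] = None          # total vCPUs across the pool
    memory_gib: Optional[float] = None   # total memory in GiB across the pool
    nodes: Optional[int] = None          # total number of nodes in the pool


@dataclass
class PoolUsage:
    cpu: float = 0.0
    memory_gib: float = 0.0
    nodes: int = 0


def can_launch(limits: PoolLimits, usage: PoolUsage,
               node_cpu: float, node_memory_gib: float) -> bool:
    """Return True if one more node of the given size fits under every limit."""
    if limits.cpu is not None and usage.cpu + node_cpu > limits.cpu:
        return False
    if (limits.memory_gib is not None
            and usage.memory_gib + node_memory_gib > limits.memory_gib):
        return False
    if limits.nodes is not None and usage.nodes + 1 > limits.nodes:
        return False
    return True


def simulate(limits: PoolLimits, node_cpu: float,
             node_memory_gib: float, requested_nodes: int) -> int:
    """Launch nodes one at a time until a limit blocks the next one."""
    usage = PoolUsage()
    for _ in range(requested_nodes):
        if not can_launch(limits, usage, node_cpu, node_memory_gib):
            break
        usage.cpu += node_cpu
        usage.memory_gib += node_memory_gib
        usage.nodes += 1
    return usage.nodes


if __name__ == "__main__":
    # A misconfigured workload asks for 300 small nodes (2 vCPU, 4 GiB each).
    resource_only = PoolLimits(cpu=1000, memory_gib=4000)
    with_node_cap = PoolLimits(cpu=1000, memory_gib=4000, nodes=20)
    print(simulate(resource_only, 2, 4, 300))
    print(simulate(with_node_cap, 2, 4, 300))
```

With small nodes, the CPU and memory caps in this toy example never bind, so only the node count ceiling stops the scale-up.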

The Cloud Provider Node Registration Hooks feature is less immediately visible for AWS users but matters for the Karpenter ecosystem long-term. Karpenter’s 2023 restructure split the core scheduling engine (kubernetes-sigs/karpenter) from cloud-specific logic (aws/karpenter-provider-aws); hooks let providers customize the node registration flow without patching around core logic. Both repos shipped v1.11.0 simultaneously, and both releases matter: node count limits live in core and apply to any cloud running Karpenter, while the AWS provider release carries EC2NodeClass and instance selection updates alongside it.

Starred Content ⭐

Amazon S3 Files: Mount your S3 bucket as a file system on EKS, ECS, EC2, and Lambda

The decade-old friction between object storage and file-based workloads — copy data out, process it, sync it back — is the tax that S3 Files is designed to eliminate. You can now access any general-purpose S3 bucket as a native file system from EC2 instances, ECS or EKS containers, and Lambda functions, with full NFS v4.1+ semantics: create, read, update, delete, file locking, and POSIX permissions — no code changes to existing applications. The implementation uses a “stage and commit” model borrowed from version control: changes accumulate on the file system side and are pushed back to S3 as whole objects, preserving the guarantees existing S3 applications depend on.

The operational sharp edges are worth knowing before you build on this. S3 versioning is mandatory, there’s no IaC support at launch, and the IAM setup uses EFS service principals with S3 Files-specific conditions — not obvious. Under the hood, S3 Files runs on EFS infrastructure and delivers roughly 1ms latencies for active data, with intelligent prefetching for sequential reads served directly from S3 to maximize throughput. One gotcha: NFS file locks work correctly between processes using the file system, but operations arriving via the S3 API bypass those locks entirely — workloads that write through both paths concurrently need careful design. For EKS teams running ML training, agentic pipelines, or media processing, this is worth a controlled test before committing the data access layer.
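The locking gotcha above is easier to reason about with the file-system side written out. The sketch below uses Python's standard `fcntl.lockf` advisory locks, which is how cooperating processes on an NFS-style mount would typically serialize writes. Nothing here is specific to S3 Files: the function name is hypothetical, and any mount path you pass in (for example something like `/mnt/data/results.log`) is an assumption about your own setup.

```python
import fcntl
import os


def append_with_lock(path: str, line: str) -> None:
    """Append one line while holding an exclusive advisory lock on the file."""
    with open(path, "a") as f:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the write to the file system
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

The lock only coordinates processes that go through the file system and also take the lock. Per the caveat above, a writer that puts the same object through the S3 API never sees it, so a design that needs both paths should give each path its own keys or prefixes rather than rely on locking.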

EKS Networking Deep Dive: How NATing Works with Public and Private Worker Nodes

By default, AWS_VPC_K8S_CNI_EXTERNALSNAT=false means the VPC CNI translates a pod’s source IP to the node’s primary ENI IP before traffic leaves the instance — instance-level SNAT. Flip it to true and that translation moves to the NAT Gateway instead, so egress traffic hits external destinations with the NAT GW’s IP. The difference matters immediately when you’re looking at access logs from an external service and trying to trace which pod or node originated a request — with instance-level SNAT you see the node IP, with NAT GW SNAT you see the gateway IP and lose node-level attribution entirely.

Vinayak Pandey walks through this with a live cluster and a peered EC2 web server — public nodes, private nodes, both SNAT modes — showing exactly what appears in the access logs at each step. The private node + EXTERNALSNAT=true combination is what most production clusters land on, and it’s where the source IP visibility tradeoff bites hardest. Worth running through the experiment yourself before your next cluster networking design review.
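If you want to reproduce the experiment on a cluster that runs the self-managed VPC CNI, the flag is an environment variable on the CNI DaemonSet. The sketch below only builds the patch body. It assumes the DaemonSet and its container are both named `aws-node` in `kube-system`, which is the usual default but worth verifying on your cluster; the function name and the script file name in the usage note are hypothetical.

```python
import json


def external_snat_patch(enabled: bool) -> dict:
    """Build a strategic-merge patch that sets AWS_VPC_K8S_CNI_EXTERNALSNAT."""
    value = "true" if enabled else "false"
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Assumed container name; containers are merged by name.
                            "name": "aws-node",
                            "env": [
                                {
                                    "name": "AWS_VPC_K8S_CNI_EXTERNALSNAT",
                                    "value": value,
                                }
                            ],
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Prints JSON that can be handed to kubectl patch.
    print(json.dumps(external_snat_patch(True), indent=2))
```

One way to apply it, assuming the script is saved as `snat_patch.py`: `kubectl patch daemonset aws-node -n kube-system -p "$(python snat_patch.py)"`. Changing the pod template rolls the CNI pods, so try it on a test cluster first.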

Navigating enterprise networking challenges with Amazon EKS Auto Mode

EKS Auto Mode’s networking stack is opinionated by design — VPC CNI managed and auto-upgraded, prefix delegation on by default, load balancer controller built in, DNS caching at the node. The post by Sai Charan Teja Gopaluni and Hari Charan Ayada maps exactly how each of those decisions plays out in enterprise scenarios. The prefix delegation default is the one with the most immediate impact on cluster sizing: a c5.4xlarge supports 110 pods with prefix delegation vs. 58 in secondary IP mode — a near-doubling of density that directly affects node count and compute cost at scale. When subnet fragmentation blocks prefix assignment, Auto Mode falls back to secondary IP mode automatically and recalculates max pods per node.
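The density gap comes from how pod IPs are counted. A rough sketch of the commonly cited VPC CNI max-pods arithmetic is below. It is not Auto Mode's own calculation, which is managed for you and may differ. The inputs in the example (4 ENIs, 15 IPv4 addresses per ENI, 4 vCPUs) are illustrative assumptions chosen because they reproduce the 58 and 110 figures; look up the real ENI limits for your instance type in the EC2 documentation. The 110 and 250 caps follow the usual EKS recommendation for instances with fewer than 30 vCPUs and for larger ones.

```python
PREFIX_SIZE = 16  # a /28 IPv4 prefix holds 16 addresses


def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Each ENI loses one address to its primary IP; +2 covers host-network pods."""
    return enis * (ips_per_eni - 1) + 2


def max_pods_prefix_delegation(enis: int, ips_per_eni: int, vcpus: int) -> int:
    """Each secondary IP slot holds a /28 prefix; result is capped per recommendation."""
    raw = enis * (ips_per_eni - 1) * PREFIX_SIZE + 2
    cap = 110 if vcpus < 30 else 250
    return min(raw, cap)


if __name__ == "__main__":
    # Illustrative inputs, not looked up for any specific instance type.
    print(max_pods_secondary_ip(enis=4, ips_per_eni=15))
    print(max_pods_prefix_delegation(enis=4, ips_per_eni=15, vcpus=4))
```

With prefix delegation the raw address count is far above the cap, so the recommended per-node ceiling, not IP supply, becomes the limiting factor.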

Two networking behaviors worth internalizing before your next cluster design. First, SNAT policy: the default is Random (pod IP translated to the node’s primary ENI IP), which works cleanly with NAT Gateway egress — but if on-premises firewalls need per-pod traceability for compliance, you set snatPolicy: Disabled on the NodeClass, not via a CNI environment variable. Second, network policy hierarchy: Admin Tier policies set by platform teams can’t be overridden by developers, while Baseline Tier policies establish defaults that namespace-level NetworkPolicies can override — giving security teams org-wide enforcement with flexibility for app teams within those bounds. DNS Network Policies (FQDN-based outbound filtering at L7) are available in Auto Mode and solve the classic problem of egress control to SaaS endpoints where IPs rotate constantly. Pod subnet isolation via podSubnetSelectorTerms and podSecurityGroupSelectorTerms on NodeClass lets pods run in different subnets than their nodes — the pattern FSI and healthcare teams reach for when network segmentation is a compliance requirement.
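As a rough illustration of where those NodeClass settings live, the sketch below assembles a partial manifest as a Python dict and prints it as JSON. Only `snatPolicy`, `podSubnetSelectorTerms`, and `podSecurityGroupSelectorTerms` come from the text above. The `apiVersion`, the tag-based selector shape, the tag keys, and the function name are assumptions to check against the EKS Auto Mode NodeClass reference, and a real NodeClass needs further required fields that are omitted here.

```python
import json

ALLOWED_SNAT_POLICIES = ("Random", "Disabled")


def node_class_fragment(name: str, pod_subnet_tags: dict,
                        pod_sg_tags: dict, snat_policy: str = "Random") -> dict:
    """Build a partial NodeClass manifest focused on pod networking settings."""
    if snat_policy not in ALLOWED_SNAT_POLICIES:
        raise ValueError(f"snat_policy must be one of {ALLOWED_SNAT_POLICIES}")
    return {
        "apiVersion": "eks.amazonaws.com/v1",  # assumed API group and version
        "kind": "NodeClass",
        "metadata": {"name": name},
        "spec": {
            # Disabled keeps pod IPs visible to on-premises firewalls.
            "snatPolicy": snat_policy,
            # Assumed selector shape: match subnets and security groups by tag.
            "podSubnetSelectorTerms": [{"tags": pod_subnet_tags}],
            "podSecurityGroupSelectorTerms": [{"tags": pod_sg_tags}],
        },
    }


if __name__ == "__main__":
    fragment = node_class_fragment(
        name="isolated-pods",                  # hypothetical name
        pod_subnet_tags={"tier": "pods"},      # hypothetical tag
        pod_sg_tags={"tier": "pods"},          # hypothetical tag
        snat_policy="Disabled",
    )
    print(json.dumps(fragment, indent=2))
```

The point of the example is placement: SNAT behavior and pod subnet isolation are both declared on the NodeClass, so they travel with the node template instead of living in a CNI environment variable.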

Deploying MCP Servers on Amazon ECS — a production architecture walkthrough

Lambda handles lightweight stateless MCP endpoints fine, but the moment your MCP server needs warm caches, persistent streaming connections, native library dependencies, or strict VPC placement alongside private data stores — ECS on Fargate is the right runtime. Sudheer Manubolu, Piyush Mattoo, and Stacey Hou walk through a three-tier setup: a Gradio UI, a Strands Agent backed by Bedrock Nova 2 Lite, and a FastMCP server, all communicating over ECS Service Connect with an Envoy sidecar handling discovery and routing. The MCP Server runs in stateless Streamable HTTP mode — each tool call is a self-contained request — which lets it scale horizontally without session affinity. Stateful mode with Mcp-Session-Id is available when you need server-side context across multi-step workflows.

The security architecture is the part worth studying carefully. MCP uses OAuth 2.1 for auth, but AWS services speak SigV4 — the ECS MCP server bridges that gap with a defense-in-depth model: least-privilege IAM mirroring calling user permissions, CloudTrail for every InvokeModel and GetObject call, and sandboxed Fargate execution with minimal task role permissions. For observability, X-Ray as a sidecar traces requests end-to-end across all three tiers, and ECS Service Connect automatically publishes Envoy metrics — ActiveConnectionCount, TargetResponseTime, ProcessedBytes — to CloudWatch without any instrumentation changes. Sample repo is on GitHub and CloudFormation-deployable in a single stack.

Deploy OpenClaw on AWS: Lightsail vs EC2 vs AgentCore vs EKS — a deployment decision guide

OpenClaw is a fast-growing open-source autonomous AI agent: it connects to WhatsApp, Telegram, and Discord, browses the web, runs commands, and manages files, all without step-by-step human direction. As it moves into enterprise contexts, the deployment decision matters as much as the tool itself. This post maps four options cleanly: Lightsail for individual developers (pre-configured, minutes to deploy), EC2 for teams needing deeper AWS integration and control, Bedrock AgentCore for serverless variable workloads, and EKS for multi-tenant enterprise scale.

The EKS pattern is where it gets interesting for platform teams. Multi-tenancy is framed as the defining challenge — and the recommended approach is Kata Containers with VM-level isolation, so each tenant’s OpenClaw workload runs in a Firecracker microVM rather than a shared container namespace. This delivers hardware-level isolation without giving up Kubernetes orchestration, and has been validated in FSI and healthcare production environments. The OpenClaw Kubernetes Operator sits on top as the specialized lifecycle management layer — handling deployment, security, observability, and agent instance management across hundreds or thousands of users from a single cluster. If your org is evaluating agentic AI infrastructure, this is the architecture reference to bookmark.

Announcements 📢

📢 Amazon Bedrock now offers Claude Mythos Preview — a cybersecurity-focused model class on gated access

Claude Mythos Preview, available now on Amazon Bedrock under a gated research preview (Project Glasswing), isn’t a general-purpose model rollout — it’s a deliberately scoped release targeting organizations whose software touches hundreds of millions of users. The stated use case is offensive-grade vulnerability identification: the model can analyze large codebases, identify exploitable security flaws, and deliver actionable findings with less analyst guidance than prior generations. AWS and Anthropic are explicitly sequencing access to internet-critical companies and open-source maintainers first.

The access model is the operational detail worth noting for platform and security teams: availability is limited to a pre-approved allow-list, restricted to us-east-1, and onboarding is AWS account-team driven — not self-serve. If your org is building on Bedrock and has a legitimate defensive security use case, the path is through your AWS account team, not the console. Teams that aren’t on the allow-list today should watch the Project Glasswing rollout cadence closely — the framing suggests broader access follows once the initial defender cohort has had time to harden their codebases.

📢 Claude Opus 4.7 is now available on Amazon Bedrock

Opus 4.7 lands with a new Bedrock inference engine underneath it — new scheduling and scaling logic that queues requests during high demand rather than rejecting them, and dynamically allocates capacity to steady-state workloads before burst traffic. For agentic coding specifically, the model scores 64.3% on SWE-bench Pro and 87.6% on SWE-bench Verified, with stronger long-horizon autonomy over a 1M token context window. High-resolution image support is new, improving accuracy on charts, dense documents, and screen UIs.

The practical flag for teams migrating from Opus 4.6: the blog explicitly notes the model may require prompting changes and harness tweaks to get the most out of it — don’t assume a drop-in swap. Adaptive thinking is supported, letting Claude dynamically allocate thinking token budgets per request based on complexity. Available now in us-east-1, Tokyo, Ireland, and Stockholm, with 10,000 RPM per account per region out of the box.

📢 AWS Interconnect is now generally available — direct Layer 3 to Google Cloud, Last Mile on-prem alongside

The DIY multicloud stack — manual BGP, physical circuits, weeks of lead time — is what this replaces. Traffic stays entirely off the public internet, with MACsec encryption applied by default on physical links between AWS and Google Cloud routers at the interconnection facilities. Provisioning is three steps: pick target cloud, select destination region, choose bandwidth — AWS generates an activation key you use on the Google Cloud side to complete the connection, with routes propagating automatically in both directions.

The architecture detail that matters at scale: Interconnect plugs directly into AWS Transit Gateway and Cloud WAN, so teams with multi-VPC or multi-region deployments don’t need separate attachments per VPC; one attachment scales across the whole network topology. Starting in May, one 500 Mbps local Interconnect is free per region, which is the right way to validate latency and throughput for your specific workload before committing to a bandwidth tier. For teams splitting inference or training across Bedrock/SageMaker and Vertex AI, or moving datasets between S3 and GCS as part of a pipeline, the old approach was a sync job and a prayer. This is the layer that makes that architecture actually production-grade.

📢 AWS Agent Registry in preview — governed discovery for agents, tools, and MCP servers across your org

As agentic systems grow across an organization, the problem shifts from “can we build agents” to “do we know what agents already exist and what they can do.” AWS Agent Registry, part of Amazon Bedrock AgentCore, is a private catalog for agents, tools, skills, and MCP servers — with an approval workflow so nothing becomes discoverable without an admin sign-off, and CloudTrail audit trails on all access and administrative actions. URL-based discovery auto-pulls tool schemas and capability descriptions from live MCP server or agent endpoints, so registration doesn’t require manual documentation of capabilities.

The access model is worth noting: the registry is queryable as an MCP server directly from IDEs, so builders can find and invoke existing agents without leaving their development workflow. Semantic and keyword search means discovery by use case description, not just exact name. For platform teams starting to govern multi-agent pipelines on EKS or Bedrock, this is the layer that prevents teams from rebuilding capabilities that already exist elsewhere in the org. Available in preview in us-east-1, us-west-2, Tokyo, Sydney, and Ireland.

Community & Career 🤝

🤝 AWS Summit Bengaluru 2026 — Agentic AI for migrations, two sessions worth catching (April 22–23)

Amit Kumar is presenting on AWS Transform for VMware Workloads — how agentic AI is being applied to enterprise VMware migrations at scale. Deepesh Dapola and Vijay Sriram are going a level deeper with Pine Labs’ actual modernization journey: moving from .NET Framework and SQL Server to .NET 8 and PostgreSQL using the AWS Transform AI Agent, including how they handled complex legacy “spaghetti” logic end-to-end.

Both sessions sit at the intersection of agentic AI and application modernization — a space that’s moving fast and still short on practitioner-level war stories. If you’re at the Summit, these are the sessions where you’ll get real architecture decisions, not slide-ware.

🤝 Full-day technical session in London: Open-weight models on Amazon Bedrock — DeepSeek, Qwen, Kimi and more

Olawale from AWS is hosting a London session built specifically for teams making real model selection decisions — not a vendor pitch day. Engineers from the model teams themselves (DeepSeek, Qwen, Kimi) will be in the room alongside AWS, covering deployment architecture on Bedrock, prompting patterns that actually work, and performance/cost benchmarks across use cases. Direct Q&A with model providers is built into the agenda.

If you’re evaluating open-weight vs. proprietary models for inference workloads — weighing cost, latency, and behavioral predictability under load — this is the kind of session where you get answers that don’t fit in a blog post.

Highlights ✨

Claude Managed Agents: From Prompt to Production in Minutes

The hardest part of shipping a production AI agent has never been the model — it’s been everything around it: sandboxed execution, checkpointing for long-running tasks, secret management, scoped permissions, and tracing that survives disconnections. Managed Agents handles all of that as a managed layer, so teams define outcomes and guardrails rather than building infrastructure. The shift is comparable to what happened when container orchestration moved from DIY to managed Kubernetes — the problem doesn’t disappear, it just moves to a layer you no longer have to own.

The operational details matter more than the headline speed claims. Pricing runs on standard Claude token rates plus $0.08 per session-hour of active runtime — idle agents don’t accrue session cost. Multi-agent coordination and self-evaluation (where Claude iterates against defined success criteria until it converges) are in research preview and require separate access requests. Both are the capabilities that make the platform genuinely interesting at scale, so teams building for those use cases should request access now rather than plan around GA timelines. The harness is Claude-only — if you later want to swap models or move off Anthropic’s cloud, you’re rebuilding the orchestration layer, not just changing an API key. For sensitive workloads, that lock-in calculus deserves an explicit architecture decision upfront.
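The session-hour component is simple enough to budget with a few lines of arithmetic. The sketch below uses only the $0.08 per active session-hour figure quoted above. Token spend is passed in as a number you estimate yourself, since token rates are not given here, and the function name and the fleet sizes in the example are hypothetical.

```python
SESSION_HOUR_RATE_USD = 0.08  # per session-hour of active runtime, as quoted above


def session_cost(active_hours: float, token_cost_usd: float = 0.0) -> float:
    """Estimate cost as active runtime charge plus separately estimated token spend."""
    if active_hours < 0 or token_cost_usd < 0:
        raise ValueError("inputs must be non-negative")
    return round(active_hours * SESSION_HOUR_RATE_USD + token_cost_usd, 4)


if __name__ == "__main__":
    # Hypothetical fleet: 10 agents, each active 6 hours a day, for 30 days.
    active_hours = 10 * 6 * 30
    print(session_cost(active_hours))
```

Because idle agents do not accrue session cost, the number that matters is active hours rather than wall-clock hours, and for most workloads the token term is likely to dominate.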

Automating ALB Capacity Unit Reservation across AWS Organizations

ALBs auto-scale well for gradual traffic growth, but when traffic more than doubles in under five minutes — market open/close for trading platforms, flash sales, ticket drops — you need LCU reservation pre-configured, not reactive. Abhishek Dey and Sourav Bhattacharjee’s solution automates this across a multi-account AWS Organizations environment: two Lambda functions triggered by EventBridge Scheduler handle metadata collection and LCU reservation/reset across tagged ALBs, using STS cross-account role assumption with a DynamoDB table in the management account as the coordination layer. Reservation is set ahead of the spike, reset after — optimizing cost without manual touchpoints. CloudFormation templates are on GitHub and ready to deploy.
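The core of the reserve-then-reset cycle can be sketched in a few lines. This is not the authors' Lambda code: the tagging, DynamoDB coordination, and STS role assumption are left out, and all function and class names are hypothetical. It assumes the boto3 `elbv2` client's `modify_capacity_reservation` call with `MinimumLoadBalancerCapacity` and `ResetCapacityReservation` parameters, which is worth confirming against the boto3 reference. The client is passed in so the logic can be dry-run with a recording stand-in.

```python
from typing import Iterable


def reserve_capacity(elbv2, load_balancer_arn: str, capacity_units: int) -> None:
    """Set a minimum LCU reservation on one load balancer ahead of a spike."""
    elbv2.modify_capacity_reservation(
        LoadBalancerArn=load_balancer_arn,
        MinimumLoadBalancerCapacity={"CapacityUnits": capacity_units},
    )


def reset_capacity(elbv2, load_balancer_arn: str) -> None:
    """Drop the reservation so the load balancer returns to normal scaling."""
    elbv2.modify_capacity_reservation(
        LoadBalancerArn=load_balancer_arn,
        ResetCapacityReservation=True,
    )


def run_phase(elbv2, arns: Iterable[str], phase: str, capacity_units: int = 0) -> int:
    """Apply 'reserve' or 'reset' to every ARN; return how many were touched."""
    count = 0
    for arn in arns:
        if phase == "reserve":
            reserve_capacity(elbv2, arn, capacity_units)
        elif phase == "reset":
            reset_capacity(elbv2, arn)
        else:
            raise ValueError("phase must be 'reserve' or 'reset'")
        count += 1
    return count


class RecordingClient:
    """Dry-run stand-in for an elbv2 client that only records the calls made."""

    def __init__(self):
        self.calls = []

    def modify_capacity_reservation(self, **kwargs):
        self.calls.append(kwargs)
        return {}
```

In a real deployment you would pass `boto3.client("elbv2")`, created from the assumed cross-account session, instead of `RecordingClient()`, and let the scheduler invoke the reserve phase before the spike and the reset phase after it.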

AWS Sustainability Console: Scope 1–3 reporting, programmatic access, and billing permissions decoupled

The old friction: carbon footprint data lived inside the Billing console, so sustainability teams either got billing access they shouldn’t have or went without the data. The new Sustainability console has its own IAM permissions model, decoupled from billing entirely. It surfaces Scope 1, 2, and 3 emissions broken down by Region and service, supports configurable CSV exports and fiscal-year alignment, and now exposes a programmatic API — so you can pull emissions data into your own dashboards or compliance pipelines without setting up a data export. Historical data goes back to January 2022. Free, available today.

🎉 Sponsor Section

At the moment, we don’t have a sponsor for this edition, but we look forward to working with companies and organizations that support the EKS & AI Infrastructure community in future editions. If you or your company is interested in sponsoring, please contact us at 📧 thecloudtechforall@gmail.com

📝 Words from the Author

I’ve been thinking about sunflowers lately.

It started with the Werner Vogels post — but not because of the S3 announcement. Warfield mentioned something in passing: sunflowers are more genetically diverse than humans. They’ve spent millennia borrowing traits from whatever was nearby, adapting without any real plan.

I liked that.

Most of what I’ve actually learned didn’t come from structured study. It came from being around the right people at the right time. A random conversation at a meetup. Someone’s blog post that opened a rabbit hole I didn’t expect. A problem I stumbled into that turned out to be more interesting than anything I was working on.

The certifications and courses built the base. But the good stuff came from the collisions.

That’s what I think communities are really for — not networking in the LinkedIn sense, but just being in the room with people who are figuring out different things than you are. Something rubs off. You can’t plan it.

AWS Summit Bengaluru is next week. I’ll be there. If you are too — come find me.

Happy Building! 😎