Azure & Google Multi-Cloud Kubernetes Platform for LLM Inference with 24/7 Operations
In 2023, Mistral AI entrusted us with the design and management of their multi-cloud Kubernetes (K8s) platform dedicated to inference for their LLM products (“Le Chat” and the API).
We optimized the containerization of the applications and their packaging with Helm. Deployments to the clusters are handled via GitOps with Flux.
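As a minimal sketch of this GitOps setup, the Flux CLI can wire a Helm chart repository into a cluster so that releases reconcile automatically. All names here (`my-org`, `llm-charts`, `inference`, the chart path) are illustrative assumptions, not the actual configuration.

```shell
# Register the Git repository holding the Helm charts as a Flux source.
# Repository URL and names are hypothetical.
flux create source git llm-charts \
  --url=https://github.com/my-org/llm-charts \
  --branch=main \
  --interval=1m

# Create a HelmRelease that deploys the inference chart from that source;
# Flux reconciles the release whenever the chart or values change.
flux create helmrelease inference \
  --source=GitRepository/llm-charts \
  --chart=./charts/inference \
  --target-namespace=inference \
  --interval=5m
```

In a production GitOps workflow these resources would themselves live as YAML manifests in Git rather than being created imperatively; the CLI form is shown only for brevity.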
The K8s clusters are deployed on Azure AKS and Google GKE using Terraform and Terragrunt. To ensure high performance, they incorporate AI-specific optimizations:
- storage and rollout of large model weights
- drivers supporting the latest NVIDIA GPUs
- autoscaling of GPU node pools, and more
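To illustrate the GPU autoscaling point, the following sketch shows how an autoscaled GPU node pool might be added on each cloud. Resource-group, cluster, and pool names are hypothetical, and the VM sizes and accelerator types are only examples of A100-class hardware; in practice this would be expressed in Terraform rather than run by hand.

```shell
# Azure AKS: add a GPU node pool with the cluster autoscaler enabled
# (names and VM size illustrative).
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name inference-aks \
  --name gpupool \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --enable-cluster-autoscaler \
  --min-count 0 \
  --max-count 8

# Google GKE: equivalent GPU node pool with autoscaling
# (cluster name, machine type, and accelerator illustrative).
gcloud container node-pools create gpu-pool \
  --cluster=inference-gke \
  --accelerator=type=nvidia-tesla-a100,count=1 \
  --machine-type=a2-highgpu-1g \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=8
```

Scaling the minimum to 0 lets idle GPU capacity drain away, which matters given GPU node cost; the autoscaler then provisions nodes on demand as inference pods become pending.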
The platform also includes databases sized to absorb high load peaks. It is operated with modern monitoring tooling and dashboards tailored to AI-specific needs.
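As one example of AI-specific monitoring, GPU utilization can be tracked via NVIDIA's DCGM exporter scraped by Prometheus. The in-cluster Prometheus address below is a hypothetical assumption; `DCGM_FI_DEV_GPU_UTIL` is the exporter's standard per-GPU utilization metric.

```shell
# Query average GPU utilization across the fleet over the last 5 minutes.
# The Prometheus service address is illustrative.
curl -sG 'http://prometheus.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=avg(avg_over_time(DCGM_FI_DEV_GPU_UTIL[5m]))'
```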
Since late 2023, we have been providing 24/7 operations and maintenance (O&M) for the entire inference platform.
Environment: Microsoft Azure, Google Cloud Platform
Technologies: Kubernetes (Azure AKS, Google GKE), Docker, Helm, Flux, Databases