Course Outline
Introduction to EXO and Local AI Clustering
- Overview of the EXO framework and the exo-explore ecosystem
- Comparing centralized cloud inference vs distributed local inference
- Architecture: libp2p device discovery, MLX backend, dashboard, and API layers
- Hardware requirements: Apple Silicon (M3 Ultra, M4 Pro/Max), Thunderbolt 5, shared storage
Installing EXO on macOS
- Setting up Xcode, the Metal Toolchain, and macOS prerequisites
- Installing uv, Node.js, and the Rust nightly toolchain
- Installing the pinned macmon fork for Apple Silicon monitoring
- Cloning the repository and building the dashboard with npm
- Running EXO from source and verifying the localhost:52415 dashboard
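As an illustration of the verification step, the sketch below checks whether anything is accepting TCP connections on the dashboard port named above (52415). It only shows that the port is open, not that the dashboard is healthy; the function name `dashboard_port_open` is hypothetical and is not part of EXO.

```python
import socket


def dashboard_port_open(host: str = "localhost", port: int = 52415,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests reachability.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `dashboard_port_open()` on a node after starting EXO gives a quick yes/no answer before opening the dashboard in a browser.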
Installing EXO on Linux
- Installing dependencies via apt or Homebrew on Linux
- Configuring uv, Node.js 18+, and Rust nightly
- Building the dashboard and running EXO in CPU-only mode
- Directory layout: XDG Base Directory paths for config, data, cache, and logs
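For orientation, the XDG Base Directory lookup rules can be sketched as follows. The environment variable names and fallback paths come from the XDG specification. The application subdirectory name `exo`, and the idea that logs live under the state directory, are assumptions made for illustration rather than confirmed EXO behaviour.

```python
import os
from pathlib import Path

APP_NAME = "exo"  # assumption: the actual subdirectory name may differ

# (environment variable, fallback relative to $HOME) per the XDG spec.
_XDG_DEFAULTS = {
    "config": ("XDG_CONFIG_HOME", ".config"),
    "data": ("XDG_DATA_HOME", ".local/share"),
    "cache": ("XDG_CACHE_HOME", ".cache"),
    "state": ("XDG_STATE_HOME", ".local/state"),  # logs often go here
}


def xdg_dirs(app: str = APP_NAME, env=None, home=None) -> dict:
    """Resolve per-application XDG directories without creating them."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    dirs = {}
    for kind, (var, fallback) in _XDG_DEFAULTS.items():
        value = env.get(var, "")
        # The spec requires absolute paths; unset or relative values
        # fall back to the default location under $HOME.
        base = Path(value) if value and os.path.isabs(value) else home / fallback
        dirs[kind] = base / app
    return dirs
```

Printing `xdg_dirs()` on a Linux node shows where configuration, data, cache, and state would be expected under these assumptions.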
Automatic Device Discovery and Cluster Formation
- Understanding libp2p-based auto-discovery across local networks
- Configuring custom namespaces with EXO_LIBP2P_NAMESPACE for cluster isolation
- Verifying node membership in the dashboard cluster view
- Handling discovery failures and network segmentation issues
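A minimal sketch of namespace-based isolation: every node that should join the same cluster is started with the same `EXO_LIBP2P_NAMESPACE` value. The helper name `cluster_env`, the whitespace check, and the launch command shown in the comment are assumptions for illustration; only the variable name comes from the outline above.

```python
import os


def cluster_env(namespace: str, base_env=None) -> dict:
    """Return a copy of the environment with EXO_LIBP2P_NAMESPACE set.

    Hypothetical helper. The whitespace restriction is a precaution of
    this sketch, not a documented EXO rule.
    """
    if not namespace or any(ch.isspace() for ch in namespace):
        raise ValueError("namespace must be a non-empty string without whitespace")
    env = dict(os.environ if base_env is None else base_env)
    env["EXO_LIBP2P_NAMESPACE"] = namespace
    return env


# Example launch (command is an assumption; adjust to your install):
#   import subprocess
#   subprocess.run(["uv", "run", "exo"], env=cluster_env("lab-a"))
```

Nodes started with different namespace values would be expected to form separate clusters even when they share a network segment.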
Enabling RDMA over Thunderbolt 5
- RDMA architecture and the 99 percent latency reduction claim
- Enabling RDMA in macOS Recovery mode with rdma_ctl
- Cable requirements and port topology constraints on Mac Studio
- Matching macOS versions across all cluster nodes
- Troubleshooting RDMA discovery and DHCP configuration
Deploying Frontier Models
- Using the dashboard to load and shard DeepSeek v3.1, Qwen3-235B, and Llama family models
- Previewing instance placements with the /instance/previews API endpoint
- Creating model instances with pipeline or tensor-parallel sharding
- Configuring custom model cards from the Hugging Face Hub
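To make the idea of pipeline sharding concrete, the sketch below assigns contiguous blocks of transformer layers to nodes in proportion to each node's memory. This is a generic illustration with a hypothetical function name, not EXO's actual placement algorithm, and the memory figures in the usage note are made-up inputs.

```python
def split_layers(num_layers: int, memory_gb: list[float]) -> list[range]:
    """Split layers 0..num_layers-1 into contiguous per-node ranges.

    Each node's share is proportional to its memory; leftover layers are
    handed out by largest fractional remainder. A node with very little
    memory may receive an empty range.
    """
    if num_layers <= 0 or not memory_gb or any(m <= 0 for m in memory_gb):
        raise ValueError("need a positive layer count and positive memory sizes")
    total = sum(memory_gb)
    ideal = [num_layers * m / total for m in memory_gb]
    counts = [int(x) for x in ideal]  # round down first
    leftover = num_layers - sum(counts)
    # Give the remaining layers to the nodes with the largest remainders.
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    ranges, start = [], 0
    for c in counts:
        ranges.append(range(start, start + c))
        start += c
    return ranges
```

For example, `split_layers(60, [512, 128, 128])` places layers 0 to 39 on the first node and ten layers on each of the other two. Tensor-parallel sharding differs in that every node holds a slice of every layer rather than a block of whole layers.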
Monitoring and Troubleshooting
- Reading EXO logs and understanding distributed tracing
- Interpreting cluster health in the dashboard cluster view
- Diagnosing worker node failures and reconnection behavior
- Using EXO_TRACING_ENABLED for performance bottleneck analysis
Cluster Maintenance and Updates
- Updating EXO binaries and rebuilding the dashboard
- Migrating model caches and managing pre-downloaded models over NFS
- Gracefully removing nodes and rebalancing workloads
Requirements
- An understanding of networking fundamentals (IP, subnetting, firewalls)
- Experience with macOS or Linux command-line administration
- Familiarity with Python package management (pip/uv) and Node.js tooling
Audience
- System administrators
- DevOps engineers
- AI infrastructure architects responsible for on-premise LLM deployment
21 Hours
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 4800 € + VAT*
Contact us for an exact quote and to hear about our latest promotions.