GPU Server Cluster Operations & Architecture Engineer
Responsibilities
1. Design, deploy, maintain, and optimize GPU server clusters to support image algorithm training and AI workloads.
2. Manage the allocation and utilization of hardware resources such as GPUs, CPUs, memory, and storage to meet multi-department R&D requirements.
3. Optimize system performance and job scheduling strategies to improve resource utilization (e.g., using SLURM, Kubernetes, or Docker).
4. Establish […]