Responsibilities:
1. Design, deploy, maintain, and optimize GPU server clusters to support image-algorithm training and other AI workloads.
2. Manage the allocation and utilization of hardware resources (GPU, CPU, memory, and storage) to meet the R&D requirements of multiple departments.
3. Optimize system performance and job-scheduling strategies (e.g., with SLURM, Kubernetes, or Docker) to improve resource utilization.
4. Establish system monitoring, data security, automated backup, and disaster recovery mechanisms to ensure service stability.
5. Collaborate closely with algorithm teams to understand model-training needs and provide tailored infrastructure solutions.
6. Prepare system documentation and operational guidelines, and provide technical training and support to users.
7. Troubleshoot hardware and software issues involving servers, networks, and storage, and liaise with external vendors when necessary.
Qualifications:
1. Educational Background: Bachelor’s degree or higher in Computer Science, Information Systems, or a related field.
2. Work Experience: At least 3 years of experience in GPU server or AI infrastructure management; candidates with AI training platform support experience are preferred.
3. Technical Skills: Familiarity with Linux system administration, server network configuration, and automated infrastructure deployment.
4. Proficiency with GPU cluster deployment and management tools (e.g., NVIDIA Docker, CUDA, SLURM, Kubernetes).
5. Knowledge of storage systems (e.g., RAID, NFS) and system monitoring tools (e.g., Prometheus, Grafana).
6. Understanding of mainstream deep learning frameworks (e.g., PyTorch, TensorFlow) and their training-environment dependencies.
7. Strong communication, coordination, and documentation skills, with the ability to collaborate effectively with algorithm, hardware, and software teams.