Staff AI Infrastructure Site Reliability Engineer

15,000 to 20,000 CNY per month

Full-time
5~10 years
Shenzhen
Job responsibilities
Architect and lead the development of scalable, secure AI infrastructure on cloud-native platforms to support autonomous driving technologies
Collaborate closely with ML teams to facilitate seamless integration and optimal performance of AI algorithms
Identify and address system bottlenecks and instabilities, applying innovative solutions to enhance system reliability and efficiency
Foster technological advancements through research and implementation of state-of-the-art AI tools and methodologies
Act as a key technical leader and mentor, promoting a culture of technical excellence and collaborative innovation within the AI infrastructure team
Job requirements
Minimum Skill Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
5+ years of experience designing, deploying, and managing GPU clusters for high-performance computing in AI applications, particularly within cloud environments
Proficient in cloud services (AWS, Azure, ALI Cloud) and in building containerized applications using Kubernetes and Docker
Strong programming skills in Python and Golang, and experience with AI/ML frameworks (TensorFlow, PyTorch)

Preferred Skill Requirements:
Expertise in designing and managing high-availability, high-throughput systems that support machine learning and deep learning workloads
Demonstrable leadership skills with a track record of mentoring and leading technical teams
In-depth understanding of data structures, algorithms, and software engineering principles relevant to AI and autonomous systems