Job Purpose
Be a key member of the team administering KSL’s supercomputers, storage, and other related infrastructure. The role requires large-scale Linux system administration expertise and expert-level skills in configuration management, scripting, backup, monitoring, and automation. This is a highly technical role requiring high performance computing (HPC) or large-scale computing experience in complex environments.
Major Accountabilities
- Provide timely and useful assistance (2nd- and 3rd-level support) to laboratory users and to other staff members.
- Maintain high customer service standards in dealing with and responding to user issues and questions. This is typically done via the KSL ticketing system (RT).
- Maintain a respectful and collaborative working environment within the laboratory.
- Design and produce effective and thorough technical documentation.
- Build effective relationships with staff, faculty and students through the Core Labs.
- Plan, lead the implementation of, and support software installation and management for the HPC systems within the Center.
- Maintain an expert understanding of, and provide support for, high performance parallel filesystems (e.g., Lustre), hierarchical storage management systems (e.g., DMF), and high performance computing workload managers and schedulers (e.g., Slurm).
- Apply expert knowledge of Unix/Linux systems administration, including all aspects of management, monitoring, performance analysis, and integration in complex heterogeneous environments.
- Use configuration management tools (e.g., Puppet) to help maintain large-scale Linux clusters, supercomputers, storage systems, and infrastructure servers.
- Monitor and optimize services and performance (file system, network interconnects) using Nagios, Ganglia, etc.
- Plan and administer work on infrastructure servers (file servers, monitoring, etc.).
- Investigate and resolve systems related issues; coordinate with vendors to isolate hardware problems; install firmware or software patches as necessary.
- Represent the laboratory in international conferences and campus forums, such as technical coordination meetings with other technology-focused units.
- Contribute significantly to planning the work of a major laboratory or of research projects.
- Maintain expert-level knowledge in most of the laboratory systems, including either high performance computing systems administration, high performance storage administration, or high performance network administration.
- Act as a point of contact for complex problems and help junior colleagues.
- Work flexible hours as needed.
- Travel occasionally for training, presentations, conferences and collaborations.
- Other duties as required.
Person Requirements
Competencies
- Good knowledge of the Linux operating system is required.
- Experience with Cray/HPE systems is desirable.
- Proficient documentation skills.
- Ability to support research activities in the laboratory environment.
- Understands and uses appropriate methods, tools and applications.
- Demonstrates an analytical and systematic approach to problem solving.
- Demonstrates effective communication skills in written and oral English.
- Plans, schedules and monitors own work and that of others, competently within limited deadlines and according to relevant legislation and procedures.
- Appreciates the wider field of information systems and computational research, and how own role relates to other roles and to the business of the employer, users or colleagues.
- Absorbs and applies technical information.
- Works to required standards.
- Ability to work successfully in a highly collaborative research environment.
- Uses discretion in identifying and resolving complex problems and assignments.
- Performs a broad range of work, sometimes complex and non-routine, in a variety of environments.
- Knowledge of laboratory systems and applications.
Qualifications and Experience
Bachelor of Science (or equivalent) in a relevant discipline plus 6 years of experience, OR Master of Science (or equivalent) in a relevant discipline plus 4 years of experience, OR Doctor of Philosophy (or equivalent) in a relevant discipline.