Data Center Lab Lead

Website Facebook

Facebook's Release to Production (RTP) Team

The Release to Production (RTP) team at Facebook designs and builds the hardware used in data centers, making sure it is reliable, performant, and secure, and that it provides the most value to the overall infrastructure. The team's primary goal is to find, debug, and remediate hardware failure modes before certifying hardware for mass production use in data centers around the world. The RTP team is always looking for innovators with varied backgrounds across hardware, firmware, software, and systems to join our diverse, collaborative environment. Watch the video to learn more about the team, and visit www.facebook.com/careers to apply!

Posted by Facebook Careers on Thursday, 27 December 2018

Facebook’s mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we’re building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we’re creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities — we’re just getting started.

Facebook is seeking a forward-thinking, experienced Release To Production Lab Lead to join the Hardware Release to Production team. This position is full-time and will be based in Forest City, NC.

We seek a Release To Production Lab Lead with advanced hands-on technical skills in Server Hardware, Linux, and Networking, ideally in a data center environment. Depth and breadth of knowledge in managing servers in a large-scale distributed environment is a core competency for this role. The candidate should also have deep knowledge and experience in one of the following core areas: Networking, Project Management, Tooling and Automation, Hardware, Systems Administration, Validation, or Data Center Operations.

RESPONSIBILITIES

  • Perform general troubleshooting and repairs on Linux-based data center hardware products.
  • Work with hardware design, validation teams, and vendors to test and deploy new server, storage, and networking products in the data center infrastructure.
  • Test and troubleshoot new hardware products and components.
  • Provision, decommission, and manage hardware test racks in a production data center environment.
  • Identify, characterize, and root-cause hardware failures and error conditions.
  • Assist hardware engineers by running experiments, collecting data, and providing feedback on failure symptoms for lab and production servers.
  • Provide cross-functional communication with other technical operations groups.
  • Provide serviceability feedback on new hardware and coordinate road shows of early hardware for Site Operations teams.
  • Serve as the Site Operations team’s local point of contact and subject matter expert regarding hardware.
  • Maintain an efficient, orderly hardware test lab operation within the production data center.

MINIMUM QUALIFICATIONS

  • BS or BA in a technical field or commensurate experience
  • 6+ years of experience with Linux and hardware systems support in an Internet operations environment
  • Experience working with Linux (Red Hat/CentOS, SUSE, Ubuntu, Debian, Gentoo) or Unix (Solaris, FreeBSD, OS X)
  • Experience supervising, training, mentoring, and leading other technicians
  • Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
  • Communication experience
  • Project management experience

PREFERRED QUALIFICATIONS

  • Bash, PHP, Python, or Perl scripting experience

To apply for this job, please visit www.facebook.com.