Facebook Open Compute & Other Infrastructure

This discussion starts at the beginning, when Facebook was only a simple idea. From there it covers how Facebook tackled its infrastructure challenges, how its engineering culture helped it cope with them, and even how you could start your own business if you are really interested in accelerating the adoption of technology.

Looking back a few years to 2007, Facebook had only 40 million users, a number that was already considered crazy at the time. The product had been growing step by step: new apps in 2004, News Feed in 2006, and the Platform launch in 2007. In 2007 the goal was to keep users happy, and that goal was met very successfully. Over the next four years or so, however, the number of users grew enormously, and the pace of new features accelerated along with it. By 2011 Facebook had 500 million users, and dealing with that growth, both in the business and in the number of people using it, had become a real concern for the company.

Like most ventures, Facebook had a humble beginning: it started on a shared hosting server that cost $85 a month. The early architecture was simple, a web tier connected to a back-end database cluster. One of the first things the engineers realized was that Facebook could never work without caching, so Memcache was added. Later the engineers changed the system again, dividing it into a front-end cluster, consisting of web and Memcache servers, and a back-end cluster, consisting of Ads, Multifeed, and smaller services such as People You May Know and friend suggestions. The purpose of the service cluster, in turn, is to host other large services such as the messaging platform, since Facebook is now a major way of exchanging messages.
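The role of the caching layer described above can be illustrated with a minimal cache-aside sketch. This is not Facebook's implementation: the dict-based cache stands in for Memcache, and the class names CountingDatabase and CacheAsideStore are hypothetical names chosen for illustration.

```python
# Cache-aside sketch: read from the cache first, fall back to the database
# on a miss, and invalidate the cached entry on writes.
# The in-memory dicts are hypothetical stand-ins for Memcache and a database.


class CountingDatabase:
    """Toy database that counts how many reads reach it."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class CacheAsideStore:
    """Serves reads from a cache and only hits the database on a miss."""

    def __init__(self, database):
        self.database = database
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database read
        value = self.database.get(key)    # cache miss: go to the database
        if value is not None:
            self.cache[key] = value       # populate the cache for next time
        return value

    def update(self, key, value):
        self.database.put(key, value)     # write to the source of truth
        self.cache.pop(key, None)         # invalidate the stale cache entry


db = CountingDatabase({"user:1": "Alice"})
store = CacheAsideStore(db)
first = store.get("user:1")   # miss, read from the database
second = store.get("user:1")  # hit, served from the cache
```

In this sketch, repeated reads of the same key reach the database only once, which is the effect a cache in front of the database cluster is meant to have.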

It took almost two years to build up Facebook's capacity planning, guided by a strategy of doing more and doing it better. The results were clear in the performance reviews and in the number of users signing up every day. Growth, however, forced the company to think differently. Many steps were taken, but the two most prominent are discussed here: first, a sharp focus on efficiency, and second, finding a much bigger hammer to break through the capacity limit.

The first goal was all about software, and a good example is the project to replace the PHP runtime. The application was written in PHP, and the runtime consumed a great deal of CPU time and memory. Facebook put real effort into the problem, and three software engineers pursued three different approaches. Ben Mathews worked on PHP Serve, a low-risk, low-reward option. Steven Grimm worked on Quercus, with medium risk and medium reward. Haiping Zhao proposed HipHop for PHP (HPHP). HPHP was the approach chosen, because Facebook wanted to maximize efficiency. The results: the web tier became 6X more efficient at equal traffic, the API tier used 30% less CPU while serving 2X the traffic, and the cost benefit ran to tens of millions of dollars annually. HPHP launched in 2009, at a time when Facebook was truly at the breaking point of its physical infrastructure.
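The efficiency figures quoted above can be turned into per-server and per-request terms with a little arithmetic. The sketch below assumes that "6X more efficient" means six times the throughput per server; the baseline of 600 servers is a hypothetical number chosen only to make the result easy to read.

```python
# Worked arithmetic for the HPHP figures quoted in the text.
# Assumption: "6X more efficient" means 6x throughput per server.
# The 600-server baseline is hypothetical, not a figure from the source.


def servers_needed(baseline_servers, efficiency_gain):
    """Servers required for the same traffic after an efficiency gain."""
    return baseline_servers / efficiency_gain


def cpu_per_request_ratio(total_cpu_ratio, traffic_ratio):
    """CPU cost per request relative to before the change."""
    return total_cpu_ratio / traffic_ratio


# Web tier: same traffic, 6x efficiency.
web_servers_after = servers_needed(600, 6.0)

# API tier: 30% less total CPU (ratio 0.70) while serving 2x the traffic.
api_cpu_ratio = cpu_per_request_ratio(0.70, 2.0)

print(f"Web servers after: {web_servers_after:.0f}")
print(f"API CPU per request vs. before: {api_cpu_ratio:.0%}")
```

Under these assumptions, the API result implies that each request costs roughly a third of the CPU it did before, which is a larger gain than the "30% less CPU" headline suggests on its own.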

A lot of time went into thinking about how software works with servers and how servers work with the data center. Facebook reached the conclusion that it had to do something radically different, and then spent more time working out what "radical" meant and exactly how radical it needed to be.

Unfortunately, Facebook could not find any open platform to leverage, so building its own servers and data centers was going to be disruptive, and it surely was. The engineers were given a challenge called "Grid to Gates". It covered everything between the power grid and the motherboard or CPU, including the data center and the server. The product was originally expected in six months, but delays of a week grew into delays of a couple of months, and in the end the product was in hand after two years. It is very hard to stick to a process and improve efficiency when the product changes every month and every week. In such cases no quantitative analysis could be made, and every decision came down to judgment.

The transformation meant that Facebook had to build hardware, yet Facebook was a software company, not a hardware company. It had only a few people, no labs, no equipment, and no supplier relationships, so everyone wondered how to begin. The philosophy Facebook adopted was to keep things simple and focus on what really matters to the server. A lab was essential, so Facebook started with a single-room lab that was not even well equipped. Out of that lab came a great deal of hardware: the data center electrical and mechanical designs, the server chassis, the triplet rack, the Intel motherboard, the power supply, the battery cabinet, and the AMD motherboard. These made a big difference to how Facebook operated. Some questions still had to be answered, such as server size: a bigger server allows bigger fans for proper airflow, but chassis size also has to be weighed against overall efficiency.

In the end the team arrived at a general layout for the server. In the diagram, the drive cage sits in the upper left corner and the fans in the lower left corner. The AMD or Intel motherboard is placed in the lower right corner, with the 450W power supply in the upper right corner. The next problem was how power should be distributed among the servers. The team came up with the idea of building a battery cabinet. It may not sound very pretty, but it works as a distributed UPS.

At that time, Facebook was using a typical data center power conversion system, which lost up to 17% of the power along the way, a figure the company found unacceptable. This too was taken as a challenge, and Facebook decided to move to an optimized power conversion system. The typical system passes power through an AC/DC and then a DC/AC conversion, followed by an ASTS/PDU stage with a 3% loss, before it reaches the server power supply. Altogether, up to 17% is lost by the time power reaches the server. The optimized system goes from the utility transformer directly to the Facebook server power supply, with a 48VDC DC UPS on standby. This cut the loss by more than 80%, down to about 2%. The change was made to improve efficiency and reduce cost.
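The loss figures above can be checked with a short calculation. The sketch below uses the 17% and 2% totals from the text; the 1,000 kW load and the two-stage example are hypothetical values, included only to show how per-stage losses compound rather than add.

```python
# Arithmetic sketch for the power conversion figures quoted in the text.
# The 17% and 2% totals come from the text; the 1,000 kW load and the
# stage values passed to combined_loss are hypothetical illustrations.


def relative_reduction(old_loss, new_loss):
    """Fraction by which the loss shrinks when moving from old to new."""
    return (old_loss - new_loss) / old_loss


def wasted_kw(input_kw, loss_fraction):
    """Power lost in conversion for a given input power."""
    return input_kw * loss_fraction


def combined_loss(stage_losses):
    """Total loss of stages in series: efficiencies multiply."""
    delivered = 1.0
    for loss in stage_losses:
        delivered *= 1.0 - loss
    return 1.0 - delivered


reduction = relative_reduction(0.17, 0.02)
typical_waste = wasted_kw(1000.0, 0.17)    # hypothetical 1,000 kW input
optimized_waste = wasted_kw(1000.0, 0.02)

# Hypothetical two-stage chain: a 10% stage followed by the 3% ASTS/PDU stage.
example_chain_loss = combined_loss([0.10, 0.03])

print(f"Loss reduction: {reduction:.1%}")
print(f"Wasted power: {typical_waste:.0f} kW vs {optimized_waste:.0f} kW")
print(f"Example chain loss: {example_chain_loss:.1%}")
```

With the quoted totals, the reduction works out to roughly 88%, which is consistent with a cut of more than 80%.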

Looking back on the project over the last couple of years, Facebook placed two big bets. The first was rethinking and redesigning the server power supply: the traditional power scheme cost $2 per watt, but the customized design costs 20 cents for the same capacity, a large saving. The second was removing the console from the motherboard. Facebook did this to cut extra cost: a few dollars saved per server adds up to a very large total across the fleet.
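The scale of these two bets can be sketched with simple arithmetic. The $2.00 and $0.20 unit costs come from the text; the per-server saving of $3 and the fleet size of 100,000 servers are hypothetical values, since the text gives neither.

```python
# Cost arithmetic sketch for the two bets described in the text.
# Unit costs ($2.00 and $0.20) are from the text; the $3 per-server saving
# and the 100,000-server fleet are hypothetical illustrations.


def savings_fraction(old_cost, new_cost):
    """Fraction of the original cost that is saved."""
    return (old_cost - new_cost) / old_cost


def fleet_savings(saving_per_server, server_count):
    """Total saving when a small per-server saving is applied fleet-wide."""
    return saving_per_server * server_count


power_saving = savings_fraction(2.00, 0.20)
console_saving = fleet_savings(3.0, 100_000)  # hypothetical inputs

print(f"Power scheme cost reduction: {power_saving:.0%}")
print(f"Fleet-wide saving from a $3 cut per server: ${console_saving:,.0f}")
```

The point of the second calculation is only that a saving too small to matter on one server becomes significant once it is multiplied by the size of the fleet.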

This is the thinking behind the Facebook that looks so appealing, useful, profitable, and connected today. A huge effort lies behind it, and that makes it all the more valuable. The team found it genuinely interesting to face the difficulties of the early stage and to pass all the tests that gave Facebook the shape it has today. Some of the problems were very hard to tackle, but dedication and hard work left no question unanswered, and at the end of the day Facebook had the best solution it could find. Facebook stays focused on simplicity and efficiency. At some stages cost reduction also helped decide the path forward, but the target was excess cost rather than essential spending. Effort always leads to some result, and when the effort was pointed in the right direction, the results were good.