Nowadays Big Data is everywhere. Many people are talking about it and are eager to deploy a Big Data instance in their own environments. Installation and deployment can be difficult, though. There is no official, mature Big Data standard, and many open source standards are being developed, sometimes independently of one another. Even if we accept Apache Hadoop as the dominant Big Data standard, implementing Hadoop is a big challenge for IT departments. For example, according to this article:
In addition to the technical challenges of deploying large-scale Hadoop systems and applications, another issue Manor cited is that IT operations often work in silos, with separate teams handling systems administration, database administration, storage, networking, security and application development. That approach can lead to problems in managing Hadoop clusters.
This is exactly where virtualization, cloud and SDN can help: they integrate multiple administration tasks in a unified control center. VMware did this beautifully with VMware vSphere Big Data Extensions, which packages all the required Hadoop components so that you can create, control and scale Hadoop clusters. Hadoop clusters created by vSphere Big Data Extensions are scalable, elastic and flexible. For example, you can easily separate compute and data nodes or increase the number of worker machines. vSphere Big Data Extensions builds on Serengeti, the open source project that VMware initiated to implement Hadoop on a virtual platform. Serengeti, or more precisely VMware vSphere Big Data Extensions, deploys HDFS, MapReduce, Pig, Hive and HBase on vSphere infrastructure.
You can find general installation instructions here, but there are some implementation tips that will help with the vSphere Big Data Extensions installation. In my upcoming posts I will show the required steps and the important considerations during installation.