Quite often I get the query on what is the difference between a grid and a cloud. Well, lets try and net the differences.
The grid is a 'pre-cloud' terminology. However it distinctly stands for setups by academic groups who wanted to resolve problems that required crunching of large number sets. Typical usages involve satellite image crunching, weather pattern analysis, analyzing data from nuclear experiments or other extreme physics, number crunching to resolve mathematical conjectures, etc.
The objective of setting up these grids was primarily to assemble several high performing servers and get them to work in parallel. Hence you also get to see the term High Performance Computing (HPC) being used in the context of grids. Some of the best know grids are also recognized as the world's best supercomputers.
This is where today's public clouds stand to differ. They are setup to achieve scalability as against performance that grids were setup for. Assemble enough of commoditized infrastructure to help enterprises offload work and data onto these systems. Providing platforms for High Scale Computing (HSC) and not high Performance Computing has been the driver for clouds of today.
Clouds are designed to take on problems that are in cloud parlance described as "Embarrassingly Parallel Problem"(EPP) Consider a problem that involves finding out the number of times the word "Apple" has been repeated in all of Encyclopedia Britannica's content. A problem whose solution can be approached a typical EPP fashion. Divide and distribute the problem into as many words as there are in the encyclopedia and then funnel them all into a routine that runs in parallel to check if the word matches with "Apple". Each parallel routine returns a 1 or 0. A final counter gets updated as each routine returns its results. A massively parallel approach to solve the problem in minutes that otherwise might have taken days if not months for a stand alone system to solve. All achieved by utilizing the HSC aspect of today's clouds.
A HPC system would rather have a highly complex routing running in each node and one that might also involve certain nodes to continue their crunching based on outputs of few other nodes and vice versa. HPC works on a different dimension.
No comments:
Post a Comment