Cluster Part 1….

So given the constraints of my research budget, I did not have (nor do I expect to have) $250,000+ to buy a new HPC cluster. While cloud computing is also exciting, and I’ll probably discuss some musings on that in a separate post, there are a lot of unknown costs that make it hard for me to predict what I’d end up spending. One of the biggest challenges is that my “base” data set is approximately 20 terabytes. When I first started looking into this 4-5 years ago, storing that amount of data was prohibitively expensive. Even now, it’s not cheap…

So I started off small, with a 16-core server with some hard drives. I used Sun Grid Engine (and now TORQUE) to handle job execution, and have stuck with Ubuntu as my primary OS.

My first “BIG” purchase after this was a storage NAS. I have been using QNAP devices for almost ten years, and have been a big fan. I like having at least some of my primary storage as an “appliance”. On multiple occasions I have fried my primary system during upgrades, software installs, etc., and by having at least some of my files living in a stable spot (aka the QNAP), I’ve saved myself a lot of frustration.

Once the NAS was up and running, I purchased my first batch of servers. After spending a LOT of time searching around, I made a couple of executive decisions. While my head node and NAS were purchased new, for my compute cluster I basically wanted the maximum number of CPUs and memory for the least amount of money. One great option, particularly for a cash-strapped lab that can do some of its own maintenance and doesn’t need a truly production-grade system, is to look into refurbished servers.

My first batch of servers included 10 HP ProLiant DL160 G6 servers. I was able to purchase each machine for about $800.

The specs: HP ProLiant DL160 G6 1U, 2x Xeon hex-core X5650 2.66GHz, 72GB DDR3. Each machine only had a 160GB drive for the primary OS, but that’s OK because everything was connected to my NAS, which is where all the files lived. This was set up using standard gigabit (1Gb) connectivity, and I was able to rack all the machines fairly quickly…

Purchasing a similarly specced machine new would have cost me around $4000-$5000 each. The only downside is that I don’t have any warranty (other than the 90 days from the vendor). However, unless I have incredibly poor luck, even if some of the machines wound up dying I’d still be WAY ahead. Also, these aren’t the absolute latest generation, but for bulk parallel computation, it doesn’t matter. The marginal improvement from generation to generation is not a factor of 10… it’s usually a several percent increase.

So basically, for an $8000 investment, I purchased 120 cores (240 threads) of computing and 720GB of RAM (72GB / server is quite nice).
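
For anyone who wants to check my math, here’s a rough back-of-the-envelope sketch in Python. The numbers are the ones quoted in this post; the variable names are just made up for illustration, and the thread count assumes two hardware threads per core (Hyper-Threading turned on).

```python
# Back-of-the-envelope totals for the refurbished batch.
# All figures come from the post; variable names are illustrative only.
servers = 10
price_per_server = 800       # USD, approximate refurbished price
sockets_per_server = 2       # 2x Xeon X5650
cores_per_socket = 6         # hex-core
threads_per_core = 2         # assumption: Hyper-Threading enabled
ram_per_server_gb = 72

total_cost = servers * price_per_server
total_cores = servers * sockets_per_server * cores_per_socket
total_threads = total_cores * threads_per_core
total_ram_gb = servers * ram_per_server_gb
cost_per_core = total_cost / total_cores

# Rough price range for a similarly specced new machine (from the post).
new_price_low, new_price_high = 4000, 5000
savings_low = servers * new_price_low - total_cost
savings_high = servers * new_price_high - total_cost

print(f"Total cost:    ${total_cost}")
print(f"Cores/threads: {total_cores} / {total_threads}")
print(f"RAM:           {total_ram_gb} GB")
print(f"Cost per core: ${cost_per_core:.2f}")
print(f"Savings vs. new: ${savings_low} to ${savings_high}")
```

The savings line is why the missing warranty doesn’t worry me much: at roughly $800 each, I could replace several dead machines and still be well under the price of buying new.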

So now I just have a lot of “stuff”… I need to actually get it all provisioned/configured… That’s in the next post…