Out with the old bugs, in with the new: Building 'Archer,' my new workstation

I’ve finally decided to deprecate “Bender,” my old desktop computer. In his heyday, Bender was actually pretty high-end. I spent days on Newegg and Tom’s Hardware, researching different setups, pricing things out, and when I found out that Dell had a “Build Your Own PC” option that included many of the things I wanted (at a solid discount), I went for it.

In all my time, I have never actually met someone who liked their Dell computer, and I should say that Bender’s predecessor was a Dell Latitude C840 laptop, which was actually kind of terrible. Granted, it was a workhorse through my college years, but it was heavy, slow, had no wi-fi, lasted 45 min to a charge, and the CDROM never worked right. But, I figured this program would give me pretty much all the control over the computer’s performance, and if I didn’t like the outcome it wouldn’t be Dell’s fault.

I ended up going with the Inspiron line, as that seemed to make the best use of my money. I wanted something that would last a long time, so I opted to spend a bit more on the processor and less on RAM. RAM typically falls in price faster than processors, and is easy to upgrade in production machines, so it made sense to cheap out up front and upgrade it later. With processors, you don’t necessarily have that option, as newer units often utilize different chipsets that require a whole new motherboard.

I also cheaped out on the optical drive, since I rarely need one for anything I do, and I opted to use Intel’s onboard video rather than get a dedicated card, since that worked best for my use case. I used the savings from these items to pick up a PCIe TV tuner, wireless card, 500GB 7200 RPM hard drive and a 5.1 speaker system. Bender also came with a monitor, keyboard/mouse and Windows Vista. All told, I spent around $900, which was a pretty good deal when you consider how long it’s lasted. As a work computer, I could probably be fine using Bender for another 3-4 years. $900 for a computer that dutifully performs for 10 years is a bargain.

But, it’s time to move on. I don’t want to spend those last 3-4 years worrying about system failure. I’d rather put Bender into a project (maybe home surveillance? A drone?) where system failure doesn’t cause me to lose anything important.

A detailed technical background with a brief shelf-life

I started by looking at the market to see what’s changed over the past 6 years and compare it to Bender. I came away with a number of lessons learned:

1. Modern processors now use 3 levels of cache. Basically, a processor needs to be able to access data to perform calculations, and this data is stored in memory. Retrieving data from RAM is extremely slow from the processor’s point of view, so chipmakers started adding caches to processors, which host the most-frequently accessed data. So when the processor needs some data, it checks the Level 1 (L1) cache first, then the Level 2 (L2), then Level 3 (L3) and so on, before moving to RAM. This reduces the amount of time the processor spends waiting around to get the data it needs.

Each level trades speed against capacity and cost. L1 cache is generally the fastest, but it is also the smallest and the most expensive per byte. Levels 2 and 3 are progressively larger and slower. The upshot is that a given processor will generally perform better as the number of cache levels increases, and as the cache sizes (especially that of L1) increase. Bender only has 2 cache levels. Level 1 has 4 independent 64 KB caches (divided evenly between data and instructions, 32 KB each), and Level 2 has two 4 MB caches, with two cores sharing a given cache. That’s a total of 256 KB for L1 and 8 MB for L2 caches.
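For anyone who wants to double-check the tally, here is a minimal Python sketch that adds up Bender's cache from the per-cache figures quoted above. The variable names are my own, and I'm assuming binary units (1 KB = 1024 bytes). Two 4 MB L2 caches come out to 8 MB in total.

```python
KB = 1024
MB = 1024 * KB

# Per-cache figures quoted above for Bender's processor.
l1_cache_count = 4            # one L1 per core
l1_cache_size = 64 * KB       # 32 KB data + 32 KB instruction
l2_cache_count = 2            # each L2 is shared by two cores
l2_cache_size = 4 * MB

l1_total = l1_cache_count * l1_cache_size
l2_total = l2_cache_count * l2_cache_size

print(f"L1 total: {l1_total // KB} KB")
print(f"L2 total: {l2_total // MB} MB")
```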

2. The Front Side Bus (FSB) has gone away. Historically, the FSB was an interface between the processor and the northbridge (a chip that facilitates communication between the processor, RAM and GPU). Basically, as CPU speeds increased, this design caused a bottleneck between the processor and the motherboard. Interestingly, Bender was one of the last computers made with this design. Both major chipmakers, Intel and AMD, have moved to a design where the northbridge is incorporated onto the processor itself. The links that took over from the FSB are called HyperTransport (AMD) and DMI/QuickPath Interconnect (Intel).

I found it difficult to find a metric that would be an apples-to-apples comparison between the FSB and HyperTransport/QPI/DMI. FSB is generally measured in MHz, while the newer designs are measured in GT/s (Giga-Transfers per second). Supposedly the modern bottleneck is bus width rather than speed. So a device with a higher bus width will perform better than a device with a lower bus width for the same FSB or GT/s. Analysts instead rely on bandwidth, which is a combination of speed and bus width, to compare the performance of differing processors.

So, for example, QPI transfers data at 4 bytes per transfer, so a QPI of 6.4 GT/s translates to a bandwidth of (4 B/T)(6.4 GT/s) = 25.6 GB/s. Contrast that with Bender’s Q6600, which has an FSB speed of 266 MHz, 4 transfers per cycle, and a bus width of 8 bytes (64 bits), giving it a max bandwidth of (266M cycles/s)(4 T/cycle)(8 B bus width) = 8.5 GB/s. So QPI can carry about 3 times as much data as Bender’s FSB in the same amount of time.
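The comparison above boils down to transfers per second times bytes per transfer. Here is a minimal Python sketch of that arithmetic using the figures quoted in this post. The function name bandwidth_gb_per_s is my own invention, and I'm treating 1 GB as 10^9 bytes.

```python
def bandwidth_gb_per_s(transfers_per_s: float, bytes_per_transfer: float) -> float:
    """Peak bandwidth in GB/s, taking 1 GB = 1e9 bytes."""
    return transfers_per_s * bytes_per_transfer / 1e9


# QPI: 6.4 GT/s at 4 bytes per transfer (figures from the text above).
qpi = bandwidth_gb_per_s(6.4e9, 4)

# Q6600 FSB: 266 MHz clock, 4 transfers per cycle, 8-byte (64-bit) bus.
fsb = bandwidth_gb_per_s(266e6 * 4, 8)

print(f"QPI: {qpi:.1f} GB/s")
print(f"FSB: {fsb:.1f} GB/s")
print(f"QPI carries {qpi / fsb:.1f}x as much as the FSB")
```

This is peak, best-case bandwidth only. It says nothing about how much of it a real workload actually uses.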

One other thing I learned is that there is a lot of confusion about QPI versus DMI/DMI2. From what I can understand of the many garbled half-answers online, QPI is a high-speed bus which is most analogous to an FSB, in that it replaced the FSB in connecting to the X58 northbridge. By contrast, DMI is a link between the northbridge and southbridge. Once Intel chips completely absorbed the northbridge, QPI essentially became irrelevant (at least from the marketing perspective). It’s allegedly still there on the CPU, acting as a gateway between the CPU and PCIe controller. For these chips, Intel lists DMI2, which is the connection between the processor and southbridge, in this case a PCHXXX chip, that controls display, system clock, peripherals, etc.

3. CAS Latency affects RAM speed in an appreciable way. RAM is rated to a given clock speed, and the speed is often expressed in terms of MB/s or MT/s. So for example, DDR3-1600 indicates the RAM is rated to 1600 MT/s, which on a 64-bit bus translates to a maximum bandwidth of 12,800 MB/s (or 12.8 GB/s). Of course, that assumes a steady stream of data, which isn’t exactly how RAM works. CAS Latency (CL) is the number of clock cycles between when data is requested from the RAM and when that data becomes available for use. The longer the delay, the less of that bandwidth is actually being used.

What’s interesting is that, when the difference in CL is large enough, RAM at a slower data rate can actually deliver a given chunk of data sooner than RAM at a higher data rate. Take, for example, a stick of DDR3-1600 with a CL of 9. This gives a data rate of 1600 MT/s, a command rate of (1/2)(1600 MT/s) = 800 MHz, a bit time of 1/(1600 MT/s) = 0.625 ns, and a cycle time of 2 x 0.625 ns = 1.25 ns. Taking into account just CL (not the other associated timings), the time required to deliver the first “word” is going to be CL x cycle time, or 9 x 1.25 ns = 11.25 ns (about 11.3 ns). It then takes another 3 times the bit time (3 x 0.625 ns) to complete the fourth word, and an additional 4 times the bit time (4 x 0.625 ns) to complete the eighth word. This gives a total time of about 15.6 ns to clear an eight-word cache line.

That said, the performance of different memory sticks can be compared more directly. A DDR3-1600 with a CL of 7 will clear the cache line in 13.1ns, while a DDR3-2000 with a CL of 10 will take 13.5ns. In that comparison, the slower 1600MT/s stick outperforms the 2000MT/s stick. Of course, the DDR3-2000 offers roughly 25% more bandwidth than the DDR3-1600, which means it will outperform in cases where the address of the data to be read is known long enough in advance. However, that extra bandwidth is meaningless if the processor can’t handle it.
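The timing arithmetic above is easy to script. Here is a rough Python sketch of the same simplified model: it accounts for CAS latency only (none of the other timings), and assumes an eight-word cache line with one word per transfer. The function name and the words parameter are mine, not anything standard.

```python
def cache_line_time_ns(data_rate_mts: float, cas_latency: int, words: int = 8) -> float:
    """Time to deliver a full cache line, counting CAS latency only.

    data_rate_mts is the rated data rate in MT/s (1600 for DDR3-1600).
    """
    bit_time_ns = 1000.0 / data_rate_mts   # time per transfer
    cycle_time_ns = 2 * bit_time_ns        # DDR: two transfers per clock cycle
    first_word_ns = cas_latency * cycle_time_ns
    # The remaining words each arrive one bit time after the previous one.
    return first_word_ns + (words - 1) * bit_time_ns


sticks = [
    ("DDR3-1600 CL9", 1600, 9),
    ("DDR3-1600 CL7", 1600, 7),
    ("DDR3-2000 CL10", 2000, 10),
]
for name, rate, cl in sticks:
    print(f"{name}: {cache_line_time_ns(rate, cl):.1f} ns")
```

Real memory has several other timings in play, so treat these numbers as a way to compare sticks against each other rather than as measured latencies.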

This all has cost implications. You don’t want to spend extra money on performance you can’t use. Given the choice between two options that suit your machine, you want the one that gives the best performance, which may mean a stick with a lower data rate as well as a lower CL. How much are you willing to pay to save a few nanoseconds per data request? Lots to consider there.

4. Power supply units can be a considerable heat and sound source, especially when operating near capacity. Basically, a computer needs to be able to power all of its components - hard drives, optical drives, USB sticks, WiFi cards, graphics cards - everything. The power supply unit (PSU) handles all of that safely. If the computer draws too much power, the PSU, in theory, should shut down. Or catch fire. Or fail. Something like that.

However, it turns out the PSU is one of the areas where manufacturers (like Dell) tend to be really cheap. Honestly, when most of us go computer shopping we never ask “So what’s the wattage rating of that PSU?” So manufacturers tend to use the cheapest thing they can, which means calculating the wattage requirement of your machine, then installing whatever power supply comes closest without being a fire hazard. There are three problems with this: first, it limits our options for upgrading the machine by adding new components down the road. Second, PSUs operating near their rated load tend to run hotter (requiring the cooling fans to work harder, thus generating more noise), and less efficiently. Third, cheap PSUs don’t necessarily have safety features, like built-in surge protection. Lots of users have been hosed by a PSU going bad and frying a motherboard and all of the parts. If you have a no-name unit, you probably have no recourse either.

That said, there really isn’t a reason to not get a very good PSU with a much higher rating than you plan on needing. For example, if you think you will need 325W, get a PSU rated to 700W or 750W from a company with a solid reputation. Your computer will only draw what it needs, and will actually run cooler, quieter and more efficiently. Plus, you’ll have plenty of power margin if you want to upgrade.
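To put a number on "plenty of power margin," here is a small Python sketch that works out how heavily loaded a PSU would be at a 325W draw. The load_fraction helper is my own, and the 350W unit is a made-up stand-in for the kind of barely-adequate supply a manufacturer might install; only the 325W, 700W and 750W figures come from the example above.

```python
def load_fraction(expected_draw_w: float, rated_w: float) -> float:
    """Fraction of the PSU's rated capacity used at the expected draw."""
    if rated_w <= 0:
        raise ValueError("rated_w must be positive")
    return expected_draw_w / rated_w


expected_draw_w = 325  # estimated system draw from the example above

# 350 W is a hypothetical "closest fit" unit; 700 W and 750 W are from the text.
for rated_w in (350, 700, 750):
    pct = 100 * load_fraction(expected_draw_w, rated_w)
    headroom_w = rated_w - expected_draw_w
    print(f"{rated_w} W PSU: {pct:.0f}% load, {headroom_w} W of headroom")
```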

I learned a lot of other things too, but these points seemed most relevant.

Researching Bender’s replacement

In addition to Newegg and Tom’s Hardware, I also found a site called PCPartPicker, which is quite possibly one of the greatest computer sites I’ve ever seen. Basically, it lets you “build” a hypothetical computer, tells you if you’ve chosen compatible parts, and gives specs, reviews and links to the cheapest place to buy each part. It also estimates power consumption of the build so you can get the proper power supply. Awesome.

I used PCPartPicker, Newegg and Tom’s Hardware to come up with a few PC mockups. I made some Intel-based computers, and some AMD-based computers. I looked at pricing, reviews, performance benchmarks, and researched the technologies behind each. I eventually narrowed it down to an AMD FX-8350 and an Intel i5-4670K. The side-by-side specs heavily favored the FX-8350:

Metric	FX-8350	i5-4670K
Frequency	4.0 GHz	3.4 GHz
Cores	8	4
Threads	8	4
L1 Cache	4 x 64 KB shared instruction, 8 x 16 KB data	4 x 32 KB instruction, 4 x 32 KB data
L2 Cache	4x2MB shared	4x256KB
L3 Cache	8MB shared	6MB shared
Max RAM	32 GB	32 GB
RAM speed	1866 MT/s	1600 MT/s
Socket Type	AM3+	1150
GPU	No	Yes
Power Req	125W	84W
Price	$200	$240

On the surface, the only things the i5 has over the FX-8350 are an integrated GPU and a lower power requirement. However, that’s not the whole picture.

In principle, 8 cores should allow the processor to handle more tasks at once, or at least more complex tasks. It turns out that, for all its extra cores, the FX-8350 is only 2% better than the i5 in multithreaded applications, but the i5 is 53% faster than the FX-8350 in single-threaded applications. This actually appealed to me after I discovered that most programs that I use on a regular basis are only single-threaded. Even resource-hungry games only make use of a few cores. The only things that would benefit from 8 cores would be CAD work, video editing, encoding, etc., of which I do very little.

Further, the FX-8350 uses the AM3+ socket, which (as of this writing) has an uncertain future. It is a more traditional type of CPU, as opposed to a newer APU like the i5-4670K. The difference is that the i5 has an integrated GPU, whereas the FX-8350 (and, in fact, every FX-series CPU) has nothing like that. A recently-leaked roadmap seems to indicate that AMD will be retooling and pushing APUs in the future. There’s nothing certain at this point, but my guess is that when they release one in a few years, it will require a new socket type. That would effectively lock me into an architecture that will be deprecated in a couple of years. If I want to upgrade after that, I would need to get a new motherboard.

Contrast that with the i5-4670K, which uses Intel’s new LGA1150 socket. That socket not only serves the entire current line of “Haswell” processors (including the hyperthreaded i7-4771), but will also support the upcoming “Broadwell” chips. That means that in a few years, once the Broadwells are out and the price has dropped to $100 or so, I could upgrade to an i7-4771 without needing a new motherboard. That makes the i5-4670K much more future-proof than the FX-8350.

Each processor also carries slightly different instruction sets. The i5-4670K incorporates the newish AVX2 instructions, while the FX-8350 includes the FMA4 and XOP instruction sets. Personally, I can’t really tell the difference between them, but in my research it seems like there’s some contention between the rivals there. All I got out of it was that AVX2 is probably what I would want more, but I doubt I would notice either way.

The conclusion here was that the extra $40 for the Intel chip yields significantly better performance in the apps I use most, reduces my system’s power consumption (and the accompanying heat/noise generated thereby), eliminates the need for a graphics card, and hedges against obsolescence in the near future. Sold.

From there, it was a matter of picking a motherboard to match the CPU’s specs. I considered further future-proofing by opting for a motherboard with support for a ridiculous amount of RAM or something, but I didn’t see the value there. It took me 6 years to go from needing 1GB to 6GB on Bender, and even on my heaviest-use days, I never need to use swapfiles. The likelihood that I’ll need more than 32GB of RAM in the next decade seems far-fetched.

After much deliberation, I settled on the Asus Z87-PRO. It is capable of supporting everything the i5-4670K does and more. It supports overclocking, and the BIOS is flashable and gives a ton of options. In addition to coming with dual-band, onboard 802.11n WiFi, it also has built-in Bluetooth 4.0, plenty of USB3 and SATA3 ports, and more.

After selecting the processor and motherboard, the rest came easy. I went with a Corsair RM750 power supply, because it’s modular (meaning I don’t have extraneous power cables everywhere), it’s high-efficiency, and designed to be silent. I also went with two 8GB sticks of Crucial Ballistix Tactical CL8 RAM. I strongly considered two 8GB G.Skill Trident X sticks, as they were in the same price range, only CL7, but for reasons I’ll cover in a moment, I went with Crucial. I decided on a case that would help minimize noise as well, so I went with a Fractal Design R4 because it’s designed to be as quiet as possible (it comes with two super-quiet fans, vibration-absorbing hardware mounts and sound-insulating material on the inside panels).

Finally, I decided to get an SSD drive to boot my various operating systems. I was trying to price this build to come in under $900, so the best drive I could find after everything else was selected was a Samsung 840 EVO 120GB. In retrospect, I should have kicked in extra for the 250GB version. Oh well.

Archer without FX

I have the unending joy of living in a city with a Micro Center, and I thought I would do a bit of window shopping before making my purchases at the cheapest places I could find online. I had a list of parts I wanted in hand, as well as the outlets that had the best prices. I struck up a conversation with one of the sales guys and he asked to see my list. He informed me that they had everything I wanted there in the store, and that they would price match everything on my list. He also said that they were having an in-store promotion on a few items, as well as a $15 rebate on the motherboard, so they were already cheaper than the ones I’d found online.

Great! The only thing they didn’t have was the G.Skill RAM. They had the Crucial Ballistix Tactical, which was CL8 instead of CL7. By my calculations, the difference between CL7 and CL8 at 1600 MT/s is about 9%, but the guy offered me a discount on the Crucial sticks. So I bought under retail and didn’t pay shipping. I guess I’ll trade that for the 9%. I can always change it later if I want.
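For the curious, here is roughly how that 9% figure falls out, as a short Python sketch. It uses the simplified model from earlier in the post: CAS latency only, and an eight-word cache line. The function name is my own, and the exact percentage depends on which stick you treat as the baseline.

```python
def cache_line_time_ns(data_rate_mts: float, cas_latency: int, words: int = 8) -> float:
    """Time to deliver a full cache line, counting CAS latency only."""
    bit_time_ns = 1000.0 / data_rate_mts   # time per transfer
    cycle_time_ns = 2 * bit_time_ns        # DDR: two transfers per clock cycle
    return cas_latency * cycle_time_ns + (words - 1) * bit_time_ns


cl7_ns = cache_line_time_ns(1600, 7)
cl8_ns = cache_line_time_ns(1600, 8)

# How much slower CL8 is, relative to CL7.
slowdown = (cl8_ns - cl7_ns) / cl7_ns
# How much time CL7 would save, relative to CL8.
saving = (cl8_ns - cl7_ns) / cl8_ns

print(f"CL7: {cl7_ns:.3f} ns, CL8: {cl8_ns:.3f} ns")
print(f"CL8 is {slowdown:.1%} slower; CL7 saves {saving:.1%}")
```

Either way you slice it, the gap lands in the neighborhood of 9%, and that is a per-request figure under a simplified model rather than a whole-system slowdown.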

After all was said and done, the new build set me back about $875, which is roughly what I paid for Bender. Assembly was a breeze, although the R4 has this really annoying design, where the SSD is mounted on the side of the case. It can only be mounted/unmounted without the motherboard present. So after I’d secured the motherboard and began hooking things up, I realized I would have to take it all apart, mount the SSD, then reinstall everything.

The result is incredible. Archer boots into Arch Linux in 5 seconds flat. He gets into Ubuntu and Windows 7 in 6 and 8 seconds, respectively. So far I haven’t had any of the lagginess that sometimes affected Bender. The system runs much cooler and quieter too. My super-accurate, $0.99 dB meter iPhone app measured Archer to be several dB lower in volume than Bender, which is quite an accomplishment, as I’m only using the stock Intel CPU fan. I’m sure with some more tuning, I could make Archer almost silent.

Conclusion

I knew beforehand that my current partition scheme was not conducive to how I wanted to allocate files on the drives, so I figured I would simply move things around when I was ready. However, it dawned on me as I was installing operating systems and a bootloader to the SSD that there was no way to achieve the partition structure I wanted without introducing another disk into the system. That was easy enough to resolve; Micro Center had a deal on WD Caviar Blue 1TB 7200 RPM hard drives. It cost like $50. With that I was able to reallocate the files like I wanted, and I ended up with a spare drive that I can now use in Bender for whatever nefarious purposes I can come up with for him.

This was my first actual soup-to-nuts build of a computer, and I couldn’t be happier with the result. Time will tell whether Archer truly is a worthy successor to Bender, but for now things look good!

Rob Sears
