Misnomers in Computer Science

One thing you notice if you work in our field for very long is that the terms change as quickly as the practices and the technology. Once upon a time, memory was a premium product. Now it is cheap. Small-memory-footprint algorithms were once valuable. Now not so much. Open-source software was once for academics and scientists. Now the all top companies use it.

Terminology is changing as fast as the technology, but sometimes not fast enough. Take RISC and CISC : According to a questionable definition of RISC, Reduced Instruction Set Computing is called so because of the reduced number of computer instructions supported by the CPU. A better definition, in my opinion, comes from pointing out that the removal of complex memory-fetching instructions reduces the complexity of CPU design. After all, RISC CPUs add instructions at every generation, and we still called them RISC.

CISC (where C stands for Complex) computing predates RISC, and includes high-level instructions that compute complex software functions to fulfill common developer needs, such as moving sprites around on the screen or computing a high-level math function. But, starting with the 486, CISC CPUs have deprecated the clumsier older instructions, implementing them on top of a RISC core using microcode to support the older instruction set. That’s right! CISC has been RISC all along, or at least since 1989.

These days the term CISC means Intel x86 or x64, while RISC just means anything else. The terms no longer have technical value to chip developers. The same thing happened with the terms System V and Berkeley Unix : The terms just don’t mean anything outside of historical context. To test whether I’m right, try to name one difference between the two supposed flavors of operating systems. The unix family of operating systems has diverged so much that no single standard for all file placement, application behavior, and kernel functionality describes any real operating system in any way.

Another example is the seemingly innocuous term driver. Historically, the term driver was chosen for software that interfaced with a peripheral. The software drove the peripheral to come to life and start doing something. This is the same sense of driver that is used in the term test driver, which drives software to wake up and do something, so that the correct behavior of that software can be validated.

But today, a device driver is integrated into the operating system, providing applications with a standard API to request some task be performed in the hardware. The device driver has taken on a passive role. It no longer drives anything.

While reading about DDS and distributed simulation, I am struck by how easy it would be to utilize this kind of framework for a distributed control system. It turns out that DDS is being used for exactly that purpose. And not just DDS: Other software platforms, like HLA, are bridging the gap from simulation to realization. We now live in a world where you can mix and match your real and simulated components as you please.

So is the line between simulation and control system going away? Time will tell.

Heap’s Algorithm

Any sophomore computer science student knows that there are O(n!) permutations of any given array. But do you know how to produce those permutations? I didn’t , so I Googled it.

One simple strategy is to pick an element, then pick from those remaining, and so on until a single permutation is achieved. Then just backtrack one step and change your selection, finish the permutation again, and so on. For example, if you want to permute ABCD :

  1. Select A
  2. Select B, yielding AB
  3. Select C, then D, yielding ABCD, your first permutation
  4. Back up two positions and select D, then C, yielding ABDC
  5. Back up three positions and select C, then B, then D, yeilding ACBD
  6. And so on…

This might be a pretty straightforward implementation using recursion and a running collection of element lists (several copies). This is memory and CPU intensive, so an in-place solution would be nice.

Heap’s algorithm provides an in-place swapping method to find each permutation in about one step each, so it is very efficient. It is named after B. R. Heap, not after your favorite semi-sorted tree.

The core idea is very intuitive: Hold the last element of your list while recursively permuting the other n-1 elements (one must understand recursion). Then swap that last element with one of the others, and permute n-1 elements again. This way you get every possible permutation of n-1 elements for every possible fixed element value at position n. This should get us all permutations.

The tricky part is swapping out the nth element for one of the others. You might be tempted to write a loop that picks a new element from the n-1 to swap with, but remember that we permute the n-1 elements at every step, effectively scrambling them. Take AB C as an example. The n-1 permute leaves us at BA C. Say we swap with B, giving us CA B. The n-1 permute then takes us to AC B. We know that we should swap with A, because that is the only choice left to sit at the fixed position. And it is at position 0, just like the previous swaps!

The six permutations produced look like this :

ABC -> BAC -> CAB -> ACB -> BCA -> CBA

Let’s apply the same swap-with-zero strategy to four elements, abbreviating the three-element permutations for clarity :

ABCD -..-> CBAD -> DBAC -..-> ABDC -> CBDA -..> DBCA -> ABCD --X..OOOPS

We started repeating permutations before finding all of them. It turns out that AB is a different kind of permutation than ABC because of the length.

To see why, start by assuming that our permutation method will, if run twice, return the same permutation it started with. (This can be proven through induction once we understand the whole algorithm, but I will leave out the proof.) When ABC is being permuted, the first two elements are permuted three times while the elements take turns at the right position. This odd number of permutations makes the first two elements unstable in the sense that, every time we swap in the nth element, they are in a different order.

But when ABCD is being permuted, the ABC scramble is being performed an even number of times (four times), so it is stable except for the swap of the nth element. It is easier to see if we look at the algorithm output right before it swaps the nth element:

BA -> AC -> CB

CBA -> CBD -> CAD -> BAD

BADC -> AECB -> EDBA -> DCAE -> CBED

CBEDA -> CBEDF -> CAEDF -> BAEDF -> BAECF -> BADCF

Notice that the odd-length scrambles are stable except for the one-character substitution, while the even-length ones are not. This is a hand-waving explanation at best, but it works to give us an intuitive understanding of the algorithm.

So, for even-length permutations, the n-1 swap is always done with the first element, while for odd-length permutations, the n-1 swap is done with the kth element, where k is a counter from position 0 to n-2 that increments every time we do this.

The recursion is still there, but with very little extra state, very few loops and conditionals. And it takes barely more than O(n!) swaps, which is close to ideal. You can find the code in all its glory on Github, but here it is for your edification. Uncomment the console.log’s to see each operation in action.

 

JavaScript Puzzles

I found my new grind. Goodbye Hearthstone. Goodbye World of Warcraft.

Hello, CodeFights.

Arcade Mode was a lot of fun for me today. In The Labyrinth of Nested Loops, I found this little gem :

We define the weakness of number x as the number of positive integers smaller than x that have more divisors than x.

It follows that the weaker the number, the greater overall weakness it has. For the given integer n, you need to answer two questions:

  • what is the weakness of the weakest numbers in the range [1, n]?
  • how many numbers in the range [1, n] have this weakness?

Return the answer as an array of two elements, where the first element is the answer to the first question, and the second element is the answer to the second question.

What fun!

This feels like I could brute-force it with a nested loop is and suffer the O(n^2) performance. If I manage to solve it using only sequential loops on the other hand, it would be O(n) no matter how many loops I used. I think I scored somewhere in between, maybe O(n \log(n)) depending on what you believe about the inner loops in the following sections.

The problem can be decomposed by first finding the number of divisors, then counting weakness counts (not a typo – counts of counts!), and finally finding the weakest among those.

Finding divisors can be done in many ways. I use variable-stride loops to increment every spot where a given number is a divisor.

Next, I want a count of all numbers that have a given number of divisors. This segment of code will end with a variable called divisorsCounts, where divisorsCounts[3] = 4 if there are exactly 4 numbers in range that have 3 divisors.

The array divisorsCounts does not need to be that large, but it’s only memory, right? Apparently something called the Divisor Function \sigma_0(i) has an upper limit of n/2. I’m no number theorist, but I buy it. A factor of two is not even worth typing Math.floor() for, so I’ll leave the code alone as it now looks clean.

It is key here that I fill the divisorsCounts at the same time as I calculate weaknessAt any given point in time I have only counted the divisors up to the number I am examiningFor a given number m, weakness is the number of divisor counts greater than divisors[m]found in numbers smaller than m.

I can optimize these loops by shrinking divisorsCount. JavaScript creates sparse arrays by default, and their length is determined by the highest-indexed element. The inner nested loop will be shortened.

The pipes handle undefined elements (note that x + undefined will never produce a number). With n=500, divisorsCount is only 25, so this may improve performance quite a bit.

Finally, I must track the greatestWeakness and how many times it is found.

Here it is in all of its beauty.

CodeFights is awesome! Come find me! Come fight me!