There are others who can write about this more eloquently than I can, but pulling your leaders from many walks of life across the spectrum of gender, race, nationality, and even age, is essential to your organization’s survival. Google knows it. So does Apple.

Physics labs contain, in my experience, a cult of old technology. They use outdated tools and outdated practices to write unreliable applications. It is very difficult to make the case for improvements because the appointed leaders are older men whose technology background is completely outdated. The problem is compounded when they continue to hire physicists who follow in these traditions, rather than hiring technology specialists to do the jobs they are trained for. Labs often hire someone with a PhD in physics and a research background to run their computer centers and write their software.

I would like to think that if the labs start putting more women and minorities into director roles, it will generate a more dynamic environment and we will start to see more modern ideas being put into place.

Given an unsigned integer, how would you count the number of set bits (that is, bits equal to 1)?

This is otherwise known as a Hamming Weight, and there are clever tricks for solving this problem using only a few lines of C. You could also use a loop with shift and compare:

    int popcount(unsigned int x) {
        unsigned int i;
        int count = 0;
        /* Test the lowest bit, then shift the next bit down into its place. */
        for (i = 0; i < sizeof(x) * 8; i++) {
            if ((x & 1) == 1) {
                count++;
            }
            x >>= 1;
        }
        return count;
    }
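As one example of the "clever tricks" mentioned above, here is a minimal sketch of the approach usually credited to Brian Kernighan. It is not the code the app uses, and the name `popcount_kernighan` is only illustrative. It relies on the fact that `x & (x - 1)` clears the lowest set bit, so the loop runs once per set bit instead of once per bit position.

```c
#include <stdint.h>

/* Kernighan's method: x & (x - 1) clears the lowest set bit,
   so the loop body executes once for every bit that is set. */
int popcount_kernighan(uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;   /* drop the lowest set bit */
        count++;
    }
    return count;
}
```

Calling `popcount_kernighan(0x6D)` walks the loop five times, once for each set bit in 01101101.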

There is actually a single CPU instruction that implements this entire function:

## CNT (vector)

Population count per byte.

## Syntax

`CNT Vd.T, Vn.T`

This is an instruction, in ARM64 Assembly (AArch64), that performs the population count operation in one line of code. A similar instruction exists in most CPU assembly languages. iPhones use ARM-family chips. For example, my iPhone 6 uses an Apple A8, a System-on-a-Chip (SoC) built around an ARM64 CPU.

This inspired me to write an iPhone app that utilizes the CNT instruction, which means I get to write a little bit of ARM assembly code and deploy it on the iPhone. The code is on GitHub. This is what the finished product looks like:

The app is written in Swift but uses C and Assembly for the bit-counting operation. Calling ARM64 functions from Swift is a lot like calling a C function from Swift, so I wrote a C `popcount()` function to use with the simulator (compiles into x86 assembly) and an ARM64 `popcount()` function to deploy on the iPhone itself. The function is defined in the two files **popcount.c** and **popcount.s**. They use the preprocessor flags `#ifdef __arm64__` and `#ifndef __arm64__` to tell the compiler which code goes with which platform.

Both files utilize the same header to declare the function. You can create a header called **popcount.h** using the File menu. But this is a Swift project, so we need a *bridging header* to make the functions callable from Swift. A bridging header is created automatically if you carefully follow the prompts when creating a C file, but you can create one manually if need be.

A function written in assembly code comes with a lot of boilerplate code for setting up the stack, returning results, aligning memory, and much more. Fortunately, there is a way to get a working example that can be used as a kind of template. The compiler will show you generated assembly if you use the right options. Take this simple Swift function:

func addTwoNumbers(_ a:Int, _ b: Int) -> Int { return a + b }

Compiling this code manually will show the assembly code generated by the **swiftc** compiler. I used this command: `xcrun -sdk iphoneos swiftc -emit-assembly -target arm64-apple-ios11.0 SimpleFunction.swift`. It produces a very large assembly file, not all of which needs to be replicated in the **popcount.s** code. The code starting at the function label and ending with the `ret` statement is useful, because we can use it to create our own function in assembly.

        .section __TEXT,__text,regular,pure_instructions
        .ios_version_min 11, 0
        .globl _main
        .p2align 2
    /* Deleted _main for brevity */
        .private_extern __T014SimpleFunction13addTwoNumbersS2i_SitF
        .globl __T014SimpleFunction13addTwoNumbersS2i_SitF
        .p2align 2
    __T014SimpleFunction13addTwoNumbersS2i_SitF:
        .cfi_startproc
        sub sp, sp, #16
    Lcfi1:
        .cfi_def_cfa_offset 16
        adds x0, x0, x1
        cset w8, vs
        str x0, [sp, #8]
        str w8, [sp, #4]
        b.vs LBB1_2
        ldr x0, [sp, #8]
        add sp, sp, #16
        ret
    /* Deleting Swift support code for brevity */

The `addTwoNumbers` function is actually contained in the assembly code labelled `__T014SimpleFunction13addTwoNumbersS2i_SitF`. The name has been mangled to make it unique and identifiable across potentially many compiled objects. The single line `adds x0, x0, x1` actually contains the entire body of this simple function in one command. Here is a breakdown of the command:

- `adds` is an **opcode**, a single basic instruction in assembly. Assembly opcodes typically perform a simple math operation on one of a few built-in variables. They may also move a word of data in or out of memory. This opcode adds two numbers together and, because of the trailing *s*, sets the condition flags; the `cset` and `b.vs` instructions that follow use the overflow flag to detect a result that does not fit.
- `x0` is the destination variable for the operation, where the results are written. These variables are called *registers* in assembly, and they live on the CPU outside of memory. There are only 31 general-purpose registers on modern ARM chips, so assembly programs spend a lot of code moving data between memory and these local registers.
- `x0, x1` are the source registers. Under the standard ARM64 calling convention the caller passes the first integer arguments in registers, so the two parameters are already in `x0` and `x1` when the function begins.
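In the listing, `adds` is followed by `cset w8, vs` and `b.vs`, which test the overflow flag that the addition just set. A rough C model of that add-then-check pattern might look like the sketch below. It assumes a GCC or Clang compiler, because `__builtin_add_overflow` is a compiler extension rather than standard C, and the name `checked_add` is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rough model of `adds` followed by `b.vs`: add two signed 64-bit values,
   then report whether the signed result overflowed.
   __builtin_add_overflow is a GCC/Clang extension, not standard C. */
bool checked_add(int64_t a, int64_t b, int64_t *sum) {
    /* The builtin stores the (possibly wrapped) result in *sum
       and returns true when overflow occurred. */
    return !__builtin_add_overflow(a, b, sum);
}
```

`checked_add` returns `true` when the sum fits, which corresponds to the path in the listing that falls through to `ret`; a `false` return corresponds to the branch taken by `b.vs`.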

There are a lot of other ARM64 commands, and a lot of details needed to understand function calls. But I won’t need to dive this deep to complete my task.

I need to replace this `adds` command with `cnt`. The `cnt` command is actually part of the ARM64 SIMD instruction set, and uses a separate set of registers from the general-purpose x0-x30 registers. I need a three-line program: one line to move the function parameter into a SIMD register, one to perform the count, and one to move the result back to a GP register. These three lines accomplish this:

    dup v0.2d, x0
    cnt v0.16b, v0.16b
    fmov x0, v0.d[1]

Breaking down the assembly syntax again:

- `dup` duplicates an element (copies it). We are copying from a general-purpose register, `x0`, into a specialized SIMD register, `v0.2d`.
- `v0.2d` is a 128-bit SIMD register. `x0` is a 64-bit register, so the `.2d` modifier instructs the assembler to duplicate `x0` and put the two 64-bit copies into `v0`. This isn’t necessary for our purpose, but it is useful for other algorithms.
- `cnt` executes the population count, replacing the bit pattern with the number counted. For example, 01101101 would become 00000101 (101 = 5, because there were five bits set in the data before the operation).
- `v0.16b` indicates the same register as before, except this time the SIMD register is being divided into 16 lanes, each 1 byte in size. Each of the 16 8-bit lanes inside the vector is treated as a separate register and they are all executed in parallel. Vector instructions like to perform one operation on many pieces of data in parallel, hence SIMD: Single Instruction, Multiple Data.
- `fmov` moves data between registers again, this time putting the results back in `x0`. The *f* here actually makes this a floating-point instruction, because FP and SIMD instructions share the same registers in ARM64. But it doesn’t make this a floating-point number.
- `v0.d[1]` is once again the same SIMD register, but with a different way of splitting up the data elements. The *d* indicates that a *double-word* (64 bits) is being copied out of element 1 of the SIMD register. This puts the results into our GP register.

Note: `CNT` only works in 8-bit chunks due to hardware limitations (more bits = more wires = less room to implement other features). If I wanted to support values larger than 255, I could add an `addv` command, which adds the 8-bit chunks together. There are many more ARM topics I have not explored.
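To make the 8-bit limitation concrete, here is a small C model of what the lanes hold after `cnt`, and of what an `addv`-style horizontal add would do with them. This is only a sketch: the function names `cnt_per_byte` and `sum_lanes` are made up for illustration, and it models just the 64 bits that end up back in `x0`.

```c
#include <stdint.h>

/* Model of `cnt` on a 64-bit value: every byte lane is replaced
   by the number of bits that were set in that lane. */
uint64_t cnt_per_byte(uint64_t x) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t byte = (uint8_t)(x >> (8 * lane));
        uint64_t count = 0;
        while (byte != 0) {
            count += byte & 1u;
            byte >>= 1;
        }
        result |= count << (8 * lane);
    }
    return result;
}

/* Model of an `addv`-style horizontal add: sum the eight byte lanes. */
unsigned sum_lanes(uint64_t lanes) {
    unsigned total = 0;
    for (int lane = 0; lane < 8; lane++) {
        total += (unsigned)((lanes >> (8 * lane)) & 0xFF);
    }
    return total;
}
```

For an input that fits in one byte, such as 0x6D, `cnt_per_byte` already gives the answer (5). For 0x0101 it returns 0x0101, one count per lane, and only `sum_lanes` collapses that into the expected total of 2.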

The resulting assembly code reports the population count, and the Swift code displays the results to the user:

If you’re new to assembly language, it is worth noting some oddities of the assembly language environment. Assembly language gives you complete low-level control over the behavior of your CPU, but this means that understanding the details of CPU architecture is essential to writing efficient assembly. Instructions that rely on shared resources can create delays. Memory loads take several instruction cycles to complete.

Compilers are pretty good at allocating registers and scheduling instructions. It may be worth writing code in C and using the generated assembly as a starting point to see how it can be done. There is a lot more to writing effective assembly than just knowing the syntax of the language.
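As a sketch of that workflow, a one-line C function like the one below could serve as the starting point. It assumes GCC or Clang, since `__builtin_popcountll` is a compiler builtin rather than standard C, and the name `popcount64` is only illustrative. Compiling it with optimization and the `-S` option writes out the generated assembly; on an ARM64 target I would expect something along the lines of `cnt` followed by a horizontal add, but the exact instructions depend on the compiler and its version.

```c
#include <stdint.h>

/* Let the compiler pick the instruction sequence for a 64-bit
   population count. __builtin_popcountll is a GCC/Clang extension. */
int popcount64(uint64_t x) {
    return __builtin_popcountll(x);
}
```

Comparing the compiler's output with a hand-written version is a quick way to check register usage and instruction ordering before committing to assembly.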

You can find lots of resources for learning ARM64 assembly. A crash course in ARM64 assembly might be a good place to start, and the ARMv8 Instruction Set Overview is both authoritative and exhaustive. GitHub member Richard Ross has written an entire iOS app in assembly.

You can practice ARM64 using QEMU, which, although it requires a lot of effort to set up, will let you analyze assembly instructions one-by-one as they execute.

This poor substitute of mine only supports the binary OR and POPULATION operations and has no string representation. But it worked for what I needed.

    //: Playground - noun: a place where people can play

    import Cocoa

    struct Bin {
        var value: [UInt64] = Array<UInt64>(repeating: 0, count: 10) // LSD at [0]

        init() { }

        init(_ string: String) {
            var index = 0
            var finish = string.endIndex
            repeat {
                let start = string.index(finish, offsetBy: -64, limitedBy: string.startIndex) ?? string.startIndex
                value[index] = UInt64(string[start..<finish], radix: 2)!
                finish = start
                index += 1
            } while finish != string.startIndex
        }

        static func | (lhs: Bin, rhs: Bin) -> Bin {
            var c = Bin()
            for (n, (a, b)) in zip(lhs.value, rhs.value).enumerated() {
                c.value[n] = a | b
            }
            return c
        }

        func population() -> Int {
            var pop = 0
            for v in value {
                var a = v
                while a > 0 {
                    pop += Int(a % UInt64(2) & 1)
                    a /= 2
                }
            }
            return pop
        }
    }

    let x = Bin("10000000000000000000000000000000000000000000000000000000000000000")
    let a = Bin("10101010101010101010101010101010101010101010101010101010101010101010101010")
    a.population()
    let b = Bin("01010101010101010101010101010101010101010101010101010101010101010101010101")
    b.population()
    let c = a | b
    c.population()


Cheers!

I cannot add or subtract anything from this.

Of course, you can just use `sort()` or `sorted()`, but if you are inclined to roll your own, the Swifty way to do it is to create an extension. Note that an Array extension with a Generic Where Clause is needed to make element comparisons `> < ==` work.

    //: Playground - noun: a place where people can play

    import Cocoa

    // var elements = [0, 6, 0, 6, 4, 0, 6, 0, 6, 0, 4, 3, 0, 1, 5, 1, 2, 4, 2, 4]
    var elements = [10, 6, 0, 1, 2, 5, 4, 3, 7, 8, 9]

    extension Array where Element: Comparable {

        func heapSorted() -> Array {
            // Arrays of 0 or 1 elements are already sorted; returning early also
            // avoids forming the invalid ranges 0...(-1) and 1..<0 below.
            guard count > 1 else { return self }

            var r = self

            func parent(_ i: Int) -> Int {
                return ((i+1) / 2) - 1
            }

            func leftChild(_ i: Int) -> Int {
                return ((i + 1) * 2) - 1
            }

            func rightChild(_ i: Int) -> Int {
                return leftChild(i) + 1
            }

            // Sift r[i] down until both children (within limit) are no larger.
            func descend(i: Int, limit: Int) {
                let lindex = leftChild(i)
                let rindex = rightChild(i)
                if rindex > limit || r[lindex] > r[rindex] {
                    if lindex <= limit && r[lindex] > r[i] {
                        (r[lindex],r[i]) = (r[i],r[lindex])
                        descend(i: lindex, limit: limit)
                    }
                } else if rindex <= limit {
                    if r[rindex] > r[i] {
                        (r[rindex],r[i]) = (r[i],r[rindex])
                        descend(i: rindex, limit: limit)
                    }
                }
            }

            func maxHeapify() {
                let lastParent = parent(r.count - 1)
                for i in (0...lastParent).reversed() {
                    descend(i: i, limit: r.count - 1)
                }
            }

            // Move the current maximum to the end, then restore the heap on the rest.
            func sortFromHeap() {
                for i in (1..<r.count).reversed() {
                    (r[0],r[i]) = (r[i],r[0])
                    descend(i: 0, limit: i-1)
                }
            }

            maxHeapify()
            sortFromHeap()
            return r
        }
    }

    elements.heapSorted().forEach{print($0, terminator: " ")}
    print()

On to writing mergesort (stable).

Sigh.

In the process, I needed some matrix operations for a medium-difficulty problem. And here they are, code style be damned:

    func transpose(_ matrix: [[Double]]) -> [[Double]] {
        let rowCount = matrix.count
        let colCount = matrix[0].count
        var transposed : [[Double]] = Array(repeating: Array(repeating: 0.0, count: rowCount), count: colCount)
        for rowPos in 0..<matrix.count {
            for colPos in 0..<matrix[0].count {
                transposed[colPos][rowPos] = matrix[rowPos][colPos]
            }
        }
        return transposed
    }

    func multiply(_ A: [[Double]], _ B: [[Double]]) -> [[Double]] {
        let rowCount = A.count
        let colCount = B[0].count
        var product : [[Double]] = Array(repeating: Array(repeating: 0.0, count: colCount), count: rowCount)
        for rowPos in 0..<rowCount {
            for colPos in 0..<colCount {
                for i in 0..<B.count {
                    product[rowPos][colPos] += A[rowPos][i] * B[i][colPos]
                }
            }
        }
        return product
    }

    // gauss jordan inversion
    func inverse(_ matrix: [[Double]]) -> [[Double]] {
        // augment matrix
        var matrix = matrix
        var idrow = Array(repeating: 0.0, count: matrix.count)
        idrow[0] = 1.0
        for row in 0..<matrix.count {
            matrix[row] += idrow
            idrow.insert(0.0, at:0)
            idrow.removeLast()
        }

        // partial pivot
        for row1 in 0..<matrix.count {
            for row2 in row1..<matrix.count {
                if abs(matrix[row1][row1]) < abs(matrix[row2][row2]) {
                    (matrix[row1],matrix[row2]) = (matrix[row2],matrix[row1])
                }
            }
        }

        // forward elimination
        for pivot in 0..<matrix.count {
            // multiply
            let arg = 1.0 / matrix[pivot][pivot]
            for col in pivot..<matrix[pivot].count {
                matrix[pivot][col] *= arg
            }
            // multiply-add
            for row in (pivot+1)..<matrix.count {
                let arg = matrix[row][pivot] / matrix[pivot][pivot]
                for col in pivot..<matrix[row].count {
                    matrix[row][col] -= arg * matrix[pivot][col]
                }
            }
        }

        // backward elimination
        for pivot in (0..<matrix.count).reversed() {
            // multiply-add
            for row in 0..<pivot {
                let arg = matrix[row][pivot] / matrix[pivot][pivot]
                for col in pivot..<matrix[row].count {
                    matrix[row][col] -= arg * matrix[pivot][col]
                }
            }
        }

        // remove identity
        for row in 0..<matrix.count {
            for _ in 0..<matrix.count {
                matrix[row].remove(at:0)
            }
        }
        return matrix
    }

    let X = [ [1.0, 2.0, 3.0], [4.0, 5.0, 11.0], [7.0, 8.0, 9.0] ]
    let XI = inverse(X)
    let I = multiply(X,XI)
    print(I)

That’s the identity matrix popping out at the end, which validates my implementation.

But what’s this? A 60-line method? Uncle Bob would **not** be pleased.

Comments should not take the place of good variable/method names. Those section comments give clues as to where my methods should be:

    func augment(_ matrix: [[Double]]) -> [[Double]] {
        var augmented = matrix
        var idrow = Array(repeating: 0.0, count: matrix.count)
        idrow[0] = 1.0
        for row in 0..<matrix.count {
            augmented[row] += idrow
            idrow.insert(0.0, at:0)
            idrow.removeLast()
        }
        return augmented
    }

    func deaugment(_ matrix: [[Double]]) -> [[Double]] {
        var deaugmented = matrix
        for row in 0..<matrix.count {
            for _ in 0..<matrix.count {
                deaugmented[row].remove(at:0)
            }
        }
        return deaugmented
    }

    func partialPivot(_ matrix: inout [[Double]]) {
        for row1 in 0..<matrix.count {
            for row2 in row1..<matrix.count {
                if abs(matrix[row1][row1]) < abs(matrix[row2][row2]) {
                    (matrix[row1],matrix[row2]) = (matrix[row2],matrix[row1])
                }
            }
        }
    }

    func scaleRow(_ matrix: inout [[Double]], row: Int, scale: Double) {
        for col in 0..<matrix[row].count {
            matrix[row][col] *= scale
        }
    }

    func addRow(_ matrix: inout [[Double]], row: Int, scaledBy: Double, toRow: Int) {
        for col in 0..<matrix[row].count {
            matrix[toRow][col] += scaledBy * matrix[row][col]
        }
    }

    func pivot(_ matrix: inout [[Double]], row pivotRow: Int, col pivotCol: Int, forward: Bool) {
        let scale = 1.0 / matrix[pivotRow][pivotCol]
        scaleRow(&matrix, row: pivotRow, scale: scale)
        if forward {
            for toRow in (pivotRow+1)..<matrix.count {
                let scaleBy = -1.0 * matrix[toRow][pivotCol]
                addRow(&matrix, row: pivotRow, scaledBy: scaleBy, toRow: toRow)
            }
        } else {
            for toRow in (0..<pivotRow).reversed() {
                let scaleBy = -1.0 * matrix[toRow][pivotCol]
                addRow(&matrix, row: pivotRow, scaledBy: scaleBy, toRow: toRow)
            }
        }
    }

    func gaussJordanInverse(_ matrix: [[Double]]) -> [[Double]] {
        var matrix = augment(matrix)
        partialPivot(&matrix)
        for p in 0..<matrix.count {
            pivot(&matrix, row: p, col: p, forward: true)
        }
        for p in (0..<matrix.count).reversed() {
            pivot(&matrix, row: p, col: p, forward: false)
        }
        matrix = deaugment(matrix)
        return matrix
    }

    let X = [ [1.0, 2.0, 3.0], [4.0, 5.0, 11.0], [7.0, 8.0, 9.0] ]
    let XI = gaussJordanInverse(X)
    let I = multiply(X,XI)
    print(I)

Better. Uncle Bob would be proud (or give me credit for trying, anyway).

One more thing: thanks to Stack Overflow user Alexander, I have an even better way to express that pivot loop:

    func pivot(_ matrix: inout [[Double]], row pivotRow: Int, col pivotCol: Int, forward: Bool) {
        let scale = 1.0 / matrix[pivotRow][pivotCol]
        scaleRow(&matrix, row: pivotRow, scale: scale)
        let range = forward ? AnyCollection((pivotRow+1)..<matrix.count) : AnyCollection((0..<pivotRow).reversed())
        for toRow in range {
            let scaleBy = -1.0 * matrix[toRow][pivotCol]
            addRow(&matrix, row: pivotRow, scaledBy: scaleBy, toRow: toRow)
        }
    }

I hope to see you there!

I am looking forward to meeting many talented developers and learning all about who you are and what you do.


If you are near the Williamsburg Library on Scotland St. this Saturday, July 22 2017, at 10AM, and you have an interest in iOS development using the Swift programming language, please stop by. I will be giving a presentation on language basics and the development environment. We have a diverse group of enthusiasts with newbies and app store veterans alike.