Thanks. That was the piece I was missing.
Confusingly, TIers in other threads have suggested that SPI mode 0 (CPOL = 0, CPHA = 0: clock idles low, data sampled on the rising edge) is correct. The chip also shifts its data out on the falling edge, which suggests it should be sampled on the rising edge. I guess sampling on the falling edge should still work as long as the microcontroller's input hold-time requirement is short enough; it just doesn't match the standard SPI timing diagrams. Anyway, I bit-banged something that seems to work (I can read and write registers), and now I'll try to get it working with the SPI peripheral.
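
For anyone following along, here is a minimal sketch of what a mode 0 bit-banged byte transfer could look like. It is not the exact code referred to above. The pin variables, `set_sclk`, `slave_load`, and the simulated slave are all hypothetical stand-ins so the example runs on a PC; on real hardware they would be replaced by GPIO reads and writes. The simulated slave is assumed to behave as described: it shifts its next bit out on the falling edge and latches MOSI on the rising edge.

```c
#include <stdint.h>

/* Simulated bus lines (hypothetical; replace with real GPIO access). */
static int sclk = 0;            /* mode 0: clock idles low */
static int mosi = 0;
static int miso = 0;

/* Simulated slave state (assumption: MSB first, 8-bit frames). */
static uint8_t slave_shift = 0; /* byte the slave is shifting out */
static uint8_t slave_rx = 0;    /* byte the slave has received */

/* Load the byte the slave will send; its MSB is already on MISO
 * before the first clock edge, as in a mode 0 timing diagram. */
static void slave_load(uint8_t tx)
{
    slave_shift = tx;
    slave_rx = 0;
    miso = (tx >> 7) & 1;
}

/* Drive SCLK and let the simulated slave react to the edge. */
static void set_sclk(int level)
{
    if (!sclk && level) {
        /* Rising edge: slave latches MOSI. */
        slave_rx = (uint8_t)((slave_rx << 1) | mosi);
    } else if (sclk && !level) {
        /* Falling edge: slave shifts out its next bit. */
        slave_shift = (uint8_t)(slave_shift << 1);
        miso = (slave_shift >> 7) & 1;
    }
    sclk = level;
}

/* Exchange one byte, MSB first, in SPI mode 0 (CPOL = 0, CPHA = 0). */
static uint8_t spi_transfer_mode0(uint8_t out)
{
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; bit--) {
        mosi = (out >> bit) & 1;          /* set up MOSI while SCLK is low */
        set_sclk(1);                      /* rising edge */
        in = (uint8_t)((in << 1) | miso); /* master samples MISO after the rising edge */
        set_sclk(0);                      /* falling edge: slave presents next bit */
    }
    return in;
}
```

Sampling right after the rising edge gives MISO a full half clock period of setup time, which is why this ordering matches the standard mode 0 diagrams. Moving the MISO read to after `set_sclk(0)` would model the falling-edge variant, which on real hardware relies on the slave's output delay exceeding the microcontroller's hold-time requirement.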