Writing an Audio Unit v3: Instrument
I’ve talked about the newish AUv3 MIDI capability in previous blog posts.
But, the majority of Audio Units deal with, well, audio.
I know, duh!
So how do you do that?
Set up
The Scary Bit
Allocating the Bus and Buffer
Responding to MIDI events
DSPKernel
What’s next?
Summary
Resources
Introduction
Audio Units have been around a loooong time. But version 3 is fairly recent (2015). The API in v3 is quite a bit different from v2, but much of v2 is still lurking in the closet. Also, if you have a v2 audio unit, Apple supplies a way to “bridge” to v3.
There is not much documentation on it though. It’s the Apple Way! Let the developers play detective instead of writing apps!
The documentation that we do have is the (in)famous FilterDemo, which combines an Audio Unit Host, a Filter, and an Instrument for both iOS and macOS, plus apps that use those audio units outside the host.
And a partridge in a pear tree.
It’s intimidating. I’ve known devs who looked at it and essentially said “some other time…”.
So, maybe the way to go is to do little pieces at a time. Grok those, then add some more.
Let’s create just an audio unit Instrument that plays the world famous Sine Wave.
Okeydokey?
Set up
Create a “regular” iOS app. Give it a clever name like SimpleSynthApp.
Then add an audio unit extension. I showed this in my post on MIDI audio units. Only this time, select Instrument as the audio unit type. Remember that the Manufacturer needs to be exactly 4 ASCII characters.
If you named your app SimpleSynthApp, name the audio unit SimpleSynth. Anyway, that’s my convention. Use whatever makes sense to you of course.
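Under the hood, the values you type into that template dialog describe an audio component. For an Instrument, the component type is kAudioUnitType_MusicDevice (the four-character code 'aumu'). Just to make that concrete, here is a sketch of the equivalent AudioComponentDescription; the subtype and manufacturer codes below are made-up placeholders, and yours come from your extension’s Info.plist:

// Sketch only: what the Info.plist entries amount to. Needs <AudioToolbox/AudioToolbox.h>.
AudioComponentDescription desc;
desc.componentType         = kAudioUnitType_MusicDevice;  // 'aumu' = Instrument
desc.componentSubType      = 'SSyn';                      // hypothetical subtype code
desc.componentManufacturer = 'Demo';                      // exactly 4 ASCII characters
desc.componentFlags        = 0;
desc.componentFlagsMask    = 0;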
The scary bit
So, what do we have to do?
We have to fill a buffer with samples!
What buffer?
Here’s the end of the story. You have to implement the internalRenderBlock method which returns an AUInternalRenderBlock. (An Objective-C block is like a Swift closure).
(Take a look at your generated audio unit. This will be stubbed out for you)
- (AUInternalRenderBlock)internalRenderBlock {
    // Capture in locals to avoid ObjC member lookups.
    // If "self" is captured in render, we're doing it wrong. See sample code.
    return ^AUAudioUnitStatus(AudioUnitRenderActionFlags *actionFlags,
                              const AudioTimeStamp       *timestamp,
                              AVAudioFrameCount           frameCount,
                              NSInteger                   outputBusNumber,
                              AudioBufferList            *outputData,
                              const AURenderEvent        *realtimeEventListHead,
                              AURenderPullInputBlock      pullInputBlock) {
        // Lasciate ogne speranza, voi ch'entrate.
In this returned AUInternalRenderBlock, we are handed an AudioBufferList pointer, cunningly named outputData for, well, output!
This AudioBufferList struct holds an array of AudioBuffers in a field named mBuffers.
It looks like this.
struct AudioBufferList {
    UInt32      mNumberBuffers;
    AudioBuffer mBuffers[1]; // this is a variable length array of mNumberBuffers elements

#if defined(__cplusplus) && defined(CA_STRICT) && CA_STRICT
public:
    AudioBufferList() {}
private:
    // Copying and assigning a variable length struct is problematic; generate a compile error.
    AudioBufferList(const AudioBufferList&);
    AudioBufferList& operator=(const AudioBufferList&);
#endif
};
typedef struct AudioBufferList AudioBufferList;
And the AudioBuffer struct:
struct AudioBuffer {
    UInt32           mNumberChannels;
    UInt32           mDataByteSize;
    void* __nullable mData;
};
typedef struct AudioBuffer AudioBuffer;
As you can see, there is a void pointer in this struct that we point to data we’ve allocated. The field mDataByteSize is how many bytes are in mData.
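Since the standard format we’ll use is non-interleaved 32-bit floats, each AudioBuffer carries a single channel, and the sizes work out like this (just a sketch of the arithmetic, not code you need to type; myChannelSamples is a hypothetical pointer you own):

// One channel per AudioBuffer in the standard (non-interleaved) float format.
buffer.mNumberChannels = 1;
buffer.mDataByteSize   = frameCount * sizeof(float);  // 4 bytes per sample
buffer.mData           = myChannelSamples;            // hypothetical float* you allocated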
So, in your render block, you will use outputData somewhat like this:
AudioBufferList *outAudioBufferList = outputData;
for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
    outAudioBufferList->mBuffers[i].mData = something;
}
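For instance, the most minimal render that produces valid (if boring) output just writes silence into every channel. A sketch, assuming the standard non-interleaved float format:

// Fill each channel buffer with zeros; frameCount comes from the render block.
for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
    float *out = (float *)outAudioBufferList->mBuffers[i].mData;
    for (AVAudioFrameCount frame = 0; frame < frameCount; ++frame) {
        out[frame] = 0.0f;
    }
}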
Why is this the scary bit? (yeah, I know, bytes…)
The render block is called a zillion times, very fast. If the work you do inside that block doesn’t finish in time, you will hear a glitch. You do not want to call anything that blocks. Or allocates memory. Or locks. Or does garbage collection. These are a few reasons you cannot write the render block in Swift. Yet. And you can’t write it in Objective-C either! Yeah, I hear you – but the template spits out this .m file with the skeleton. Well, ok. If you can do the entire render inside the block without calling any Objective-C method, you can get away with it. I’ve tried it. It’s a mess even with a simple render method.
Ross Bencina’s classic post on real-time audio programming covers all of this. Read it if you’re going to be doing audio programming.
Really.
So, where does that leave us?
You can do it in C. You can also remove the handle from your soldering iron and hold the hot bit between your teeth.
Or you can use C++ which doesn’t have the overhead of Objective-C or Swift. Hey, learning C++ is good for you.
<snark>What other language solves problems simply by adding a new language feature?!</snark>
In fact, if you look at the FilterDemo, they use C++. Well, Objective-C++.
Honestly, the shortest path between your empty template and something that bleeps and blorps is to reuse some of that C++. I’ll do that below.
So, that’s the end of the story. There are a few things we need to do before we can put bytes into that outputData field.
Allocating the Bus and Buffer
You just saw that the buffer has a void pointer to your data. You probably need to allocate a buffer for your data and then point to it. What’s this weaselly “probably” jazz? Well, it depends on the Audio Unit host. AUM, for example, will give you an already-allocated buffer. I checked by setting a breakpoint in the render block and inspecting outputData. What hosts do here is a bit of the wild west though, so you shouldn’t rely on the buffer being allocated for you.
For what it’s worth, AUM is an excellent host. Just my opinion.
When do we allocate this buffer? And what format is it?
Part of your “Audio Unit Contract” is to return an AUAudioUnitBusArray pointer. The template code that is generated when you create the app extension warns you like this.
// An audio unit's audio output connection points.
// Subclassers must override this property getter and should return the same object every time.
// See sample code.
- (AUAudioUnitBusArray *)outputBusses {
    #warning implementation must return non-nil AUAudioUnitBusArray
    return nil;
}
So, you need properties for an output bus and the array.
@interface SimpleSynthAudioUnit ()

@property (nonatomic, readwrite) AUParameterTree *parameterTree;
@property AUAudioUnitBus *outputBus;
@property AUAudioUnitBusArray *outputBusArray;

@end
And then in your initWithComponentDescription method, create them.
The bus requires an AVAudioFormat, so here I create the default stereo format at the CD sampling rate. The default format is non-interleaved samples.
The bus array can be for either input (e.g. filters) or output. So, I specify the output type and pass in the bus just created.
Of course you can have multiple busses (the render block passes in a bus number), but I’m keeping it simple.
AVAudioFormat *defaultFormat = [[AVAudioFormat alloc] initStandardFormatWithSampleRate:44100.0
                                                                              channels:2];

_outputBus = [[AUAudioUnitBus alloc] initWithFormat:defaultFormat error:nil];

// Create the output bus array.
_outputBusArray = [[AUAudioUnitBusArray alloc] initWithAudioUnit:self
                                                         busType:AUAudioUnitBusTypeOutput
                                                          busses:@[_outputBus]];
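And remember that warning about outputBusses? Once the array exists, the getter can simply return it. Something like this (the FilterDemo does essentially the same):

- (AUAudioUnitBusArray *)outputBusses {
    return _outputBusArray;
}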
What about the actual audio buffer though?
Create an AVAudioPCMBuffer with the bus format you just created
in the Audio Unit’s allocateRenderResourcesAndReturnError method.
You will also need a frame count. The Audio Unit has its maximumFramesToRender property set by the host, so use that.
Out of an abundance of caution though, you should give it a default value in your init method.
Also, you’ll need an ivar for the AVAudioPCMBuffer. For convenience later, you can also grab its audioBufferList and mutableAudioBufferList.
pcmBuffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:bus.format frameCapacity:maxFrames];
originalAudioBufferList = pcmBuffer.audioBufferList;
mutableAudioBufferList = pcmBuffer.mutableAudioBufferList;
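In context, the allocate method ends up looking something like this sketch (my ivar names; adapt to taste). I believe you can also set maximumFramesToRender to a default such as 512 in init, before the host allocates resources:

// Ivars for the PCM buffer and its buffer lists.
@implementation SimpleSynthAudioUnit {
    AVAudioPCMBuffer      *pcmBuffer;
    const AudioBufferList *originalAudioBufferList;
    AudioBufferList       *mutableAudioBufferList;
}
...
- (BOOL)allocateRenderResourcesAndReturnError:(NSError **)outError {
    if (![super allocateRenderResourcesAndReturnError:outError]) {
        return NO;
    }
    AVAudioFrameCount maxFrames = self.maximumFramesToRender; // set by the host
    pcmBuffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:self.outputBus.format
                                              frameCapacity:maxFrames];
    originalAudioBufferList = pcmBuffer.audioBufferList;
    mutableAudioBufferList  = pcmBuffer.mutableAudioBufferList;
    return YES;
}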
Ok, we’re getting close! Promise!
Remember that previous example of using the outputData AudioBufferList? Now let’s check the first buffer to see if it’s null. If it is, then the host didn’t allocate anything, so now we point to our AVAudioPCMBuffer. (note: nullptr is a C++ thing. We’re in C++ land now).
AudioBufferList *outAudioBufferList = outputData;
if (outAudioBufferList->mBuffers[0].mData == nullptr) {
    for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
        // or use the mutableAudioBufferList ivar you created
        outAudioBufferList->mBuffers[i].mData = pcmBuffer.mutableAudioBufferList->mBuffers[i].mData;
    }
}
Well, almost. pcmBuffer is an instance variable. The render block is “closing over” it, which means it is hanging onto self. Memory problems. It’s also sending an Objective-C message from the render thread. No-no. They even have a comment in the template code saying don’t do that.
So, don’t do that.
Create a block variable that points to your pcmBuffer and then use that in the block.
Like this.
- (AUInternalRenderBlock)internalRenderBlock {
    // Capture in locals to avoid Obj-C member lookups.
    // If "self" is captured in render, we're doing it wrong. See sample code.
    __block AVAudioPCMBuffer *pcm = pcmBuffer;
    // Now use the variable pcm instead of pcmBuffer in the block below. Or above :)

    return ^AUAudioUnitStatus(AudioUnitRenderActionFlags *actionFlags,
                              const AudioTimeStamp       *timestamp,
                              AVAudioFrameCount           frameCount,
                              NSInteger                   outputBusNumber,
                              AudioBufferList            *outputData,
                              const AURenderEvent        *realtimeEventListHead,
                              AURenderPullInputBlock      pullInputBlock) {

        AudioBufferList *outAudioBufferList = outputData;
        if (outAudioBufferList->mBuffers[0].mData == nullptr) {
            for (UInt32 i = 0; i < outAudioBufferList->mNumberBuffers; ++i) {
                outAudioBufferList->mBuffers[i].mData = pcm.mutableAudioBufferList->mBuffers[i].mData;
            }
        }
Our bus and buffer are now ready.
Now what?
Two more things.
Putting samples into that buffer.
This is your special sauce. How you put the samples in the buffer will be pretty much the same in each audio unit, but the actual samples are the product of your genius. Or hallucinations.
Responding to MIDI events like note on/note off to actually play the samples at the pitch designated in the events.
This will be pretty much the same in each audio unit. So, since FilterDemo has a working example, let’s see what they do.
Responding to MIDI events
Let’s handle MIDI events as the FilterDemo does.
There are a few files to copy from FilterDemo into your extension.
In the group “Shared Framework Code” copy these files to your project.
DSPKernel.hpp
DSPKernel.mm
and while we’re at it,
BufferedAudioBus.hpp
There is no BufferedAudioBus.mm
The fact that they are in a group named “Shared Framework Code” (and are targeted to each framework) tells you that this is “common code” to be reused.
The .mm extension says that it contains C++ code. While you’re at it, change the extension of your own audio unit from .m to .mm. If you don’t, you will get weird linker problems. (What linker problems aren’t weird?)
The DSPKernel is an abstract base class with virtual functions. They factored out the common code; the special-sauce DSP code for your very own super-special audio unit goes in a subclass that implements those virtual functions.
But first, the BufferedAudioBus.
At the top of your audio unit, import it.
#import "BufferedAudioBus.hpp"
Yeah, I just talked about how to set up the output buffer and the bus array. Now you know how to do it and what’s needed. The BufferedAudioBus does the same thing, but since you’re going to be doing this in every audio unit, it’s encapsulated here. Go ahead and take a look at it. Some of our old init code is there.
Let’s look at how it’s used in the InstrumentDemo audio unit implementation.
@implementation AUv3InstrumentDemo {
    // C++ members need to be ivars; they would be copied on access if they were properties.
    InstrumentDSPKernel  _kernel;
    BufferedOutputBus    _outputBusBuffer;
}
You see instance variables for the DSPKernel subclass and a BufferedOutputBus. Do this in your own audio unit. Yeah, you’ll need a DSPKernel subclass. That’s next. Make up a name, put in the variable and comment it out for now.
Or just go ahead and create SimpleSynthKernel.hpp now. The demo just creates it in a single hpp file. You could create a separate implementation (.mm) file if you like extra work.
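If you do create it now, a bare-bones skeleton could look something like this; the exact virtuals come from the DSPKernel.hpp you just copied, so check the signatures against your copy:

// SimpleSynthKernel.hpp: a hypothetical, do-nothing starting point. We'll add to it below.
#import "DSPKernel.hpp"

class SimpleSynthKernel : public DSPKernel {
public:
    void init(int channelCount, double inSampleRate) {
        channels = channelCount;
        sampleRate = inSampleRate;
    }

    void reset() {
        // Clear playing notes, envelopes, phase accumulators, etc.
    }

    // Still to come: setBuffers(), process(), and MIDI handling (see below).

private:
    int channels = 2;
    double sampleRate = 44100.0;
};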
Change the allocate method to tell the new BufferedOutputBus to allocate its resources. And of course, deallocate in the deallocate method.
- (BOOL)allocateRenderResourcesAndReturnError:(NSError **)outError {
    if (![super allocateRenderResourcesAndReturnError:outError]) {
        return NO;
    }
    _outputBusBuffer.allocateRenderResources(self.maximumFramesToRender);
    // etc...
}

- (void)deallocateRenderResources {
    _outputBusBuffer.deallocateRenderResources();
    [super deallocateRenderResources];
}
Now in the init method, call the buffer’s init function with the audio format and number of channels. Then make the _outputBus variable point to the buffer’s outputBus.
_outputBusBuffer.init(defaultFormat, 2);
_outputBus = _outputBusBuffer.bus;

// Create the output bus array.
_outputBusArray = [[AUAudioUnitBusArray alloc] initWithAudioUnit:self
                                                         busType:AUAudioUnitBusTypeOutput
                                                          busses:@[_outputBus]];
The last thing to do with the buffer is the same setup we did with the raw buffer inside the render block. Go ahead and look at prepareOutputBufferList. See the similarity?
...
return ^AUAudioUnitStatus(AudioUnitRenderActionFlags *actionFlags,
                          const AudioTimeStamp       *timestamp,
                          AVAudioFrameCount           frameCount,
                          NSInteger                   outputBusNumber,
                          AudioBufferList            *outputData,
                          const AURenderEvent        *realtimeEventListHead,
                          AURenderPullInputBlock      pullInputBlock) {

    _outputBusBuffer.prepareOutputBufferList(outputData, frameCount, true);
DSPKernel
Let’s take a look at the DSPKernel now.
The kernel is init-ed in the audio unit init… method, but also in the allocate method.
- (BOOL)allocateRenderResourcesAndReturnError:(NSError **)outError {
    if (![super allocateRenderResourcesAndReturnError:outError]) {
        return NO;
    }

    _outputBusBuffer.allocateRenderResources(self.maximumFramesToRender);

    _kernel.init(self.outputBus.format.channelCount, self.outputBus.format.sampleRate);
    _kernel.reset();

    return YES;
}
The kernel is then used in the render block like this. Of course, use your own kernel here.
- (AUInternalRenderBlock)internalRenderBlock {
    /*
       Capture in locals to avoid ObjC member lookups.
       If "self" is captured in render, we're doing it wrong.
    */
    __block InstrumentDSPKernel *state = &_kernel;

    return ^AUAudioUnitStatus(AudioUnitRenderActionFlags *actionFlags,
                              const AudioTimeStamp       *timestamp,
                              AVAudioFrameCount           frameCount,
                              NSInteger                   outputBusNumber,
                              AudioBufferList            *outputData,
                              const AURenderEvent        *realtimeEventListHead,
                              AURenderPullInputBlock      pullInputBlock) {

        _outputBusBuffer.prepareOutputBufferList(outputData, frameCount, true);
        state->setBuffers(outputData);
        state->processWithEvents(timestamp, frameCount, realtimeEventListHead);
FilterDemo creates a block variable that points to the kernel. In the render block, the kernel is handed the output AudioBufferList, which the kernel subclass stores (in a variable named outBufferListPtr), and is then told to process the events. So add that variable and function to your kernel.
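In your own kernel that can be as simple as this sketch (the demo’s InstrumentDSPKernel does essentially the same thing):

// Add to SimpleSynthKernel: hang onto the output buffer list so process() can fill it.
AudioBufferList *outBufferListPtr = nullptr;

void setBuffers(AudioBufferList *outBufferList) {
    outBufferListPtr = outBufferList;
}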
DSPKernel declares this pure virtual function.
virtual void process(AUAudioFrameCount frameCount, AUAudioFrameCount bufferOffset) = 0;
Your kernel subclass is probably complaining that it doesn’t have this. Go ahead and add it.
How does the kernel handle events?
Here is the DSPKernel function you call from your render block.
void DSPKernel::processWithEvents(AudioTimeStamp const *timestamp,
                                  AUAudioFrameCount frameCount,
                                  AURenderEvent const *events) {

    AUEventSampleTime now = AUEventSampleTime(timestamp->mSampleTime);
    AUAudioFrameCount framesRemaining = frameCount;
    AURenderEvent const *event = events;

    while (framesRemaining > 0) {
        // If there are no more events, we can process the entire remaining segment and exit.
        if (event == nullptr) {
            AUAudioFrameCount const bufferOffset = frameCount - framesRemaining;
            process(framesRemaining, bufferOffset);
            return;
        }

        // **** start late events late.
        auto timeZero = AUEventSampleTime(0);
        auto headEventTime = event->head.eventSampleTime;
        AUAudioFrameCount const framesThisSegment = AUAudioFrameCount(std::max(timeZero, headEventTime - now));

        // Compute everything before the next event.
        if (framesThisSegment > 0) {
            AUAudioFrameCount const bufferOffset = frameCount - framesRemaining;
            process(framesThisSegment, bufferOffset);

            // Advance frames.
            framesRemaining -= framesThisSegment;

            // Advance time.
            now += AUEventSampleTime(framesThisSegment);
        }

        performAllSimultaneousEvents(now, event);
    }
}
Wow. Lots of time stuff. Go through it if you’d like. The important thing is that your process implementation is called at the appropriate times.
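The other half of the story is performAllSimultaneousEvents, which ends up calling your kernel’s handleMIDIEvent for each MIDI event (look at how DSPKernel.mm dispatches events). Here is a sketch of a deliberately dumb monophonic handler. It is mine, not the demo’s; the demo keeps a linked list of NoteState voices instead (more on that below). The currentNote and noteIsOn ivars are hypothetical:

// Minimal sketch: remember the last note-on and whether a key is down.
void handleMIDIEvent(AUMIDIEvent const &midiEvent) override {
    uint8_t status   = midiEvent.data[0] & 0xF0;  // strip the MIDI channel
    uint8_t note     = midiEvent.data[1];
    uint8_t velocity = midiEvent.data[2];

    switch (status) {
        case 0x90:  // note on (velocity 0 is really a note off)
            if (velocity > 0) {
                currentNote = note;
                noteIsOn = true;
            } else if (note == currentNote) {
                noteIsOn = false;
            }
            break;
        case 0x80:  // note off
            if (note == currentNote) {
                noteIsOn = false;
            }
            break;
        default:
            break;
    }
}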
This is what the InstrumentDemo’s process function looks like:
void process(AUAudioFrameCount frameCount, AUAudioFrameCount bufferOffset) override {
    float *outL = (float *)outBufferListPtr->mBuffers[0].mData + bufferOffset;
    float *outR = (float *)outBufferListPtr->mBuffers[1].mData + bufferOffset;

    NoteState *noteState = playingNotes;
    while (noteState) {
        noteState->run(frameCount, outL, outR);
        noteState = noteState->next;
    }

    for (AUAudioFrameCount i = 0; i < frameCount; ++i) {
        outL[i] *= .1f;
        outR[i] *= .1f;
    }
}
It grabs the left and right channels individually as float pointers, which point to the void pointers in the outputData’s AudioBuffers. The base class keeps track of the frame count (initialized through the render block’s frameCount) and the offset into the audio buffers. It’s picky code, which is another reason to just reuse it.
There are many ways to keep track of the MIDI events. InstrumentDSPKernel defines an internal struct named NoteState. It is a doubly linked list. This struct also keeps track of the note’s stages, i.e. attack, sustain, decay and all that guff. Then in the run function, those two float pointers passed in receive their samples.
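You can already guess its rough shape from the fields used in run() below. Something along these lines, heavily abridged; go read the real struct in the demo’s kernel:

// Abridged sketch of the idea behind the demo's NoteState (not the literal code).
struct NoteState {
    NoteState *next = nullptr;   // doubly linked list of playing notes
    NoteState *prev = nullptr;

    int    stage = 0;            // stageOff, stageAttack, stageSustain, ...
    double oscPhase = 0.0, oscFreq = 0.0;   // sine oscillator state
    double envLevel = 0.0, envSlope = 0.0;  // simple envelope
    int    envRampSamples = 0;
    float  ampL = 0.0f, ampR = 0.0f;        // per-note stereo gain

    void run(int n, float *outL, float *outR);  // renders n frames into the two channels
};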
Finally!
Let’s see a bit. (or bytes)
void run(int n, float *outL, float *outR) {
    int framesRemaining = n;

    while (framesRemaining) {
        switch (stage) {
            case stageOff:
                NSLog(@"stageOff on playingNotes list!");
                return;

            case stageAttack: {
                int framesThisTime = std::min(framesRemaining, envRampSamples);
                for (int i = 0; i < framesThisTime; ++i) {
                    // cubing the sine adds 3rd harmonic.
                    double x = envLevel * pow3(sin(oscPhase));
                    *outL++ += ampL * x;
                    *outR++ += ampR * x;
                    envLevel += envSlope;
                    oscPhase += oscFreq;
                    if (oscPhase >= kTwoPi) oscPhase -= kTwoPi;
                }
                framesRemaining -= framesThisTime;
                envRampSamples -= framesThisTime;
                if (envRampSamples == 0) {
                    stage = stageSustain;
                }
                break;
            }
Yeah, they have an NSLog call in code called from the render block.
Kids! Don’t do this at home!
No logging in the render block, nor file IO. You read Bencina’s post, right?
(To be fair, this is in a “stage” that doesn’t produce samples)
But, there!
There is your code for generating samples lurking in the NoteState’s run function.
Your mission, should you choose to accept it, is to change the DSP code inside the for (int i = 0; i < framesThisTime; ++i) { ... } loop to your special sauce!
(i.e. copy everything in the demo instrument kernel, then change this. And you’ll probably have to add/remove parameters etc.)
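For example, here is one hypothetical swap: replace the cubed sine in that loop with a naive sawtooth. Same envelope and phase bookkeeping, different waveform (and yes, a naive saw aliases, but it bleeps and blorps):

// Hypothetical replacement body for that loop: a naive sawtooth instead of the cubed sine.
for (int i = 0; i < framesThisTime; ++i) {
    double x = envLevel * (2.0 * (oscPhase / kTwoPi) - 1.0);  // saw in [-1, 1]
    *outL++ += ampL * x;
    *outR++ += ampR * x;
    envLevel += envSlope;
    oscPhase += oscFreq;
    if (oscPhase >= kTwoPi) oscPhase -= kTwoPi;
}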
Personally, I use Matlab to prototype generating waveforms and filters. It has exceptional linear algebra capabilities. (Hey, the name is short for Matrix Laboratory after all!)
What’s next
I didn’t mention parameters here. That was another blog post on MIDI audio units. (see Resources). Luckily, they work exactly the same for audio units, so you won’t have to learn yet another tap dance!
I didn’t mention processing audio input. This post is long enough, so that’s coming in a separate post.
I didn’t mention the UI.
Another forthcoming post! Hey, I gotta keep busy.
The way that the demo code handles MIDI events works. But it does not handle MPE. “I’ll leave that as an exercise for the reader.” 🙂
ps Want a post on how to do that?
Summary
Jeez, that’s a lot of stuff to set up, isn’t it? Well, you gotta dance to the tune that’s being played.
So, yeah, we “cheated” a bit by reusing the FilterDemo kernel. I think something like that should be part of the core classes. But it isn’t, so there.