kqueue, threading, & runloops oh my

(For those impatient with my long-windedness: “You talk too much. Just gimme some code!”)

A few weeks ago it came to my attention that the best way to handle a certain problem was to monitor a file for changes and reload it when it changed. In general, most things you want to do in Cocoa are relatively simple once you know how and where to look, but every now and again figuring out where to even start involves digging deep into arcane areas, cursing the gods of Apple, and wondering what idiot thought it was a good idea to hide such important APIs under so many layers of obscurity.

This, in my experience, is the case with kqueue. Now what, you ask, is kqueue?

Well, technically, kqueue is a fairly fine-grained kernel notification queue for watching for events on files (among other things) on OS X and the BSDs. To put it simply, kqueue is something like a primitive NSNotificationCenter for changes to files. At first you might think: doesn’t NSWorkspace already have a real notification center for this? Actually, it doesn’t. It only posts about changes that occur through the workspace, such as changes made in Finder. And it only covers normal files, which is a limitation because OS X is built on Unix principles, if not design, where everything is a file.

So if, as in my case, the file in question isn’t being changed via NSWorkspace, or isn’t a “normal” file, or is being changed in ways that the workspace doesn’t post about, you have to dig deeper: into kernel API territory, a dark place where programmers don’t use cross-linked web pages for documentation, but scary things called “man pages”. Okay, okay, I actually used Google to find and read the man page, so technically it was still a cross-linked web page, but it is definitely far outside the scope of most of the rest of the documentation.

Now using kqueue itself is relatively simple, in that there isn’t much to it API-wise: two functions, one struct, one macro, and a handful of constants. However, because of its low-level nature, it bypasses all of the high-level APIs you expect to use in a Cocoa application, where “low level” usually means that annoying bit of CoreFoundation code which Apple hasn’t gotten around to making wrappers or toll-free bridges for.

The general approach to using kqueue is simple: get a handle to a kqueue, set up a struct with some flags to watch the file that matters to you, add that struct to the kqueue, then poll the kqueue for changes.

Wait, what? Poll? If it is a notification center, why would I have to poll? Aha, the gotcha: it isn’t a notification center, or even a notification queue like NSNotificationQueue. Rather, it is a queue of notifications, which means it is up to you to actually check the queue for new messages. Constantly. For as long as you want updates.

So bypassing semantics of use for the moment, and going to the Cocoa-relevant question: if it requires polling, and thus its own loop, how on earth am I supposed to use it inside a Cocoa application and an NSRunLoop? The usual answer is that you don’t use it in an NSRunLoop; you set up a separate thread to do it for you. That, of course, requires either spawning a new thread for everything you want to watch, or managing thread synchronization if you need to change what is being watched, and it makes actually getting notified about the changes a pain if you want to do your work on your main application thread (such as notifying the user).

But if that is the way it is, that is the way it is, right? So my first attempt involved taking some sample code and shoving it into an NSThread detached selector. It worked, barely, but how to get notified about changes? I played with a notification center, but if you haven’t used them much before, you quickly realize that NSNotificationCenter doesn’t help across threads: observers are called on whichever thread posts the notification (obvious once you understand how it works). Now Apple has NSNotificationQueue for helping with this problem, but I chose the more immediate NSDistributedNotificationCenter, which basically means I sent my message to another program, which then sent it back to me, on my other thread. Ridiculous. But it worked for basic testing.

Then came the synchronization issues: if I wanted to add (or remove!) a file I was watching from the queue, I had to add all manner of @try/@finally NSLock code or @synchronized blocks. Not pretty, not clean, and for most programmers used to simple Cocoa thread cases, very confusing.

But I finally got it working, and everything was fine and dandy, and I could finally actually try and figure out how to use kqueue to do what I wanted. What a pain, right? I knew there had to be a better way, so I went digging into the Apple APIs, and googling for common words.

What I found was a relatively obscure CoreFoundation API called CFFileDescriptor. Now CFFileDescriptor is actually not much good by itself. In fact it exists for one reason, and one reason only – to wrap a file to watch for changes in a CFRunLoopSource. To clarify this -

An NSRunLoop is a system which waits for events and triggers code when they happen. The obvious example of this is the GUI: the run loop receives window events and notifies the views to redraw themselves, or tells your code that a button was pressed, and so on.

So you are dealing with kqueue here, which is itself a queue of notifications, and you need to be notified when events are waiting for you to read. This sounds like just the sort of thing for an NSRunLoop, right? So you dig, and discover that NSRunLoop doesn’t have a way to add arbitrary sources of events, but that CFRunLoop does.

Once you realize this, you figure out that you don’t need to do thread management yourself at all! You can just tell the run loop to watch a file descriptor for you. It will take care of the thread for you, watch the descriptor for you, and call back your code when events happen, on your own run loop. No fuss, no muss, no threading, no synchronization.

Because kqueue returns a low level file descriptor, since everything in unix is a file, you can wrap it in a CFFileDescriptor, create a source; and off to the races! And since it is on a thread the runloop manages, you can add things to the kqueue safely at any time, because while your code is being called, the runloop isn’t polling.

So now we know how to use kqueue within a run loop. We have pushed aside the problem of the kernel API by wrapping the dirty details in a CoreFoundation API, and we are in familiar territory again. Now we can do what every good Cocoa programmer tries to do: wrap all this nasty low-level code in a Cocoa class.

So now back to the problem at hand: watching files for changes. The nice way, as already attempted, is to use a notification center. So what we need is a class that starts up kqueue, wraps it in a CFFileDescriptor, adds it as a source to the current run loop, and translates the kqueue events into notifications for the rest of your classes.

Now to start to clarify all this with some actual code. First, how to start up kqueue? That is trivial enough, just -


    int kq = kqueue();

Then comes wrapping this handle with a CFFileDescriptor, which, in CoreFoundation style, is obscure and unintuitive -


CFFileDescriptorContext context = {0, self, NULL, NULL, NULL };
CFFileDescriptorRef kqueueFD = CFFileDescriptorCreate(kCFAllocatorDefault, kq, false, (CFFileDescriptorCallBack)kqueueEventCallback, &context);
CFFileDescriptorEnableCallBacks(kqueueFD, kCFFileDescriptorReadCallBack);

Basically we set up our file descriptor for kqueue, provide a callback function, along with a context defining user data passed back to the callback function, and then we enable our callback.

Then adding this FileDescriptor to a runloop -

CFRunLoopSourceRef source = CFFileDescriptorCreateRunLoopSource(kCFAllocatorDefault, kqueueFD, 0);
CFRunLoopAddSource(CFRunLoopGetCurrent(), source, kCFRunLoopDefaultMode);
CFRelease(source);

Because CF doesn’t have a concept of autorelease (come on Apple where is the consistency?), we have to release it once added, but that is all for the runloop side of things.

Now our callback will be triggered once kqueue actually has something to tell us, so let’s look at what that callback will look like, with the actual down and dirty details of kqueue -

static void kqueueEventCallback(CFFileDescriptorRef fdref,
                                CFOptionFlags callBackTypes,
                                void *info)
{
  /* info is the pointer stored in the CFFileDescriptorContext */
  FSWatcher *self = (FSWatcher *)info;
  struct kevent event;
  int status;

  /* Zero timeout: never block inside the run loop callback */
  struct timespec timeout = {0, 0};

  int kq = CFFileDescriptorGetNativeDescriptor(fdref);
  while (true)
  {
    /* Read Next Event */
    status = kevent(kq, NULL, 0, &event, 1, &timeout);

    /* -1 is an error, 0 is no more events */
    if (status <= 0)
      break;

    /* Propagate the event if it is a file change */
    if (event.filter == EVFILT_VNODE)
      [self propogateKQueueEvent:event];
  }

  /* Re-enable kqueue callback since this is a one-shot callback */
  CFFileDescriptorEnableCallBacks(fdref, kCFFileDescriptorReadCallBack);
}

Basically, our callback gets called, we read anything queued, and pass each event individually back to our actual class. The timeout controls how long kevent blocks while waiting for new events, but since we are using a run loop, we don’t want to block, thus 0 for the timeout. Since kqueue supports multiple different types of events, we want to ignore anything that isn’t an EVFILT_VNODE, which is the type of filter used for normal file events. Some other uses for kqueue are monitoring the status of another application, or watching for new information on a socket.

Once we have read and processed all our events, we re-enable the callback since it is disabled after every trigger, and we are done.

So next up, how do we actually add files to the kqueue to watch? Actually fairly trivially, if a bit obscure because it is such a low level API. Barring any other logic it looks like -

  struct kevent change;

  int handle = open([path fileSystemRepresentation], O_EVTONLY, 0);

  /* Add file to kqueue */
  EV_SET( &change, handle, EVFILT_VNODE,
          EV_ADD | EV_ENABLE | EV_CLEAR,
          NOTE_RENAME | NOTE_WRITE | NOTE_DELETE |
          NOTE_EXTEND,
          0, NULL );

  kevent(kq, &change, 1, NULL, 0, NULL);

So we open a file path as “Event Only”, since we don’t actually want to lock the file or change it. Then we set up an event structure with the parameters we want to watch for, and add it to the queue.

To explain a little bit further: the first set of EV_* options are general flags, where Add means we are adding this event, Enable means enable notifications for this file, and Clear means the event’s state is reset once we have read it, so we only hear about new changes. (EV_ERROR is not needed here: it is a flag kevent sets on the events it returns to report a problem, not something you pass in when registering.)

The other set is the actual file events we wish to be notified of: Rename, Write, and Delete mean just what they say, and Extend means the file has been extended. Other things that can be watched for include permissions and user changes.
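To see the pieces so far working together without any Cocoa around them, here is a minimal plain C sketch. The helper name `count_events_after_write` and the `HAVE_KQUEUE` macro are made up for illustration, and since kqueue only exists on OS X and the BSDs, the function compiles to a stub that returns -1 everywhere else. It registers a file, modifies it, then drains the queue with a zero timeout, just like the callback above does.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#else
#define HAVE_KQUEUE 0        /* kqueue does not exist here, e.g. on Linux */
#endif

#ifndef O_EVTONLY
#define O_EVTONLY O_RDONLY   /* O_EVTONLY is specific to OS X */
#endif

/* Hypothetical helper: watch `path`, append one byte to it, then drain the
 * queue without blocking. Returns the number of events read, or -1 on
 * error or where kqueue is unavailable. */
int count_events_after_write(const char *path)
{
    assert(path != NULL);
#if HAVE_KQUEUE
    int count = -1;
    int kq = kqueue();
    int watched = open(path, O_EVTONLY);
    int writer = open(path, O_WRONLY | O_APPEND);

    if (kq != -1 && watched != -1 && writer != -1) {
        struct kevent change, event;
        struct timespec no_wait = {0, 0};

        /* Register interest in changes to the file behind `watched`. */
        EV_SET(&change, watched, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_CLEAR,
               NOTE_RENAME | NOTE_WRITE | NOTE_DELETE | NOTE_EXTEND, 0, NULL);

        if (kevent(kq, &change, 1, NULL, 0, NULL) != -1 &&
            write(writer, "x", 1) == 1) {
            /* Zero timeout: kevent returns 0 at once when nothing is queued. */
            count = 0;
            while (kevent(kq, NULL, 0, &event, 1, &no_wait) > 0)
                count++;
        }
    }

    if (writer != -1) close(writer);
    if (watched != -1) close(watched);   /* closing also drops the registration */
    if (kq != -1) close(kq);
    return count;
#else
    (void)path;
    return -1;
#endif
}
```

Calling `count_events_after_write("/tmp/some-existing-file")` on a kqueue platform should report at least one event for the write. A real watcher would of course keep the descriptors open and look at `event.fflags` instead of just counting.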

Something else I wish I had known early on: in my experience, rename events don’t show up on the file itself but on the folder that contains the file. So the above won’t reliably work for listening for renames unless you add two sets of events, one for the file, and one for the folder containing the file.

So onward, how to remove a file from the queue? You simply close the file handle you opened.

  close(handle);

Now this of course means you have to keep track of all the handles you open, so you wrap each file handle in an NSNumber and add it to a mutable dictionary, keyed by path name. But we also want to be able to get the path from the fd, so that we can pass it back in a notification, so we actually need two dictionaries, keyed as follows -

  NSNumber *fd = [[NSNumber numberWithInteger:handle] retain];

  [fds setObject:fd forKey:path];
  [files setObject:path forKey:fd];

So we have a rough idea of how to add, remove, and watch for changes, but what about the actual propagation? In the callback we simply passed the event to our class, which at a bare minimum, will lookup the path, and file event type, and post a notification, which should look something like this -

- (void) propogateKQueueEvent:(struct kevent)event
{
  NSString *path = [[files objectForKey:[NSNumber numberWithInteger:event.ident]] retain];

  if (path)
  {
    NSString *eventType = @"Unknown";

    if (event.fflags & NOTE_RENAME)
      eventType = @"Rename";
    else if ((event.fflags & NOTE_WRITE) ||
               (event.fflags & NOTE_EXTEND))
      eventType = @"Modified";
    else if (event.fflags & NOTE_DELETE)
      eventType = @"Delete";

    // Post notification
    [[NSNotificationCenter defaultCenter] postNotificationName:@"FSWatcherFileChangedEvent"
                                                        object: nil
                                                      userInfo: [NSDictionary dictionaryWithObjectsAndKeys: path, @"File Path",
                                                                                                       eventType, @"File Event", nil]];
  }

  [path release];
}

Not so hard, right? Mostly normal Cocoa code here. There is a specific gotcha, however, which makes this particular bit of code unreliable. Namely, an evil, evil idea that still persists for reasons I cannot fathom: atomic writes. And for that we need a bit of a hackish workaround. You see, with an atomic write, you write changes to an entirely new file, delete the old one, and rename the new file back to the original file name.
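If you want to convince yourself that an atomic write really does swap the file out from under you, a small plain C sketch can show it. The helper names here are invented for illustration, and the sketch assumes the common variant where `rename()` both removes the old file and moves the new one into place in a single step. It compares the inode number at the path before and after the write.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: write `text` to `path` the "atomic" way, via a
 * temporary sibling file that is then renamed over the original. */
int atomic_write(const char *path, const char *text)
{
    char tmp[1024];
    int len, ok;
    FILE *f;

    assert(path != NULL && text != NULL);

    len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;

    f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    ok = (fputs(text, f) != EOF);
    if (fclose(f) != 0)
        ok = 0;

    /* rename() replaces the old directory entry; the old file is unlinked. */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Returns 1 if an atomic write left a different file (inode) at `path`,
 * 0 if it is still the same file, -1 on error. */
int atomic_write_replaces_file(const char *path)
{
    struct stat before, after;

    if (stat(path, &before) != 0)
        return -1;
    if (atomic_write(path, "new contents\n") != 0)
        return -1;
    if (stat(path, &after) != 0)
        return -1;
    return before.st_ino != after.st_ino;
}
```

The path is the same afterwards, but the file behind it is not, and a descriptor opened before the write still points at the old, now deleted, file.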

An atomic write causes kqueue to drop the file handle when the original file gets deleted, because as far as it is concerned you were watching not the path, but the specific file itself. To cope with this, we have to handle deletes a little specially, so that if the file was atomically rewritten, we can re-add the file to the queue. Roughly, this looks like the following -

    else if (event.fflags & NOTE_DELETE)
    {
      eventType = @"Delete";

      /* HACK ALERT - Try and watch out for Atomic Writes */
      NSNumber *fd = [fds objectForKey:path];

      /* Close old FD */
      [files removeObjectForKey:fd];

      close((int)event.ident);

      /* Try and re-open */
      event.ident = open([path fileSystemRepresentation], O_EVTONLY, 0 );

      if (event.ident == -1)
        [fds removeObjectForKey:path];
      else
      {
        struct kevent update;

        /* Update fd <-> path dictionaries */
        fd = [NSNumber numberWithInteger:event.ident];

        [fds setObject:fd forKey:path];
        [files setObject:path forKey:fd];

        /* Re-Add file to kqueue */
        /* EV_EOF is set by the kernel on returned events, so it is not passed here */
        EV_SET( &update, event.ident, EVFILT_VNODE,
                EV_ADD | EV_ENABLE | EV_CLEAR,
                NOTE_RENAME | NOTE_WRITE | NOTE_DELETE |
                NOTE_EXTEND,
                0, NULL );

        kevent(CFFileDescriptorGetNativeDescriptor(kqueueFD), &update, 1, NULL, 0, NULL );

        eventType = @"Modified";
      }
    }

Pretty evil, right? But now we have all the pieces for a simple FSWatcher that will watch for file changes, and notify us about them. It needs some more logic to handle multiple watchers of the same file, and the subsequent retain count, and of course, we need cleanup.

Cleanup is pretty straightforward, so let’s take a quick look at that -

-(void) dealloc
{
  /* Close file handles */
  NSEnumerator *enumerator = [files keyEnumerator];
  NSNumber* fd;

  while (fd = [enumerator nextObject])
  {
    close([fd intValue]);

    while ([fd retainCount] > 2)
      [fd release];
  }

  /* Remove maps */
  [files release];
  [fds release];

  /* Close kqueue */
  CFFileDescriptorInvalidate(kqueueFD);
  CFRelease(kqueueFD);

  close(kq);

  [super dealloc];
}

Basically we close all the handles we opened, then cleanup the CFFileDescriptor, then close kqueue.

One other gotcha I ran into that is a little bit more obvious – just like NSWorkspace we have to have an absolute file path, so no ~’s, or file://, or symlinks. To solve this a simple method to expand a path can be used, either as a string category, or a method on your class, as follows -

- (NSString*) stringByExpandingPath:(NSString*)filePath
{
  NSString *result = filePath;

  // file://
  if ([result hasPrefix:@"file:"])
    result = [[NSURL URLWithString:result] path];

  // ~
  if ([result hasPrefix:@"~"])
    result = [result stringByExpandingTildeInPath];

  return  [result stringByResolvingSymlinksInPath];
}

And hey presto.
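If you ever need the same thing below the Cocoa layer, a rough plain C equivalent is sketched here. The function name `expand_path` is made up, it only understands a leading `~` for the current user (via the `HOME` environment variable), and it does not handle file:// URLs at all. The symlink resolution is done by the standard `realpath()` call.

```c
#define _XOPEN_SOURCE 700   /* asks strict compilers to expose realpath() */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096       /* fallback where limits.h does not define it */
#endif

/* Hypothetical helper: expand a leading "~" and resolve symlinks.
 * Returns a malloc'ed absolute path the caller must free(), or NULL if
 * the path cannot be resolved (for example because it does not exist). */
char *expand_path(const char *path)
{
    char joined[PATH_MAX];

    assert(path != NULL);

    /* "~" or "~/..." refers to the current user's home directory */
    if (strcmp(path, "~") == 0 || strncmp(path, "~/", 2) == 0) {
        const char *home = getenv("HOME");
        int len;

        if (home == NULL)
            return NULL;
        len = snprintf(joined, sizeof joined, "%s%s", home, path + 1);
        if (len < 0 || (size_t)len >= sizeof joined)
            return NULL;
        path = joined;
    }

    /* realpath() makes the path absolute and resolves ".", ".." and symlinks */
    return realpath(path, NULL);
}
```

Unlike the Cocoa method above, this one fails for paths that don’t exist, which is fine for our purposes since we are about to open() the file anyway.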

This solution is still, relatively speaking, primitive; it needs, as mentioned earlier, code to handle renames. And if you want to watch an entire file tree instead of individual files, then you are getting involved in a whole different API: FSEvents.

But for now this is a good base and much saner than any other example I found.

The full class, including an ugly retain workaround for watching the same file multiple times is available here.

For more information on kqueue you can read the man page.
And for more information on CFFileDescriptor you can see the Reference

Published on July 15, 2009 at 11:49 am

Things I Wish I Had Known Before, The API

In the course of over ten years of hacking on so many varied types of things that it is hard to truly grasp, even for me, I have come to learn many important lessons about programming. Most of these are specific to given types of problems, but there are a few which have more to do with how you look at problems than anything else. So to kick-start this series, here are some things I learned the hard way, and which, had I known them sooner, would have saved me many, many headaches and much pointless tail chasing.

On learning a new API

  • Documentation is your friend, but should never be considered the sole authority.
  • Other people can sometimes be of use, but should never be relied on to know more than you.
  • When you don’t know, and the documentation is incomplete, the first stop should be the source itself.
  • Even when the documentation is fairly complete, the source should still be a primary resource.
  • Bite Size Chunks – don’t try to learn everything at once.
  • Further – Don’t wait until you know the whole system before you start using it.
  • But – don’t avoid large code bases just because they are large.
  • Google is Your Friend.
  • Wikipedia is too.
  • When dealing with an open source API – it is not the maintainer’s job to answer all the questions of any programmer who happens to use it. Nor should you expect them to. Their job is to maintain the source, not explain it.
  • Assume any problem is your fault.
  • or – Don’t blame the system, the developer, the maintainer, or the guy next door for every problem you face, until you know the API so well you can be sure you are right – and still keep the option that you are wrong open for consideration.
  • It will always take longer than you want/expect.

To really expand these, the biggest most frustrating thing for anyone learning a new API is lack of knowledge. In a large number of the cases, no matter what problem you run into at least some of it is simply your lack of understanding of the system. If you are learning a new system you are facing all manner of seemingly insurmountable odds, and the only way you can be sure you will surmount them is by taking things one step at a time, looking first at the documentation, then examples, trying to code with what little you know, using the source/headers, documentation, and google/wikipedia as your constantly open companions.

Some people try to keep their learning nice and clean, look at one API, one example, and sequentially build examples one at a time. This can be useful, but ultimately you aren’t learning the api, so much as learning how to ape it – which means when problems come up they are that much more obscure, because you don’t really understand what you are doing in the first place or why it works when it does, much less why it doesn’t when it doesn’t. In my experience, the absolute best way to learn an API is to dive in and USE it, and have every possible resource you can find open and on hand for you to switch between them all as you go. And when you have a problem you try and solve it. In most cases you need to understand the API to use it, aping the motions isn’t enough, because you will hit bugs, and you can’t safely debug a system you don’t understand.

But ultimately – don’t expect to be spoon-fed, and don’t run away from large systems. You want to know how an API really works? If you are dealing with a public API, find a big open source code base and really look at it; try and understand how it is using the API. Will it be daunting at first? Probably. Will it be far more than your needs? Almost certainly. But any problem you might face, the big projects already have. And if you aren’t willing to look in the dictionary for the definition of a word just because there are so many other words in it you don’t need, you are never going to learn anything at all.

On Writing new Systems and APIs -

  • Documentation is your friend, and should not be considered a problem for someone else.
  • If you don’t document it now you will be the confused developer in a few years (or tomorrow) wondering what idiot wrote this code.
  • Always use comments – they help you keep track of the code, and make it easier to read.
  • But avoid excessive comments – they clutter the code, and make it harder to read.
  • Don’t obsess over private/internal frameworks/classes/functions, etc – you can, and will, change your mind later anyway.
  • Consistency is key, but it can’t come first.
  • Don’t be afraid to change a bad API.
  • If you change a public API, make sure you maintain a backwards compatibility wrapper.
  • Avoid breaking ABI if at all possible – and it usually is
  • Don’t try and rewrite every bad api in your system all at once - Iterative improvements when possible, will always be safer, and keep you saner.
  • Plan ahead.
  • Don’t try and plan for every contingency.
  • Always plan for the possibility you forgot/overlooked something
  • You will always overlook something.
  • Don’t duplicate something someone else has already done if at all possible.
  • Don’t avoid writing something just because someone else has already done something similar.
  • Naming should be obvious within the context of the system – obscurely named classes/functions/parameters/variables don’t help anyone - least of all you.

I could keep going, but let me expand on these. Writing a good API should reflect the problems of learning one, and that means even when documentation is incomplete, the API itself should be semi-self-documenting. If it isn’t, you yourself will be stuck with a system you probably don’t remember, and thus can’t even easily change. Similarly, comments are good, but too many are bad. For example, when your comment ratio starts to approach one or more per line of code, you are probably going overboard. More to the point, if you need that many, your code isn’t self-documenting enough.

Realistically, a good API should read kind of like prose; maybe ugly prose, but still self-describing. For this reason, when possible, methods should avoid unnamed or less than self-describing parameters (think foo(true, false, false, true, 1)). In systems like Cocoa this is easy, because the argument names are literally part of the method signature: you need them just to call the method. For other systems it isn’t always so easy. Carl Worth of cairo, for example, has taken a strong stance in API development for C: always use named enums whenever possible, instead of bitmasks or boolean flags. This makes for bulkier calls, but they are far easier to understand and use from a self-documenting standpoint.
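To make the enum point concrete, here is a tiny C sketch. Every name in it is invented for illustration (it is not cairo’s API, nor anything from the watcher class above); it simply contrasts a call built from boolean flags with one built from named enums.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option types for a file watcher */
typedef enum { WATCH_FILE_ONLY, WATCH_FILE_AND_FOLDER } watch_scope_t;
typedef enum { ATOMIC_WRITES_IGNORED, ATOMIC_WRITES_FOLLOWED } atomic_policy_t;

typedef struct {
    watch_scope_t   scope;
    atomic_policy_t atomic_writes;
} watch_options_t;

/* Opaque at the call site: what does make_options_flags(true, false) mean? */
watch_options_t make_options_flags(bool watch_folder, bool follow_atomic)
{
    watch_options_t options;
    options.scope = watch_folder ? WATCH_FILE_AND_FOLDER : WATCH_FILE_ONLY;
    options.atomic_writes = follow_atomic ? ATOMIC_WRITES_FOLLOWED
                                          : ATOMIC_WRITES_IGNORED;
    return options;
}

/* Self-describing at the call site: the arguments say what they do. */
watch_options_t make_options(watch_scope_t scope, atomic_policy_t atomic_writes)
{
    watch_options_t options;

    assert(scope == WATCH_FILE_ONLY || scope == WATCH_FILE_AND_FOLDER);
    assert(atomic_writes == ATOMIC_WRITES_IGNORED ||
           atomic_writes == ATOMIC_WRITES_FOLLOWED);

    options.scope = scope;
    options.atomic_writes = atomic_writes;
    return options;
}
```

Both `make_options_flags(true, false)` and `make_options(WATCH_FILE_AND_FOLDER, ATOMIC_WRITES_IGNORED)` produce the same result, but only one of them can be read six months later without opening the header.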

When it comes to duplicate/not duplicating code, this is a hard issue – you should try to use a common system API whenever one is available. But when and when not to use third party api’s is always a lot more challenging, because -

  1. What about future compatibility with the system?
  2. What about feature completeness for your code?
  3. What about future maintenance of the third party system?

In general you shouldn’t be afraid to use any third party api – but you shouldn’t be afraid to make your own even if one exists either, so long as you keep in mind that you shouldn’t try and duplicate just for the sake of doing it yourself, and remembering that if it is a linked api, it is now a hard dependency of your stack.

There are many times you only need a fraction of what an api provides, and it doesn’t make sense to depend on the whole stack. Sometimes it does. But don’t be afraid to go either direction, just be logical about it. Be sure why you pick the route you do – and don’t let laziness, or ego, be the driving factor.

In General

  • Frustration is inversely proportional to your ability to think rationally
  • Frustration usually comes from impatience when faced with a non-obvious problem
  • Impatience is the bane of all coding – be it your impatience, or someone else’s
  • Learn to walk away, both physically and mentally
  • You will spend most of your time debugging – no matter how well you know the system
  • Debugging involves logical process, as much as intuitive leaps
  • When something goes wrong, scan the code, and iteratively check everything
  • No really - everything
  • Always Test every single thing until you find the problem, or you intuit it
  • Sometimes the hardest problems to find, are ones you miss because you are so certain it can’t be there
  • Never rely on the compiler/framework to prevent your mistakes
  • A compiler can’t prevent you from being an idiot. No matter how strict the typing, or fancy the GC
  • Creativity involves both logical, consistent boundaries and the willingness to ignore them
  • Anything is possible – whether or not it is a good idea.
  • Be willing to adapt to new paradigms, and coding styles – even if your way is better. There is a lot of code in the world, and very little of it is yours. Even if you wrote all of mozilla by hand, from scratch.
  • Sometimes your way isn’t so much “better” as “obscurely clever” – and obscurely clever isn’t always better. Sometimes it is just obscure.
  • Obfuscation can be fun, but is never a good idea for serious projects
  • If it is “clever” or “elegant” code – it is probably completely indecipherable and/or unmaintainable
  • Short simple code, is always easier to read, and maintain, than large complex functions
  • Divide and Conquer. Both in learning, and in coding
  • Just because you can do it with a single RegExp, doesn’t mean you should. In fact it probably means you shouldn’t
  • Common advice but always good – don’t optimize until your system is already functional. No really. Not even then. NEVER. not then either. seriously. you will always regret it. Or whoever has to clean up after you will.
  • Sleep on it.
  • Sometimes an all night hack session results in amazingly brilliant code. But more often it results in an amazingly brilliant, obscurely clever, simply elegant pile of useless that you will spend most of the next week recovering from.
  • Some form of Version Control is your BFF. Seriously. Even… No, especially for private code you never ever share.

I have more, but I think that is enough for now, and since I don’t think I need to summarize most of these, I shall end this now.

Coming up next – No idea! But the next set of entries will probably be more specifically about problems I faced, than a general list like this.

Published on June 30, 2009 at 2:01 am

Things I Wish I Had Known Before, Introduction

I got to thinking the other day, that I learn a lot of things, about a lot of things, in my ongoing day to day efforts to .. learn things. Right. So I thought to myself, in furtherance of this thinking about things I learned while trying to learn things, that I should perhaps reboot my blog and share these things I have learned with other people, in their own efforts to learn things .. about things.

To this end, I have decided to start a series of entries about things I wish I had known before. This is currently looking to cover things relating to OSX, Linux, Cocoa, GTK+, Objective-C, Pascal, Assembly, C++, C, and so on.

I know a lot of things, and learn more constantly, so I might as well share some of that, both for my own sake and to hopefully spare others the same headaches I go through.

Coming up (in no certain order and hardly complete) -

  • The sane way to use kqueue in a Cocoa application.
  • Everything you (don’t) need to know about NSStatusItem
  • CDECL calling convention gotchas
  • Notable issues in going from 32bit, to 64bit
  • How to make use of the AudioToolbox, AudioQueue, etc.
  • FreePascal Generics, how to use, when to use, simple mistakes (vs C++)
  • libobjc, Apple’s 1.0, 2.0, and GCC’s, (aka gcc needs a new runtime)

Published on June 29, 2009 at 4:43 am