Debugger interfaces are usually provided by the operating system. For CUDA applications, however, the operating system cannot offer any support for device code, so NVIDIA included a debugger interface inside the CUDA libraries. The interface is not documented in the CUDA programming guides, so there is little one can do except try to hack into the system.
The first thing to note is that cuda-gdb, NVIDIA's proprietary debugger, is able to work with device code. This is proof that such a debugger interface exists and is functional.
The second thing to note is that cuda-gdb is an extension to gdb and shares its license, the GNU GPL. I never quite learned the finer points of licensing, but the GPL is what makes NVIDIA release the source code for cuda-gdb. Since cuda-gdb uses the very same debugger interface we want to use, reading its code is a good way to hack into the debugger interface. The cuda-gdb source is available at ftp://download.nvidia.com/CUDAOpen64/.
The single most important file in the source tree is cudadebugger.h. The debugger interface is defined in this file. Although there is no documentation describing how the interface can be used, one can still guess what each function in the interface is supposed to do.
I will briefly describe what each of the functions I am familiar with does:
cudbgGetAPI: This function has to be called to fill a CUDBGAPI structure with a list of function pointers for further use.
cudbgGetAPIVersion: This function can optionally be called to check the CUDA library version.
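To make the "structure of function pointers" idea concrete, here is a minimal sketch of the pattern. It deliberately does not include cudadebugger.h: every mock_ name below is a hypothetical stand-in I made up for illustration, and the real signatures have to be taken from the header itself. The entries it fills in mimic initialize and suspendDevice, which are described further on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names come from cudadebugger.h. */
typedef enum {
    MOCK_SUCCESS = 0,
    MOCK_ERROR_UNINITIALIZED = 1
} mock_result;

/* A table of function pointers, similar in spirit to the CUDBGAPI structure. */
typedef struct {
    mock_result (*initialize)(void);
    mock_result (*suspendDevice)(unsigned dev);
} mock_api;

static int mock_initialized = 0;

static mock_result mock_initialize(void)
{
    mock_initialized = 1;
    return MOCK_SUCCESS;
}

/* Refuses to work until initialize has been called (an assumed behavior). */
static mock_result mock_suspend_device(unsigned dev)
{
    (void)dev; /* the mock has no real devices to distinguish */
    return mock_initialized ? MOCK_SUCCESS : MOCK_ERROR_UNINITIALIZED;
}

/* Plays the role of cudbgGetAPI: fills the caller's table with entry points. */
mock_result mock_get_api(mock_api *api)
{
    assert(api != NULL);
    api->initialize = mock_initialize;
    api->suspendDevice = mock_suspend_device;
    return MOCK_SUCCESS;
}
```

A caller would declare a mock_api variable, pass its address to mock_get_api, and from then on reach every function through the table, for example api.initialize() followed by api.suspendDevice(0).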
The following functions are only available after you call cudbgGetAPI:
initialize: Initializes the debugger interface. It must be called before any of the other functions.
suspendDevice / resumeDevice: These functions suspend or resume execution on the device with the given device number. They return an error code if the device is not executing any code at the moment.
setNotifyNewEventCallback: Sets a callback function for upcoming events. The callback must return void and accept a single void* parameter. The user decides what is passed to the callback through this parameter.
getNextEvent / acknowledgeEvent: These functions are used inside the callback function to get information about the events and to clear the event queue. You can drain the queue with a for loop like the one below:
for (res = cudbgAPI->getNextEvent(&event);
     res == CUDBG_SUCCESS && event.kind != CUDBG_EVENT_INVALID;
     res = cudbgAPI->getNextEvent(&event)) {
    .....
}
After the loop, you should call acknowledgeEvent(NULL). I think this call finalizes the processing of the events obtained through getNextEvent.
singleStepWarp: Executes the next instruction of a single warp, identified by a device number, a streaming multiprocessor number, and a warp number.
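The callback and event-queue functions described above (setNotifyNewEventCallback, getNextEvent, acknowledgeEvent) can be modeled with a small mock. Again, this is only a sketch that compiles without cudadebugger.h: all mock_ names, the event kind, and the queue itself are hypothetical. In particular, I am assuming that an empty queue is reported as a successful call returning an "invalid" event, which is my guess based on the loop condition rather than something the header documents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names come from cudadebugger.h. */
typedef enum { MOCK_SUCCESS = 0 } mock_result;
typedef enum { MOCK_EVENT_INVALID = 0, MOCK_EVENT_SOMETHING = 1 } mock_event_kind;
typedef struct { mock_event_kind kind; } mock_event;

#define MOCK_QUEUE_CAP 8
static mock_event mock_queue[MOCK_QUEUE_CAP];
static size_t mock_head = 0, mock_tail = 0;
static void (*mock_callback)(void *) = NULL;
static void *mock_callback_data = NULL;

/* Stand-in for setNotifyNewEventCallback: remember the callback and its data. */
void mock_set_callback(void (*cb)(void *), void *data)
{
    mock_callback = cb;
    mock_callback_data = data;
}

/* Stand-in for getNextEvent: pops one event, or reports an invalid
 * event when the queue is empty (an assumed convention). */
mock_result mock_get_next_event(mock_event *ev)
{
    assert(ev != NULL);
    if (mock_head == mock_tail)
        ev->kind = MOCK_EVENT_INVALID;
    else
        *ev = mock_queue[mock_head++];
    return MOCK_SUCCESS;
}

/* Stand-in for acknowledgeEvent: discard everything handed out so far. */
mock_result mock_acknowledge(void)
{
    mock_head = mock_tail = 0;
    return MOCK_SUCCESS;
}

/* Helper for the mock only: queue an event, then notify the callback if set. */
void mock_post_event(mock_event_kind kind)
{
    assert(mock_tail < MOCK_QUEUE_CAP);
    mock_queue[mock_tail++].kind = kind;
    if (mock_callback != NULL)
        mock_callback(mock_callback_data);
}

/* The callback: drain the queue, counting events through the user pointer. */
void on_new_event(void *user)
{
    int *count = (int *)user;
    mock_event ev;
    mock_result res;

    for (res = mock_get_next_event(&ev);
         res == MOCK_SUCCESS && ev.kind != MOCK_EVENT_INVALID;
         res = mock_get_next_event(&ev)) {
        ++*count;
    }
    mock_acknowledge();
}
```

The void* parameter is what lets the callback reach user state without globals. Here it carries a pointer to a counter, registered with mock_set_callback(on_new_event, &count).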
At this point, you should be able to play with the other functions in the debugger interface, such as getPC or readCodeMemory. I might post further details later on if I continue working on this stuff.