
Maintaining backward-forward compatibility of your own Client-Server protocol

When you integrate your client with well-known interfaces, protocols are quite clear (well, at least they should be). Whether your server is an HTTP server, a COM interface or even just a Windows DLL with exported functions, you know how to communicate with it just by reading the documentation (e.g. the HTTP specification, MSDN). We know that if the documentation declares a specific interface/protocol, the server must stand behind it. Although this sounds obvious, when we develop our own custom client-server protocol we sometimes miss this point. HTTP client-server is the obvious case, but we create many more. Let's not think about them only as client-server, but more broadly as consumer-producer components. Take these examples –

  • Dll_A (client) calls exported functions in Dll_B (server)
  • Process_A (client) collects information from Process_B (server) over IPC
  • A user-space module (client) sends information to a kernel module (server)
  • Client sends a packet to server

All of these must work with a predefined interface. To make your client work with the server properly, you must define a protocol that is known to both of them. The simplest and best way to minimize the chance of bugs is for both to use the exact same protocol definition files. Whether this is a header (.h) file, an XML file, an INI file – you name it, having both the client and server (or consumer and producer) use the same file means they both "speak" the same language. Consider the following protocol definition –

typedef enum {
    eConfigEnable,
    eConfigName
} EConfigParams;

typedef struct {
    EConfigParams eConfigParam;
    void*         pData;
} SConfigMessage;

The EConfigParams enum declares an interface between the two components, and SConfigMessage is a generic message structure sent from client to server which can carry either of the two declared messages. Since SConfigMessage contains the EConfigParams enum and a void pointer, it is very generic and can be extended for many other usages (new protocol messages). The client code will look like this:

// Send enable message
SConfigMessage enableMessage = {eConfigEnable, NULL};
send(&enableMessage);
// Send name message
SConfigMessage nameMessage = {eConfigName, (void*)"MyName"};
send(&nameMessage);

Server code will look like this:

void receive(SConfigMessage* configMessage)
{
    if (configMessage == NULL)
        return;

    switch (configMessage->eConfigParam)
    {
        case eConfigEnable: enable(); break;
        case eConfigName:   setName((char*)configMessage->pData); break;
        default:            break; // Unknown values are silently ignored
    }
}

This is pretty straightforward. The client sends a message, and as a result the server calls enable() or setName() according to the enum value declared in the protocol. Now we need to add a new feature that makes the server call a disable() function. Usually we add a new enum value as the last value (good practice), but sometimes we add it where it fits the context best. In our example, it makes sense for an enum value of eConfigDisable to sit next to (before/after) eConfigEnable (this is risky!). Suppose we take this approach; the enum will look like this:

typedef enum {
    eConfigEnable,
    eConfigDisable, // This is new!
    eConfigName
} EConfigParams;

// Send disable message
SConfigMessage disableMessage = {eConfigDisable, NULL};
send(&disableMessage);

The server will have a new case for eConfigDisable:

case eConfigDisable: disable(); break;

To test your code you rebuild both your client and your server and run a test. You reproduce the eConfigDisable message scenario and notice that the client sends the right message and the server calls the disable() function as expected. You even make sure there is no regression after these changes by checking all other possible messages in the protocol – enable() and setName(). Both work as expected. It seems like everything is working just fine, but it may not be. If there is even the slightest chance that your client can be released to your customers without releasing the server as well, you are at high risk of unexpected behavior on the server side. If your client and your server are always released together – never separately, never patched – there is no risk. So where is the risk? After changing the enum to include eConfigDisable, the values of the enumerators after it have changed: eConfigName changed from 1 to 2. When you rebuild both your server and client you're all good, but what happens when you update just your client? Or just your server? Backward-forward compatibility breaks. Let's take a look:

| Configuration | enum values | When client sends eConfigEnable | When client sends eConfigDisable | When client sends eConfigName |
|---|---|---|---|---|
| Old client + New server | Client: 0 = eConfigEnable, 1 = eConfigName. Server: 0 = eConfigEnable, 1 = eConfigDisable, 2 = eConfigName | No impact (value remains 0) | No impact (client does not support eConfigDisable) | Client will send '1'; server will treat it as eConfigDisable |
| New client + Old server | Client: 0 = eConfigEnable, 1 = eConfigDisable, 2 = eConfigName. Server: 0 = eConfigEnable, 1 = eConfigName | No impact (value remains 0) | Client will send '1'; server was supposed to ignore unknown values (eConfigDisable) but will treat it as eConfigName. POTENTIAL CRASH!! | Client will send '2'; server will ignore it (while it actually supports eConfigName) |
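
To make the "New client + Old server" row concrete, here is a minimal, self-contained sketch of that failure. The stand-in enable()/setName() bodies and the g_name buffer are my assumptions for illustration; the receive() logic mirrors the server code above:

#include <stdio.h>
#include <string.h>

/* The OLD server's view of the protocol: eConfigName is still 1 */
typedef enum {
    eConfigEnable,
    eConfigName
} EConfigParams;

typedef struct {
    EConfigParams eConfigParam;
    void*         pData;
} SConfigMessage;

static char g_name[32];

static void enable(void) { printf("enabled\n"); }

static void setName(char* name)
{
    strcpy(g_name, name); /* dereferences name: crashes when name is NULL */
}

/* Same receive() as above, compiled against the OLD enum */
static void receive(SConfigMessage* configMessage)
{
    if (configMessage == NULL)
        return;

    switch (configMessage->eConfigParam)
    {
        case eConfigEnable: enable(); break;
        case eConfigName:   setName((char*)configMessage->pData); break;
        default:            break;
    }
}

int main(void)
{
    /* The NEW client believes 1 means eConfigDisable, so it sends {1, NULL}.
       The OLD server decodes 1 as eConfigName and passes pData (NULL) on. */
    SConfigMessage disableMessage = { (EConfigParams)1, NULL };
    receive(&disableMessage);
    return 0;
}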

The table above shows states where your client and server "speak" different languages, different protocols. The client sends A but the server treats it as B. The good case is that an operation is simply ignored. The worst case (which is very likely to happen) is unexpected behavior on the server side, which can easily lead to a crash. Just think of the simple situation where you have a new client and an old server, and the client sends eConfigDisable. The server treats it as eConfigName (since both values are 1) and accesses the pData pointer while the pointer's value is actually NULL. In this simple example it may not be a big deal, but in cases where pData points to structure A in the client and the server treats it as structure B, crashes are very likely. Once again, just to clarify the most important point here: this invalid state of client-server protocol mismatch can happen only when the two are not released to your customers together 100% of the time. Different installers, patches or component updates can lead to this invalid state quite easily.

Solution: How can we deal with such cases? How do we avoid getting into this invalid state in products that are not tied together into one installer, or that are sometimes patched/updated? Solutions that may help but won't solve the issue:

  • Adding a last enum value such as "NUMBER_OF_WHATEVER". This is just bad practice.
  • Hardcoding enum values (see the sketch after this list).
  • Manual code reviews each time the protocol changes.
  • Documentation of the enum/structure that must remain consistent, e.g. "Don't change values" / "Don't touch here" / "Contact me for any change" / lots of exclamation marks / etc.
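
For example, the hardcoding mitigation might look like the sketch below (values chosen to match this article's example). It documents the intent that existing values must never shift, but nothing enforces it:

typedef enum {
    eConfigEnable  = 0,
    eConfigName    = 1,
    /* New values must always take the next free number, never an
       existing one, but only this comment stands between us and a bug. */
    eConfigDisable = 2
} EConfigParams;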

All of the solutions above may help a bit, but they still will not keep you safe from errors. The ideal solution keeps your code repository server free of such issues at all times. In order to achieve this we need:

  • To be able to run a ‘diff’ between two protocol definition files and catch risky changes.
  • Automatic execution of this diff between code that is committed to your repository server and the existing code.
  • To block the commit upon error resulting from the ‘diff’.
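
As a rough sketch of the first requirement, the hypothetical checker below could serve as such a 'diff'. The one-enumerator-per-line layout and the "eConfig" naming prefix are assumptions taken from this article's example, not from the real scripts; it fails when an existing enumerator was renamed, removed or moved, and passes when values were only appended:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINES 256
#define MAX_LEN   128

/* Collect enumerator names (assumed to start with "eConfig"), one per line. */
static int read_enumerators(const char* path, char names[][MAX_LEN])
{
    FILE* f = fopen(path, "r");
    char buf[MAX_LEN];
    int count = 0;

    if (f == NULL)
        return -1;
    while (count < MAX_LINES && fgets(buf, sizeof(buf), f) != NULL)
    {
        char* p = strstr(buf, "eConfig");
        if (p != NULL)
        {
            int n = 0;
            while (n < MAX_LEN - 1 && (isalnum((unsigned char)p[n]) || p[n] == '_'))
            {
                names[count][n] = p[n];
                ++n;
            }
            names[count][n] = '\0';
            ++count;
        }
    }
    fclose(f);
    return count;
}

int main(int argc, char* argv[])
{
    static char oldNames[MAX_LINES][MAX_LEN];
    static char newNames[MAX_LINES][MAX_LEN];
    int oldCount, newCount, i;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s <old_header> <new_header>\n", argv[0]);
        return 2;
    }
    oldCount = read_enumerators(argv[1], oldNames);
    newCount = read_enumerators(argv[2], newNames);
    if (oldCount < 0 || newCount < 0 || newCount < oldCount)
    {
        fprintf(stderr, "RISKY: file unreadable or enumerators removed\n");
        return 1;
    }
    /* Safe changes keep the old enumerator list as an exact prefix. */
    for (i = 0; i < oldCount; ++i)
    {
        if (strcmp(oldNames[i], newNames[i]) != 0)
        {
            fprintf(stderr, "RISKY: '%s' was renamed or moved\n", oldNames[i]);
            return 1; /* non-zero exit: the hook fails the commit */
        }
    }
    return 0; /* only appended values: backward-forward compatible */
}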

The repository setup I usually work with is SVN with the TortoiseSVN client, so I've created scripts to deal with such tricky cases. The idea is to deploy a pre-commit hook, so whenever a developer commits code to your SVN repository server, all protocol definition files are checked for risky changes; if a risk is found, the commit fails. This way you keep your repository server safe in an automated procedure. Stay tuned for the scripts! Whether you use these scripts or not, here are my guidelines for keeping your client-server protocol consistent:

  • Documentation of protocol declarations (defined values, enums, structs) should be very clear about the high risk: values, order and structure are all dangerous to change
  • Use code analysis to ensure you keep your backward-forward compatibility
  • Perform manual code reviews on high-risk definition files
  • Use automation

Automatic static code analysis before uploading your code

Developers and team leaders are probably familiar with static code analysis tools such as CppCheck, Klocwork and others. The main problem with such tools is that there is no enforcement. Meaning, you can upload code to your repository server with issues that could have been found before the upload (so the issues are already on the repository server and, even worse, deployed to your customers' endpoints).

This means the code repository must be reviewed (using a static code analysis tool) once in a while in order to find and fix these issues, which requires a cycle of 2-3 engineers. So why not enforce that code is analysed each time it is uploaded? This way you save the engineers' find-fix-test cycle and ensure the issues are fixed before releasing the product to customers.

Check out this article I uploaded to the CppCheck community - CppCheck integration with TortoiseSVN (includes a script for static code analysis automation).

Since recent changes at SourceForge, the CppCheck data on SourceForge is missing. Therefore, I'm re-posting it here:

===================================================

Since we are not robots (yet), it is very possible to forget to run Cppcheck before committing code to the SVN server. Organizations that use Cppcheck (or any other static code analysis tool) usually perform the code analysis once a day/week/month. The team leader assigns a task to a developer to fix the issues, commit the code and wait for the next code analysis, and then the cycle starts again. Sometimes the code analysis happens after a build was already released to QA or, even worse, to customers.

So we all know that asking your developers to run Cppcheck before every commit is not feasible. However, this process can be automated (and made largely invisible) for the developers.

Attached to this page is a script which automatically forces Cppcheck on all source files that are being committed. The check runs when the commit is triggered (before the commit is actually performed), with zero effort from the developers. If issues are found, the script fails the commit so the developer can fix the issues and commit only Cppcheck-checked code (failing the commit can be bypassed if needed). The great value of this approach is that we can fix the issues before they are committed to the SVN server!
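
For illustration only, here is a rough sketch of the kind of check such a hook performs. It is not the attached script; the Cppcheck path, the file-type list and the exact flow are placeholder assumptions. TortoiseSVN passes a client-side pre-commit hook a temporary file listing the paths being committed, one per line (the first argument below):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder assumptions: adjust to your installation. A path without
   spaces avoids extra quoting issues with system() on Windows. */
static const char* cppCheckPath = "C:\\Cppcheck\\cppcheck.exe";
static const char* supportedFileTypes[] = { ".c", ".cpp", ".h", ".hpp" };

static int is_supported(const char* path)
{
    size_t i;
    for (i = 0; i < sizeof(supportedFileTypes) / sizeof(supportedFileTypes[0]); ++i)
        if (strstr(path, supportedFileTypes[i]) != NULL)
            return 1;
    return 0;
}

int main(int argc, char* argv[])
{
    char line[1024], cmd[2048];
    FILE* paths;
    int failures = 0;

    if (argc < 2)
        return 0; /* no path list: nothing to check */
    paths = fopen(argv[1], "r"); /* temp file listing committed paths */
    if (paths == NULL)
        return 0;
    while (fgets(line, sizeof(line), paths) != NULL)
    {
        line[strcspn(line, "\r\n")] = '\0'; /* strip the newline */
        if (!is_supported(line))
            continue; /* skip .sln, .vcxproj and other non-source files */
        /* --error-exitcode makes cppcheck return non-zero on findings */
        snprintf(cmd, sizeof(cmd), "%s --error-exitcode=1 \"%s\"",
                 cppCheckPath, line);
        if (system(cmd) != 0)
            ++failures;
    }
    fclose(paths);
    if (failures > 0)
        fprintf(stderr, "Cppcheck found issues in %d file(s)\n", failures);
    return failures > 0 ? 1 : 0; /* non-zero exit fails the commit */
}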
Configuration

  1. Download SVN_Pre_Commit_Hook__CppCheck_Validate, extract the zipped file and edit the script:
    • cppCheckPath - Full path to your Cppcheck.exe (not CppcheckGui.exe).
    • supportedFileTypes - Add or remove file types to check. This variable is here so the script won't check '.sln', '.vcxproj' and other non-source file types.
    • enableScript - '1' or '0' to enable/disable running the script.
  2. Right click (somewhere on desktop) → TortoiseSVN → Settings → Hook Scripts → Add…
  3. Configure Hook Scripts:
    • Hook Type: Choose ‘Pre-Commit Hook’ (upper right corner).
    • Working Copy Path: The directory under which all of your SVN checkouts are done. Use the top-most directory (or just use 'C:\' for example).
    • Command Line To Execute: Full path to the attached script.
    • Make sure that both 'Wait for the script to finish' and 'Hide the script while running' checkboxes are checked → OK → OK.
    [Screenshot: the 'Configure Hook Scripts' dialog]

Hints

  1. Even if the commit failed because it didn't pass the static code analysis, SVN gives you the option to easily recommit disregarding the failure by clicking the 'Retry without hooks' button. If the commit succeeded (meaning Cppcheck did not find any issues), it will look like nothing happened (so developers will still see a commit end message just like before).
    [Screenshot: the commit dialog with the 'Retry without hooks' button]
  2. If you want to implement this solution in your organization/team you can do it in two different approaches:
    • Client side solution - Meaning, the steps above should be taken on all of your development machines. The benefit of this approach is that only the relevant teams use the solution, not every developer working against the SVN server. Besides, ignoring the Cppcheck result (in case of false positives, for example) is quite easy using one button click integrated in the TortoiseSVN client ('Retry without hooks'). This approach means that Cppcheck must be installed on all of the relevant developers' machines, of course.
    • Server side solution - Meaning, Cppcheck should be installed only on the SVN server and the steps above should be taken only once (server side only). Clients (developers' machines) need to take no action, since every commit triggers the hook at the server side. The benefit is that this is done only once, but this solution may be too restrictive for some organizations. In addition, in order to ignore the hook (once again, for false positives for example) you need to create some 'back-door' that allows developers to bypass it with a specific keyword in the commit message (see the sketch after this list).
  3. More about SVN hook scripts - Client Hook Scripts, Server Hook Scripts.
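
As a sketch of the 'back-door' idea from the server-side bullet above (the [skip-cppcheck] keyword and the overall flow are my assumptions, not part of the attached script): an SVN server-side pre-commit hook receives the repository path and a transaction id, and can read the pending log message with svnlook:

#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
    char cmd[1024];
    char message[4096] = {0};
    FILE* pipe;

    if (argc < 3)
        return 1; /* SVN calls server hooks as: pre-commit REPOS TXN */

    /* 'svnlook log -t TXN REPOS' prints the pending commit message */
    snprintf(cmd, sizeof(cmd), "svnlook log -t %s %s", argv[2], argv[1]);
    pipe = popen(cmd, "r");
    if (pipe != NULL)
    {
        size_t n = fread(message, 1, sizeof(message) - 1, pipe);
        message[n] = '\0';
        pclose(pipe);
    }

    /* Agreed bypass keyword in the commit message skips the analysis */
    if (strstr(message, "[skip-cppcheck]") != NULL)
        return 0;

    /* ...otherwise run the Cppcheck stage here and return its result... */
    return 0;
}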

All you need to do is take the Configuration steps above just once. Afterwards, you can work with SVN the same as before, except that now you see your failures before the code is committed to the SVN server.