Friday, April 24, 2020

Benchmarking in .NET

I recently had the chance to code review a hot fix for our platform at work. One changeset in particular caught my attention because it had a funny code smell. The method in question was removed in the final changeset due to a requirements change, but I still wanted to understand the ramifications of that code and see if there was something there. So, let's do some benchmarking...

Using BenchmarkDotNet

I was introduced to this package last year, and was impressed after using it to profile some new code my team had written. BenchmarkDotNet is an open source benchmarking framework which allows you to build performance measuring projects into your solutions very easily. You simply add a new project, pull in the NuGet package, and start writing tests. A neat feature of this package is that it can also collect memory diagnostics, so you can understand the GC footprint of your code.
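Pulling in the package is just a NuGet reference in the benchmark project. A sketch of the project file entry looks something like this (the version number here is an assumption; use whatever is current):

```xml
<!-- Hypothetical fragment of the benchmark project's .csproj;
     the version shown is an assumption, not a requirement -->
<ItemGroup>
  <PackageReference Include="BenchmarkDotNet" Version="0.12.0" />
</ItemGroup>
```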

Name that Smell

Here is the method that piqued my curiosity.


public IList<ResultMessage> CheckLoadFederalStimulusRules(int code, AccountStatusReasonCode reason)
{
    var allowedLockedReasons = new List<AccountStatusReasonCode>
    {
        AccountStatusReasonCode.PASSWORD_ATTEMPTS,
        AccountStatusReasonCode.PIN_ATTEMPTS,
        AccountStatusReasonCode.UNLOCK_ATTEMPTS
    };
    var messages = new List<ResultMessage>();
    switch (code)
    {
        case (int)AccountStatusCode.Closed:
        case (int)AccountStatusCode.PermanentlyClosed:
            messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Closed));
            break;
        case (int)AccountStatusCode.Locked:
            if (!allowedLockedReasons.Contains(reason))
            {
                messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Locked));
            }
            break;
    }
    return messages;
}

This is a really simple method. It creates a list of allowable reasons, and then verifies whether the operation is allowed. Knowing that the method will get called a lot, the memory allocations just didn't sit right with me. So I dig in to see if I can make it faster.

Build the Benchmark Harness

Typically you would want to build this right into your solution, but I use Visual Studio for Mac at home and our solution doesn't play nice without IIS, so I go an alternate route and set up a new .NET Core console app.
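From the command line, the setup is only a few steps (the project name here is just a placeholder). Note that benchmarks compiled in Debug produce misleading numbers, so always run with a Release build:

```shell
# Create a new console project and pull in the BenchmarkDotNet package
dotnet new console -o Benchmarks
cd Benchmarks
dotnet add package BenchmarkDotNet

# Benchmarks should always run against a Release build
dotnet run -c Release
```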

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// MemoryDiagnoser adds the allocation and GC columns to the results
[MemoryDiagnoser]
public class Program
{
    readonly TestClass test = new TestClass();

    static void Main(string[] args)
    {
        Console.WriteLine("Let's test some code!");
        BenchmarkRunner.Run<Program>();
    }

    [Benchmark]
    [Arguments(0, AccountStatusReasonCode.NONE)]
    [Arguments(0, AccountStatusReasonCode.UNLOCK_ATTEMPTS)]
    [Arguments(1, AccountStatusReasonCode.UNLOCK_ATTEMPTS)]
    [Arguments(1, AccountStatusReasonCode.NONE)]
    [Arguments(3, AccountStatusReasonCode.UNLOCK_ATTEMPTS)]
    [Arguments(3, AccountStatusReasonCode.NONE)]
    public void TestOriginal(int code, AccountStatusReasonCode reason)
    {
        var messages = test.CheckLoadFederalStimulusRules(code, reason);
    }
}

The first thing I do is tear the method out of its home, throw it into the test harness, and set up a benchmark to exercise the various code paths. The attributes drive the benchmarks and allow the developer to control how any particular method is tested. I wanted each code path evaluated to make sure I get a full picture of how the method behaves. Running the application kicks off the profiling process and dumps the diagnostics out to the console. Since I don't have anything to compare it to, I run through some simple optimizations just to see how much better we can get.

Let's Tune Some Code

I wrote three different versions of the method to get a feel for how each optimization affects the performance. I'll throw the code for all of them at the bottom of the page.

UpdatedCheckLoadFederalStimulusRules
I moved the List<AccountStatusReasonCode> object to a static class variable, so we only instantiate and initialize the list once.

OptimizedCheckLoadFederalStimulusRules
This version was a lot more aggressive. I noticed that the calling class treats a null response object as a success condition, so I could take advantage of that. I also unwound the switch block and turned it into if/else statements.

FinalCheckLoadFederalStimulusRules
This version also uses the null return to prevent creating an empty response list for success conditions. It also happens to be much more readable than the Optimized version above.

Benchmarking & Results

BenchmarkDotNet produces a bunch of statistical analysis graphs and files in the application folder, but I am really just interested in the console output for my purposes. The output breaks down each method and each test case and shows how it performed under load.


The results were actually a bit surprising and show just how expensive memory allocations are. For the fastest path, the final version produced a 91% decrease in runtime with no memory allocations. For the slowest path, we still produced a 26% decrease in runtime, and cut the amount of garbage collection almost in half.

The difference between the Optimized version and the Final version showed that they are almost equivalent, but having code that is much more readable and maintainable is worth the tiny performance hit in the fastest code paths. For a little more insight into the underlying difference between these two methods, refer to this article on the differences between if and switch at the IL level.

Final Thoughts and Code

This simple exercise goes to show how small inefficiencies can eat into the performance of your application, and how expensive memory allocations are. With just a little more care and attention from the original coder, we could have had a version that performed much better under load. My hope is that you can utilize this package to test the code you are writing and experiment with various constructs, to better understand how to optimize your application and identify performance problems before your code hits the field.

Here is the code file in its final tested form.

using System;
using System.Collections.Generic;
using Serve.Internal.Shared.DataContracts;

namespace Apr2020
{
    public class TestClass
    {
        static readonly List<AccountStatusReasonCode> AllowedLockReasons;

        #region Constructors
        public TestClass()
        {
        }

        static TestClass()
        {
            AllowedLockReasons = new List<AccountStatusReasonCode>
            {
                AccountStatusReasonCode.PASSWORD_ATTEMPTS,
                AccountStatusReasonCode.PIN_ATTEMPTS,
                AccountStatusReasonCode.UNLOCK_ATTEMPTS
            };
        }
        #endregion

        // This is the original version of the method
        public IList<ResultMessage> CheckLoadFederalStimulusRules(int code, AccountStatusReasonCode reason)
        {
            var allowedLockedReasons = new List<AccountStatusReasonCode>
            {
                AccountStatusReasonCode.PASSWORD_ATTEMPTS,
                AccountStatusReasonCode.PIN_ATTEMPTS,
                AccountStatusReasonCode.UNLOCK_ATTEMPTS
            };
            var messages = new List<ResultMessage>();
            switch (code)
            {
                case (int)AccountStatusCode.Closed:
                case (int)AccountStatusCode.PermanentlyClosed:
                    messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Closed));
                    break;
                case (int)AccountStatusCode.Locked:
                    if (!allowedLockedReasons.Contains(reason))
                    {
                        messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Locked));
                    }
                    break;
            }
            return messages;
        }

        // We use the static collection for checking lock reasons and lessen memory allocations
        public IList<ResultMessage> UpdatedCheckLoadFederalStimulusRules(int code, AccountStatusReasonCode reason)
        {
            var messages = new List<ResultMessage>(1);
            switch (code)
            {
                case (int)AccountStatusCode.Closed:
                case (int)AccountStatusCode.PermanentlyClosed:
                    messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Closed));
                    break;
                case (int)AccountStatusCode.Locked:
                    if (!TestClass.AllowedLockReasons.Contains(reason))
                    {
                        messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Locked));
                    }
                    break;
            }
            return messages;
        }

        // Unwind the case statement and try and take advantage of the null
        // return handling here to shortcut processing
        // Less readable, but very optimized
        public IList<ResultMessage> OptimizedCheckLoadFederalStimulusRules(int code, AccountStatusReasonCode reason)
        {
            if ((int)AccountStatusCode.Open == code)
            {
                return null;
            }
            if ((int)AccountStatusCode.Locked == code)
            {
                if (!TestClass.AllowedLockReasons.Contains(reason))
                {
                    var messages = new List<ResultMessage>(1);
                    messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Locked));
                    return messages;
                }
            }
            else if ((int)AccountStatusCode.Closed == code || (int)AccountStatusCode.PermanentlyClosed == code)
            {
                var messages = new List<ResultMessage>(1);
                messages.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Closed));
                return messages;
            }
            return null;
        }

        // Back to the Switch construct, optimized for readability and speed
        public IList<ResultMessage> FinalCheckLoadFederalStimulusRules(int code, AccountStatusReasonCode reason)
        {
            switch (code)
            {
                case (int)AccountStatusCode.Closed:
                case (int)AccountStatusCode.PermanentlyClosed:
                    {
                        var result = new List<ResultMessage>(1);
                        result.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Closed));
                        return result;
                    }
                case (int)AccountStatusCode.Locked:
                    {
                        if (!TestClass.AllowedLockReasons.Contains(reason))
                        {
                            var result = new List<ResultMessage>(1);
                            result.Add(new ResultMessage(ResultStatus.Fail, AccountStatusCode.Locked));
                            return result;
                        }
                        return null;
                    }
                default:
                    return null;
            }
        }
    }
}



Saturday, July 29, 2017

Meditation: How can doing nothing be so hard?

My first attempt at meditation had everything going for it: time, environment, and mindset. What happened next goes to show just how difficult it is to do nothing, which should be the easiest thing in the world. Picture this: sitting outside on a nice soft mat, next to a pool, on a beautiful summer morning. You're thinking a simple meditation session will be an easy and relaxing way to start your upcoming hectic day. I mean, how hard could it be?

Well, karma's a bitch and she showed me in spades what a noob I am in terms of the meditational arts, and why doing nothing can be difficult.


But first a little background on how I ended up here. This whole personal growth journey I started just felt very necessary, as I needed something grounding to help balance the chaos that has crept into my life. Given the number of balls that I've been juggling lately, I just felt an utter lack of focus, which leads to being physically and emotionally overwhelmed at times. After listening to a couple of rather profound podcasts, I latched onto the idea of meditation as something which could add value to my life with just a small time investment. Besides, it's practically the same as doing nothing for a short period of time, easy bonus points!

So back to my first foray into meditation. The stage was set. I had my spot next to the pool, I was feeling relaxed and, dare I say, a bit Zen. My phone was set to make sure I didn't spend too much time in relaxation mode, not knowing that I'd end up turning it off early. The first minutes went by easily as I focused on my breathing, hearing nothing but the wind and birds quietly in the background. I could feel myself relaxing, and melting into the moment. Then the first sign of trouble appeared, a dull ache in my lower back.

So, if you haven't had the pleasure of working behind a desk for twenty plus years, then you've really missed out on screwing up your body in amazing ways. It seems that one thing they don't teach you in school is that prolonged sitting causes your hamstrings to shorten ever so slightly, especially when done over a period of years. This shortening will pull your hips slightly out of rotational alignment at times when you need them to be straight. Times like standing for long periods, or sitting on the floor, legs crossed, attempting to force a good postural position on a system of muscles and ligaments that just don't want to play along for more than a few minutes.


The dull ache starts to grow into a slow burn, as I struggle to sit up straight. I make a mental note that I need to do some soft tissue work in addition to hitting the weights at the gym. My focus is now wavering, but I still feel like I can continue. Then I notice the slightly obnoxious hum of the neighbor's lawn mower. This typically wouldn't be much of an issue, but I had an association with the previous night, when the same neighbor behind me had been mowing as well. I was taking a swim in the pool and just chilling when the mower started up, chewing up branches and detritus in his lawn for what seemed like forever. He had a good-sized lawn, but was using a tiny push mower, and being an older retiree, moved at a more leisurely pace. Let's just say I was annoyed and triggered, and ended up going inside to escape the auditory assault. So here he is the next morning, bright and early, finishing the job.

The pain in my back is intensifying as I find myself shifting around into any position that might alleviate it, and it digs into what little focus I had left, which is now under assault by the sound of sticks and grass being chopped into little bits... slowly...

Still I'm thinking to myself... possibly still salvageable. But like I said, karma is a bitch and she wasn't done making her point yet. The next thing I hear comes from just inside the house: my wife and daughter audibly snickering at the window. Whatever focus I had is now gone as I watch the ripples in the pool and attempt to shut out the world, which is so rudely tapping on my little bubble. The next thing I know, out pops the family, with comments like, "Are you doing yoga?" "You have really bad posture!" "Are we bugging you?" "I took some photos of him!" I just love being their little amusement side show, and at that point I was just done, and only seven minutes in. I shake my head and gather my things before they get covered in sunscreen as my wife sprays down my daughter before she heads to the beach. The look of failure evident on my face...

I now understand just how far I have to go in this journey. Meditation isn't about controlling your thoughts, but about observing them with a relaxed and focused mind. Still not sure I can come to find my inner peace when it comes to a lawnmower, but I'll certainly give it a shot.


Friday, June 30, 2017

Unit Testing for Legacy C/C++

I recently had the opportunity to work with some really old C/C++ code to try and uncover a heap corruption bug. Not wanting to recreate the legacy build environment, with its equally old compiler, I made the decision to pull the questionable code onto my laptop to get access to some modern tooling. I also needed to get some tests around this library to verify its correctness, and that's when I stumbled across a portable, header-only unit testing library called Catch that was exactly what I needed.

The beauty of Catch is in its simplicity: open your favorite editor, add a couple of includes, write a test, compile, and run. When you execute the application, it runs all of your tests and reports the results. The following is the compilation and execution with a test failure.

john$ clang -g -Wall -fprofile-instr-generate -fcoverage-mapping -lstdc++ -o testhost tests.cpp irisbuffer.cpp nullterm.cpp

john$ ./testhost

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
testhost is a Catch v1.9.4 host application.
Run with -? for options

-------------------------------------------------------------------------------
resize
  Shrink Buffer, Expect bytes cleared
-------------------------------------------------------------------------------
tests.cpp:251
...............................................................................

tests.cpp:264: FAILED:
  REQUIRE( buffer.getBytesUsed() == expectedLen )
with expansion:
  3 == 4

===============================================================================
test cases:  15 |  14 passed | 1 failed
assertions: 132 | 131 passed | 1 failed


Writing a Test in Catch


The test constructs are simple, yet powerful. Each test is denoted by a TEST_CASE macro, which contains the name as well as support for some contextual tags. At this point you can write your standard Arrange, Act, Assert unit test and you are ready to run. One of the features I really like is sections, which allow you to organize your unit tests into more meaningful functional groups. The following is an example test which shows the basic test structures.

TEST_CASE( "resize", "[CIrisBuffer]" )
{
    char actual[] = "1234567890";
    size_t actualLen = sizeof(actual) - 1;
    
    SECTION("Shrink Buffer, Expect bytes cleared")
    {
        char expected[] = "123";
        size_t expectedLen = sizeof(expected) - 1;
        char expectedMem[] = "123\0\0\0\0\0\0\0";
        size_t expectedMemLen = sizeof(expectedMem) - 1;
        CIrisBuffer buffer(actual, actualLen);
        size_t expectedCapacity = buffer.getCapacity();
        char* expectedBuffer = buffer.getBuffer();
        
        buffer.resize(3);
        
        REQUIRE(buffer.getCapacity() == expectedCapacity);
        REQUIRE(buffer.getBytesUsed() == expectedLen);
        REQUIRE(0 == memcmp(buffer.getBuffer(), expected, expectedLen));
        REQUIRE(0 == memcmp(buffer.getBuffer(), expectedMem, expectedMemLen));
        REQUIRE(buffer.getBuffer() == expectedBuffer);
    }
    
    SECTION("Grow with fill, Expect fill with nulls")
    {
        char expected[] = "1234567890\0\0\0\0\0\0\0\0\0\0";
        size_t expectedLen = sizeof(expected) - 1;
        CIrisBuffer buffer(actual, actualLen);
        size_t expectedBytesUsed = buffer.getBytesUsed();
        char* oldBuffer = buffer.getBuffer();
        
        buffer.resize(expectedLen);
        
        REQUIRE(buffer.getCapacity() > expectedLen);
        REQUIRE(buffer.getBytesUsed() == expectedBytesUsed);
        REQUIRE(0 == memcmp(buffer.getBuffer(), expected, expectedLen));
        REQUIRE_FALSE(buffer.getBuffer() == oldBuffer);
    }
}
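For completeness, the only boilerplate the example above assumes is at the top of tests.cpp. Defining CATCH_CONFIG_MAIN tells the header to generate a main() for the test host, so you never have to write one yourself:

```cpp
// Top of tests.cpp: the only setup Catch needs.
// CATCH_CONFIG_MAIN must be defined in exactly one translation unit;
// it tells the header to generate main() for the test runner.
#define CATCH_CONFIG_MAIN
#include "catch.hpp"  // the single header shipped with the Catch release
```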

What about code coverage?


So the tests I wrote exposed the memory bug pretty quickly, but how could I be sure that was the only issue? I needed to know how much of the code my tests covered. Fortunately, there is some great tooling support which makes this really easy to integrate into your test execution. The LLVM toolchain contains several utilities which make generating a coverage report a snap. The key is telling the compiler to instrument the binary so that profile and coverage mapping information is generated when the code is executed. Here is an example using the clang compiler.

clang -g -fprofile-instr-generate -fcoverage-mapping -lstdc++ -o testhost irisbuffer.cpp nullterm.cpp tests.cpp

With these compiler options, a default.profraw file is generated when the application is executed, containing all the profile information from the run. The data file by itself is pretty worthless, but run it through a couple of utilities and you get the desired magic output. In order to get the raw profile data into a form that the coverage tool can use, it has to be transformed into an indexed profile data file. The llvm-profdata tool does this for you with the merge command.

llvm-profdata merge -o testhost.profdata default.profraw

Now the fun begins. llvm-cov is used to visualize the coverage information. It has two modes, a coverage report mode and a source code overlay show mode. The report command reveals what your coverage is for each source file.

llvm-cov report ./testhost -instr-profile=testhost.profdata

Code Coverage Report

As you can see, it included the coverage of the tests themselves. These can be excluded by compiling the test files without the coverage flags and linking the resulting objects in; only instrumented files end up in the report. If you want a more granular report, you can specify one or more filenames on the command line and it will spit out function-level coverage. Seeing this data at the file level is great for high-level reports, but we need to see it at the line level to find what we missed. That's where the show command comes in handy.

llvm-cov show ./testhost -instr-profile=testhost.profdata

  218|       |bool CIrisBuffer::clear(void)
  219|     58|{
  220|     58|    usedBytes = 0;
  221|     58|    if (xbuffer)
  222|     58|    {
  223|     58|        if (!ZeroMemory(xbuffer, capacity))
  224|      0|        {
  225|      0|            error = BUFFER_WIN32_ERROR;
  226|      0|            supplementalError = GetLastError();
  227|      0|            logErr(LOGERROR, "ZeroMemory failed.", GetLastError(), LOCATION);
  228|      0|        }
  229|     58|    }
  230|     58|    return true;
  231|     58|}

Now that's what I'm talking about!! I can see that I am only missing coverage of some negative cases, which, frankly, I can live without due to the difficulty of forcing them to fail. What's also nice is that you can see the execution counts for each line of code. I like this as it tells me which parts of the code are hot and could possibly benefit from optimization.

Summary


Honestly, I wish I had found Catch years ago. Its simplicity and ease of use make it stupid simple to write and run tests for new code, or to integrate it into older code with a little massaging. So if you find yourself digging through a legacy codebase wondering if it was ever fully tested, throw Catch into the mix and you might be surprised by what you uncover.