New Fun Blog – Scott Bilas

Take what you want, and leave the rest (just like your salad bar).


Assertive Finalizers



In my previous post, I talked about why I have stopped using finalizers for unmanaged resource collection. I want this to be done through the disposable pattern instead, forcing the programmer to manage resources manually.

Ironically, finalizers are a great way to verify this.

Sander van Rossen quickly figured out where I was going with this and proposed in a comment that we can just assert in a finalizer. We just need a couple more things:

  1. We need to track the source of the object to figure out where the leak is.
  2. We need to ensure that finalizers run on shutdown, or our assert will never get hit.

This is familiar territory to C++ programmers. Most of us use memory management libraries that provide leak detection and reporting. Let’s do something similar in C#.

DisposableBase

If you look around online, you’ll find some full-featured IDisposable base classes intended to deal with the full finalization model. We can eliminate most of that as either too complicated or unnecessary. We just need a few things:

  • A stack trace grabbed at construction time
    This will be used for the error report. Because this is expensive to gather, we need to control it with an #if that is off by default, only turned on if needed to help an investigation. Most of the time a leaked disposable will be easy to find by inspection, but in the 5% case we will need a lot more context.
  • A finalizer that reports the problem
    This could throw an exception, fire an assert, or route to an error reporter, depending on the application. In my example I just have it write to the debug output window for the demo.
  • Disposal helper methods
    These wrap up disposal a bit, so inheritors only need to implement OnDispose. The most important feature, though, is that Dispose will call GC.SuppressFinalize when our object is disposed. This eliminates the performance cost of having a finalizer in the normal case when clients are disposing this class properly. This is why the finalizer has no “if” in it – if it ever gets called, then we have a bug.

This is what I am currently using as my base class for handling unmanaged resources:

[code lang="csharp"]
// comment out unless diagnosing a leak
#define DEBUG_DISPOSE

using System;
using System.Diagnostics;

public abstract class DisposableBase : IDisposable
{
    // store stack at point of construction for possible later use
#if DEBUG_DISPOSE
    StackTrace _trace = new StackTrace(true);
#endif

    // finalizer will not be called if object was properly disposed
    ~DisposableBase()
    {
        string message = "!! Forgot to dispose a " + GetType().FullName;
#if DEBUG_DISPOSE
        message += "\n\nStack at construction:\n\n" + _trace + "!!";
#endif
        Debug.WriteLine(message);
    }

    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        ThrowIfDisposed();
        IsDisposed = true;

        try { OnDispose(); }
        finally { GC.SuppressFinalize(this); }
    }

    protected abstract void OnDispose();

    protected void ThrowIfDisposed()
    {
        if (IsDisposed)
            throw new ObjectDisposedException(GetType().FullName);
    }
}
[/code]

Demonstration

Here is a simple test app that shows what happens if we forget to dispose an instance.

[code lang="csharp"]
public class DatabaseConnection : DisposableBase
{
    protected override void OnDispose()
    { Debug.WriteLine("Disposing"); }
}

public class Program
{
    static void Main(string[] args)
    {
        using (var remembered0 = new DatabaseConnection())
        using (var remembered1 = new DatabaseConnection())
        {
        }

        var forgotten = new DatabaseConnection();

        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
[/code]

The first two instances dispose fine, but the third is leaked and so we get a log to the output window:

[code lang="text"]
Disposing
Disposing
!! Forgot to dispose a DatabaseConnection

Stack at construction:

at DisposableBase..ctor() in C:\Users\Scott\Cloud\Proj\tests\BlogSamples\ConsoleApplication1\Finalizers.cs:line 18
at DatabaseConnection..ctor()
at Program.Main(String[] args) in C:\Users\Scott\Cloud\Proj\tests\BlogSamples\ConsoleApplication1\Finalizers.cs:line 89
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
!!
[/code]

We can easily zero in on the exact spot where the leaked resource was allocated and wrap it in a ‘using’ to resolve the leak.

Note the use of GC.Collect and GC.WaitForPendingFinalizers right before the test application exits. This is necessary to force all finalizable objects to be collected and reported before process shutdown; otherwise they are simply dropped when the process’s memory is released.

These two calls can also be made during a normal run of the app, for leak testing without waiting for shutdown. This would be useful in a live service with a periodic leak, where we want an in-session, nonfatal log of leaks.
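For example, the check could be wrapped up in something as small as this (just a sketch; the LeakCheck name is mine, and the actual reporting still happens in the DisposableBase finalizer):

[code lang="csharp"]
using System;

// Hypothetical helper: force leaked objects to finalize (and report) now,
// instead of waiting for process shutdown.
public static class LeakCheck
{
    public static void Run()
    {
        // finalize anything that was leaked, which triggers the reports...
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // ...then collect the objects that just finished finalizing
        GC.Collect();
    }
}
[/code]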

What About External Classes?

This takes care of our own classes, where we have full control. But what about system or third party classes that are using finalizers? With those it’s back to square one.

Well, I suppose I would write a helper class...

[code lang="csharp"]
public class SafeDisposer<T> : DisposableBase where T : IDisposable
{
    public SafeDisposer(T disposable) { Obj = disposable; }

    public T Obj { get; private set; }

    protected override void OnDispose()
    {
        Obj.Dispose();
        Obj = default(T);
    }
}

public static class SafeDisposer
{
    public static SafeDisposer<T> Wrap<T>(T disposable) where T : IDisposable
    { return new SafeDisposer<T>(disposable); }
}
[/code]

And equivalent updates in the demo code:

[code lang="csharp"]
public class Program
{
    static void Main(string[] args)
    {
        using (var remembered2 = SafeDisposer.Wrap(new StringReader("foo")))
        using (var remembered3 = SafeDisposer.Wrap(new StringReader("poo")))
        {
            remembered2.Obj.Peek();
        }

        var forgotten2 = SafeDisposer.Wrap(new StringReader("boo"));

        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
[/code]

And output:

[code lang="text"]
!! Forgot to dispose a SafeDisposer`1[[System.IO.StringReader, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]

Stack at construction:

at DisposableBase..ctor() in C:\Users\Scott\Cloud\Proj\tests\BlogSamples\ConsoleApplication1\Finalizers.cs:line 18
at SafeDisposer`1..ctor(T disposable) in C:\Users\Scott\Cloud\Proj\tests\BlogSamples\ConsoleApplication1\Finalizers.cs:line 53
at SafeDisposer.Wrap[T](T disposable) in C:\Users\Scott\Cloud\Proj\tests\BlogSamples\ConsoleApplication1\Finalizers.cs:line 67
at Program.Main(String[] args) in C:\Users\Scott\Cloud\Proj\tests\BlogSamples\ConsoleApplication1\Finalizers.cs:line 98
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
!!
[/code]

So to use it, we just wrap a disposable in SafeDisposer.Wrap() at the point of creation and access it through “.Obj”. Not bad, but not great either. I’m OK with the wrapper function, but the required access via the Obj member is a pain, and it also means that converting an unsafe disposable into a safe one requires updating a lot of code.
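One small mitigation might be an implicit conversion on the wrapper (just a sketch, not part of the class above): it lets a SafeDisposer<T> be handed straight to APIs that expect a T, though member access still has to go through Obj.

[code lang="csharp"]
// Sketch: the same SafeDisposer<T> as above, with an implicit conversion
// back to T added for convenience.
public class SafeDisposer<T> : DisposableBase where T : IDisposable
{
    public SafeDisposer(T disposable) { Obj = disposable; }

    public T Obj { get; private set; }

    // lets the wrapper be passed wherever a plain T is expected
    public static implicit operator T(SafeDisposer<T> wrapper)
    { return wrapper.Obj; }

    protected override void OnDispose()
    {
        Obj.Dispose();
        Obj = default(T);
    }
}
[/code]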

Another option might be to use a dynamic proxy system to inject the functionality we need, letting client code remain unchanged except at the point of creation. Or perhaps a run-time system that patches system assemblies to do the injection.

I’ll leave this as an exercise for the reader because I think we’re probably getting seriously diminishing returns at this point. Most disposable objects in a large system will be classes that we have full control over. The few objects not under our control will likely be low level primitives that will be wrapped up by our own foundation classes anyway.

Written by Scott

March 8th, 2010 at 6:04 pm

Posted in .net,programming

Finalizers: An Incomplete Pattern


When you were first learning about how .NET finalization works, did it just feel wrong to you? It sure did for me. The mechanisms and rules involved with finalizers always felt painfully over-complicated, hard to get right, and hacky. Here we have this clean managed-memory paradigm that feels great to use. And it’s got a big gnarly barnacle named Finalize growing out of its side that we’re supposed to use to deal with unmanaged resources.

I learned .NET when the beta of 1.0 was out, and at the time Microsoft was putting a lot of effort into educating programmers about the differences between native and managed programming to help them make the switch. “In C++ you’d do this, but in .NET you do this” kind of thing. So practically every article on the CLR talked about finalizers. They often made analogies to C++ destructors. C# even added some ill-advised sugar that converts ~Class() to Finalize() to make us grognards feel more at home.

Finalizers were obviously very important, and so we learned about them. We evolved base classes to help us do the boilerplate work and wished that the .NET languages supported mixins. But it never felt right to me.

The Problem With Finalizers

Recently, after a couple conversations at work, I figured out what’s been bothering me about finalizers. The problem is that we’re using a commodity manager to manage non-commodity resources.

In .NET, raw memory can be treated as a commodity, like a gas tank. We can use it fluidly and at any granularity, and treat it all roughly the same. This is a simplification of course, and there are performance considerations, but they closely map onto the underlying OS and are familiar to everyone.

Yet non-memory resources are not commodities and cannot be treated with a one-size-fits-all pattern. They have semantics and effects far beyond incrementally draining and refilling that gas tank. Each situation, each class, is different, and has different implications that we have to remember. Because of this, we can’t treat these resources in a nondeterministic way without potential hazards.

For example, take a file handle managed by a finalizer. Off the top of my head, there are three adverse effects of an unused handle not finalizing in time:

  1. If there are pending writes to the file, then they are lost if the app exits without forcing finalizers to be called.
  2. If the file was opened with restrictive sharing, then other applications cannot manipulate it until the GC gets around to finalizing the file object (see the sketch after this list).
  3. The OS isn’t designed to treat file handles as a commodity, and using them that way can negatively affect performance. We could even run out of file handles because the GC has no idea when there’s pressure in this space (all finalizers are equal).
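Here is a small sketch of that second effect (the file name is made up): the handle stays locked until the GC happens to finalize the forgotten FileStream.

[code lang="csharp"]
using System;
using System.IO;

public static class SharingDemo
{
    public static void Run()
    {
        OpenAndDrop();

        try
        {
            // fails with a sharing violation: the forgotten handle is still open
            using (var f = new FileStream("locked.dat", FileMode.Open,
                                          FileAccess.ReadWrite, FileShare.None))
            { }
        }
        catch (IOException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // once the finalizer has run, the handle is released...
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // ...and now the open succeeds
        using (var f = new FileStream("locked.dat", FileMode.Open,
                                      FileAccess.ReadWrite, FileShare.None))
        { }
    }

    static void OpenAndDrop()
    {
        // opened with restrictive sharing, then forgotten without Dispose
        var leaked = new FileStream("locked.dat", FileMode.Create,
                                    FileAccess.ReadWrite, FileShare.None);
        leaked.WriteByte(42);
    }
}
[/code]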

Worse yet, in diagnosing any of the above issues, we also must roll our own tools. System tools such as LockHunter and Process Explorer have no way of distinguishing which handles are actually in use and will just give us a noisy, useless mess.

And that’s just a simple file handle example. The situation only gets worse with more limited and complicated resources like DirectX surfaces or database connections.

Dispose: Only A Partial Solution

You might wonder why I’m making a fuss. There’s an obvious answer to the non-determinism, right? Microsoft recognized this problem early on, in .NET 1.0, and gave us the disposable pattern. It is a standard way of managing resources the old-fashioned way: ‘new’ allocates the resource, and Dispose() frees it. We even have some special syntax that helps automate this:

[code lang="csharp"]
using (var textureMgr = new TextureManager())
using (var texture = textureMgr.AllocTexture())
{
    texture.Fill(Color.White);
}
[/code]

The above is roughly equivalent to:

[code lang="csharp"]
var textureMgr = new TextureManager();
try
{
    var texture = textureMgr.AllocTexture();
    try
    {
        texture.Fill(Color.White);
    }
    finally
    {
        texture.Dispose();
    }
}
finally
{
    textureMgr.Dispose();
}
[/code]

This gives us a rough, though somewhat tedious approximation of the RAII pattern used universally in C++.

In Microsoft’s .NET classes that wrap unmanaged resources, both patterns are typically used. There is a Dispose implementation that closes the handle, and a safety-net Finalize that cleans up if that was not already done via Dispose.

So what’s the big deal? Well, the problem is in (a) knowing when it is necessary to call Dispose, (b) remembering to call it, and (c) updating dependent code when an existing type adds a new IDisposable implementation. It’s easy to miss something and let bugs creep in. For every single ‘new’ call, we must check the type to see if it or one of its parent classes implements IDisposable. If it does, then we must manage the instance directly. This means wrapping allocation in the ‘using’ construct for local temporaries, or implementing IDisposable and forwarding Dispose if contained as a member in another class.
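The contained-member case looks roughly like this (a sketch; LogWriter is a made-up example class):

[code lang="csharp"]
using System;
using System.IO;

// Sketch of forwarding Dispose for a contained disposable member.
public class LogWriter : IDisposable
{
    readonly StreamWriter _writer;  // disposable member that we own

    public LogWriter(string path) { _writer = new StreamWriter(path); }

    public void Write(string message) { _writer.WriteLine(message); }

    // forward disposal to the member we own
    public void Dispose() { _writer.Dispose(); }
}
[/code]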

Forgetting to dispose an unmanaged resource can lead to some of the most frustrating and difficult to track down bugs. And the nondeterminism in the underlying system pretty much squares that problem. It’s guaranteed to behave differently in the field than on a development machine.

Every new .NET programmer figures this out quickly when they write their first command line app that opens a file, does something to it, and writes out a new file. They find that, apparently randomly, sometimes the end of the new file is cut off. Forgot to call Dispose to flush and close it, eh?
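In code, the mistake and its fix look roughly like this (a sketch; the file names are made up):

[code lang="csharp"]
using System.IO;

public static class CopyUpperCase
{
    public static void Broken()
    {
        // StreamWriter buffers its output...
        var writer = new StreamWriter("output.txt");
        foreach (var line in File.ReadAllLines("input.txt"))
            writer.WriteLine(line.ToUpperInvariant());
        // ...and without a Dispose the buffer may never be flushed,
        // so the tail of output.txt can go missing
    }

    public static void Fixed()
    {
        using (var writer = new StreamWriter("output.txt"))
        {
            foreach (var line in File.ReadAllLines("input.txt"))
                writer.WriteLine(line.ToUpperInvariant());
        } // Dispose flushes and closes deterministically
    }
}
[/code]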

Most Strategies Not 100% Guaranteed

It’s not hopeless. We’ve come up with strategies to deal with this. The first is knowledge: over time, we get a sense for what types of objects tend to implement IDisposable. File handles, database connections and so on, those are easy. Though what about systems written by someone else on your team? Does the source control connection have to get disposed when you’re done with it because it wraps a COM object? For those cases, it’s better to be safe and check for IDisposable until the foreign classes become familiar.

There are also tools that help out. FxCop can find problems that are discoverable via static analysis. CodeRush will draw graphical markup for types it detects as IDisposable.

But these strategies still aren’t enough for me. I want absolute certainty that Dispose is called on all unmanaged resources, and not left to the nondeterministic finalizer system to cause latent timing problems.

So I want to propose an additional strategy. It’s pretty simple, actually: consider the finalizer to be a last, and illegal, resort. Microsoft’s guidelines say that you should not assume Dispose has been called. I propose the reverse: if the finalizer is called, then there is a bug in the code.

More on this in the next post.

Written by Scott

March 1st, 2010 at 8:45 pm

Posted in .net,programming

Constants: They Go On The Right


All you people who insist on putting constants on the left side of expressions are living in the dark past. It’s time to change your ways.

It’s more than just an old habit made pointless by modern compilers; it’s actually a bad idea as well. It reduces readability by going against all of the left-to-right conventions that are hard-wired into every other part of our brains.

It just makes you code like an old man.

Let’s Code Like An Old Man

What I’m talking about typically looks like this:

[code lang="cpp"]
if (SOME_CONSTANT_NAME == someVariable)
{
    // do some stuff
}
[/code]

That’s a constant over there on the left side of the test expression.

I forget where I first saw this recommended. Perhaps it was Code Complete. It’s pretty old, though even today it still comes highly recommended in books on coding styles. Books usually written by old folks. Perhaps you work somewhere with a coding standard maintained by geezers that requires this.

But in the old days, it was necessary! Constant-on-the-left avoids this mistake:

[code lang="cpp"]
if (someVariable = SOME_CONSTANT_NAME)
{
    // uh oh...
}
[/code]

That’s the famous missing ‘=’. Bane of beginner programmers everywhere who never saw more than one equals sign in a row in any math class they ever had. Not to mention sleepy programmers trying to get one last bug fix done before they crash.

To the compiler, leaving off the second equals sign translates to this, more or less:

[code lang="cpp"]
{
    someVariable = SOME_CONSTANT_NAME;
    if (someVariable)
    {
        // hmm...
    }
}
[/code]

The basic problem is that assignments are expressions: the value of the whole expression is the new value of the left-hand side, so the ‘if’ ends up testing someVariable rather than comparing it.

So to avoid this problem, people started putting the constant on the left side. Because it’s a constant, an accidental (or purposeful!) assignment will fail and get caught at compile-time. Awesome.

But It’s No Longer Necessary

I say this is an old man coding style because I typically only see it from more experienced people who were around when Code Complete came out and picked up what was then a good habit. Kids coming out of school rarely do it on their own. They go into their first job, run into an old man and get told to do it this way. Then when they get old they teach it to the new kids. It’s a never-ending cycle!

Well, as of five years ago at least, it’s no longer necessary.

Most of the new languages we use today, like C# or Lua or Java or Ruby or INTERCAL, don’t have this problem. Some don’t even offer a ‘==’ operator to screw up with. Some offer ‘===’!
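In C#, for instance, the classic typo simply refuses to compile, because an ‘if’ condition has to be a bool (a small sketch):

[code lang="csharp"]
const int SOME_CONSTANT_NAME = 42;
int someVariable = new Random().Next(100);  // just to have a runtime value

// The missing '=' doesn't compile in C#: 'someVariable = SOME_CONSTANT_NAME'
// is an int expression, and an 'if' condition must be a bool.
//
//     if (someVariable = SOME_CONSTANT_NAME)   // compile error
//
// So the constant can safely stay on the right:
if (someVariable == SOME_CONSTANT_NAME)
{
    // do some stuff
}
[/code]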

And the older languages, well at least C/C++, have compilers that figure out 99% of the time that you screwed up and warn you about it. For the example above, Visual C++ dictates in its robot-speak:

warning C4706: assignment within conditional expression

(Note that this is assuming you have warning levels turned up. Anyone running their compiler at anything less than the maximum warning level is just asking for bugs.)

So, in 2009, you get the same protection that constant-on-the-left gave you back in 1995. The compiler is looking out for you, once again. You just don’t have to put the constant on the left any more.

But, hey, you’re used to this old habit and why not keep it up? It’s just a different style, right? And it’s certainly not a bad thing to keep doing.

Oh yes it is!

Yes, It’s Actually A Bad Habit

Here’s why we should all stop doing this. Well, besides the fact that it’s pointless in 2009. It all comes down to this:

We Read Left To Right

Engineers are left-to-right people. We really have to go out of our way to not do things this way. (Well, except Israelis. Israelis: put your constants on the left.)

In math class, it was x = 10 or y < 5; equations flow left to right, subject first. People have to be taught to go against that instinct just so they can selectively pick up this old constant-on-the-left habit.

What’s on the left is more important than what is on the right. It comes first as we’re scanning text. Everything is aligned there. It’s where the screen begins. Resize that window to make it narrower and what disappears? Not the left side. Put that long, namespace-qualified constant name on the left and that’s all you’ll see.

So which is more important? The fact that you’re comparing NULL to something, or that something is being tested against NULL? That ‘something’ is doing program flow control. It is the pivot point. It is the decision maker. It holds the state. It’s easily the more important part of the expression. If you have to work to see it, then it’s not as readable.

Therefore, the less important component, the constant, should go on the right.

Well now, I’m glad to get that off my chest.

Written by Scott

May 10th, 2009 at 8:14 pm

Posted in c++,programming,style