MyPublisher

I’m looking into creating a photobook. Since my daughter was born five years ago*, I’ve been making photo albums the old-fashioned way, but this is really time-consuming, and since I do a lot of design work in Photoshop, I’ve gotten used to having the kind of control that you just can’t get with scissors. There are literally dozens of sites that do this, but at the moment I’m leaning toward MyPublisher. (Their stock just went up as all zero of you rushed to their site to buy something.) But I have to relate an almost comical moment of bad customer relations. When you go to the site, you’re greeted by a really cheerful offer to “Get a free photobook”. But if you look at the Terms and Conditions, you find (amidst lots of other similar text):

During checkout, under quantity in the MyPublisher shopping cart, enter the quantity you desire. You will be charged for one less book than you enter. (Example: If you want to receive 5 books, put 5 in quantity. You will be charged for only 4.) For the Coupon code to work, the quantity ordered must be at least 2.

Ahh, so they don’t exactly mean “free”; they mean buy one, get one free. And since that second one has to be identical to the first, it’s of limited use. I mean, what am I going to do with two copies of my family album for 2007? I’m pretty sure no one else in the world would want one. I guess I could keep it for backup. But I may just save a tree and not take it at all.

If they wanted to say “two for the price of one” or “buy one get one free”, that would be fine. But their strident offer on the home page combined with the circuitous text in the fine print comes off as icky.

*Before that, I never had the urge to do any such thing, and didn’t even own a camera. There are a handful of pictures that other people took, but they’re on paper in a drawer somewhere. An archaeologist of the future will think my wife and I were born when our daughter was.

Modifying objects in place

It can be confusing to people new to .NET that many objects, such as DateTime and String values, do not actually change when you call methods on them; instead, those methods return a new value:

        'make a date
        Dim dSomeDate = Now

        'a timespan for 6 hours
        Dim ts = TimeSpan.FromHours(6)

        'add 6 hours  <-- but dSomeDate is unchanged!
        dSomeDate.Add(ts)

        'we meant this
        dSomeDate = dSomeDate.Add(ts)

Once you get used to this, it’s perfectly intuitive, and it’s arguably best to make your own objects work this way too, just so that .NET people won’t be surprised. But the other day I got a little tired of the thing = thing.ChangeSomething() style, in part because there’s no IDE support for it (yes, I suppose I could use snippets, but it didn’t feel right for this). So I coded up this:

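'needs Imports System.Runtime.CompilerServices, and must live in a Module
<Extension()> _
Sub AddInPlace(ByRef dDate As DateTime, ByVal ts As TimeSpan)
    'ByRef lets us replace the caller's variable with the new value
    dDate = dDate.Add(ts)
End Sub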

Note the ByRef, by the way; without it, this wouldn’t work, because the method would only be replacing its own copy of the date. And now I can write:

        'make a date
        Dim dSomeDate = Now

        'a timespan for 6 hours
        Dim ts = TimeSpan.FromHours(6)

        'add 6 hours  <-- dSomeDate IS changed
        dSomeDate.AddInPlace(ts)

I’m not sure if I like the precedent this establishes, but I’m trying it out.
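To make the ByRef point concrete, here’s a minimal sketch that puts a ByVal version next to the ByRef one. The module name and AddByValCopy are made up for illustration; only AddInPlace mirrors the code above. VB allows ByRef on an extension method’s first parameter, which is what makes this trick possible at all.

```vbnet
Imports System
Imports System.Runtime.CompilerServices

Public Module DateInPlaceExtensions

    'ByVal (hypothetical, for contrast): dDate is a copy of the caller's
    'variable, so the new value is thrown away when the method returns
    <Extension()> _
    Public Sub AddByValCopy(ByVal dDate As DateTime, ByVal ts As TimeSpan)
        dDate = dDate.Add(ts)
    End Sub

    'ByRef: dDate refers to the caller's variable itself, so the
    'assignment replaces the caller's value
    <Extension()> _
    Public Sub AddInPlace(ByRef dDate As DateTime, ByVal ts As TimeSpan)
        dDate = dDate.Add(ts)
    End Sub

End Module
```

Calling d.AddByValCopy(ts) on a local variable leaves d alone, while d.AddInPlace(ts) moves it forward by ts.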

MatchAsDictionary

A few posts ago, I mentioned my MatchAsDictionary method, so I thought I’d post the code in case you might find it useful:

    <System.Runtime.CompilerServices.Extension()> _
    Function MatchAsDictionary(ByVal s$, ByVal rgx As Regex) As StringDictionary
        'create the dictionary
        Dim dict As New StringDictionary

        'do the match
        Dim m = rgx.Match(s)

        'if no match, done
        If Not m.Success Then Return dict

        'for each name defined in the regex
        For Each sName In rgx.GetGroupNames
            'get the corresponding value and put in the dictionary
            dict(sName) = m.GroupValue(sName)
        Next

        'return
        Return dict
    End Function

I also have a version that takes a string pattern instead of a Regex, for when that’s more convenient:

<System.Runtime.CompilerServices.Extension()> _
Function MatchAsDictionary(ByVal s$, ByVal sPattern$, Optional ByVal options As RegexOptions = RegexOptions.None) As StringDictionary
    'create the regex, passing the options along
    Dim rgx As New Regex(sPattern, options)

    'delegate to that function
    Return s.MatchAsDictionary(rgx)
End Function

Basically, I use this when I want to parse a string using a regex, and then use the named captures for something:

Dim sPattern = "(?'type'\w+):(?'pattern'.+)"
Dim sInput = "SomeType:SomePattern"
Dim d = sInput.MatchAsDictionary(sPattern)
DoSomethingWith(d!type)

Finally, the StringDictionary it returns is a simple class of my own (not the one in System.Collections.Specialized) that works like a Dictionary with String keys and values, and I made the keys case-insensitive:

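    Public Class StringDictionary
        Private mInner As New Dictionary(Of String, String)

        'keys are lowercased on the way in and out, so lookups ignore case
        Default Property Item(ByVal sName$) As String
            Get
                'ItemOrDefault is a helper extension (not shown); as the name
                'suggests, it returns Nothing rather than throwing for a missing key
                Return mInner.ItemOrDefault(sName.ToLowerInvariant)
            End Get
            Set(ByVal value As String)
                mInner(sName.ToLowerInvariant) = value
            End Set
        End Property
    End Class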

But you could use a regular Dictionary(Of String, String) if you like, or something fancier.
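If you’d rather go the plain Dictionary route, here’s a rough sketch. Two assumptions worth flagging: GroupValue is my guess at the helper used above (its code isn’t shown here), and MatchToDictionary is a hypothetical name chosen so it won’t collide with MatchAsDictionary. Passing StringComparer.OrdinalIgnoreCase to the Dictionary constructor gets you case-insensitive keys without lowercasing anything by hand.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Runtime.CompilerServices
Imports System.Text.RegularExpressions

Public Module RegexDictionaryExtensions

    'a guess at the GroupValue helper: the captured text, or "" when the
    'group didn't participate (Groups(name) returns an unsuccessful Group
    'rather than throwing for an unknown name)
    <Extension()> _
    Public Function GroupValue(ByVal m As Match, ByVal sName As String) As String
        Return m.Groups(sName).Value
    End Function

    'like MatchAsDictionary, but returns a plain Dictionary whose key
    'comparisons ignore case
    <Extension()> _
    Public Function MatchToDictionary(ByVal s As String, ByVal rgx As Regex) As Dictionary(Of String, String)
        Dim dict As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        Dim m As Match = rgx.Match(s)
        If Not m.Success Then Return dict
        For Each sName As String In rgx.GetGroupNames()
            dict(sName) = m.GroupValue(sName)
        Next
        Return dict
    End Function

End Module
```

With this, d("TYPE"), d("type") and d!type all find the same entry.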

Care and feeding of LINQ

I’m getting better with LINQ, but I still have these experiences of writing LINQ code that seems perfectly sensible but is in fact wrong. Here’s today’s example (dramatically simplified to illustrate the point):

Here’s a simple object:

Class TestObject
    Public Value As Integer = 0
End Class

and some LINQ code:

        'new up 5 of them with the default value of 0
        Dim LinqedList = From i In Enumerable.Range(1, 5) Select New TestObject

        'for each one
        For Each t In LinqedList
            'increment the value
            t.Value += 1
        Next

        'and now for each one
        For Each t In LinqedList
            'print the value
            Debug.WriteLine(t.Value)
        Next

Quick, what does this print out?

I would have expected the answer to be all ones, but in fact it’s all zeroes. That’s because LinqedList isn’t really a list: it’s a LINQ query, and a query is evaluated lazily, every time you enumerate it. So the first For Each news up five TestObject instances and increments each one, but the second For Each runs the query again and gets five brand-new instances, each still at 0. It can also be screwy to debug: in my case I was iterating over a list to remove null objects, and then getting the dreaded “Object variable or With block variable not set” error (because I was removing the nulls from the wrong collection), but the debugger showed the error on a line far removed from the one where the problem was. This seems to be a common source of grief with LINQ.

Anyway, you can make the problem go away by forcing query evaluation, for instance by creating the object like this:

'new up 5 of them with the default value of 0
Dim LinqedList = (From i In Enumerable.Range(1, 5) Select New TestObject).ToList

Then you get all ones.
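To see the lazy evaluation directly, here’s a small sketch (CountedObject and SumAfterIncrement are made-up names) that counts how many times the Select clause actually runs. Without ToList, the query runs once per For Each, so ten objects get created and the increments are lost; with ToList, only five are created and the increments stick.

```vbnet
Imports System.Collections.Generic
Imports System.Linq

Public Class CountedObject
    'shared counter, bumped every time the Select clause news one up
    Public Shared Created As Integer = 0
    Public Value As Integer = 0

    Public Sub New()
        Created += 1
    End Sub
End Class

Public Module DeferredExecutionDemo
    'increments every object, then sums the values in a second pass;
    'materialize = True forces the query to run once, via ToList
    Public Function SumAfterIncrement(ByVal materialize As Boolean) As Integer
        Dim query As IEnumerable(Of CountedObject) = From i In Enumerable.Range(1, 5) Select New CountedObject()
        If materialize Then query = query.ToList()

        'first pass: increment each object
        For Each t As CountedObject In query
            t.Value += 1
        Next

        'second pass: add up the values
        Dim total As Integer = 0
        For Each t As CountedObject In query
            total += t.Value
        Next
        Return total
    End Function
End Module
```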

Runtime/Compile Time Regex Intellisense Massacree

.NET is now so full of goodies that I’ve gotten kind of used to being able to do anything I want without recourse to the kind of black belt hacking that we had to do in the VB6 days. However, I’m currently trying to accomplish something and I just can’t figure out how to do it.

The task is kind of general, but I’ll ease into it like this: consider a regex pattern like (?'type'\w+):(?'pattern'.+). This just looks for strings of the form TYPE:PATTERN and parses out the TYPE and PATTERN as named captures. To actually accomplish this, you’d need code like:

Dim sPattern = "(?'type'\w+):(?'pattern'.+)"
Dim sInput = "SomeType:SomePattern"
Dim m = Regex.Match(sInput, sPattern)
DoSomethingWith(m.Groups("type").Value)

I’ve sweetened this up a little with an extension method on the Match class:


DoSomethingWith(m.GroupValue("type"))

And, not satisfied with this, I added another extension method to String (yes, some think you shouldn’t do this, but it works for me):

Dim dict = sInput.MatchAsDictionary(sPattern)

DoSomethingWith(dict!type)

This is not that clever; basically, I do the regex match, get any named captures, and return a Dictionary(Of String, String), where the String key is the name of the capture and the String value is the value. The dict!type bit is just the slightly obscure VB syntax for accessing a Default property; it’s completely equivalent to dict("type"), which is the same as dict.Item("type").

Not terrible, but I’m still not satisfied. What I really want is a way for the compiler to generate an anonymous type for me based on the named captures in the pattern:

Dim match = sInput.MatchAsAnonymousType(sPattern)  'not a real function
DoSomethingWith(match.type)  'doesn't compile

The second line doesn’t compile because there is no object with a type property; the compiler has no way to know which named captures the pattern contains. What I really want is a way to tell the compiler to create such a type for me:

'regular old match
Dim match = Regex.Match(sInput, sPattern)

'make an anonymous type with properties that match the named captures in the pattern
Dim match2 = New With {.type = match.GroupValue("type"), .pattern = match.GroupValue("pattern")}

'use the type for something
DoSomethingWith(match2.type)

Here, I’m creating an anonymous type whose properties match the pattern. But I want the compiler to do this for me, much as it can do this during a LINQ projection, so I can just skip that second line. It might also be nice to have an option to create a named type in cases where you would want to pass the object around to other methods. (For that scenario, it might be reasonable or better to make an AddIn that would parse the pattern and codegen up a class. But that’s a little heavyweight in just those cases where anonymous types are so handy.)

So the question becomes: is it possible to hook the compiler or otherwise give it a directive to do this? Note that it has to work at compile time, because I want a type-safe object in the IDE. Reflection.Emit and expression trees both do their work at runtime, so I think they’re ruled out, but I’m not sure what else is possible.

Yes, this is a little fussy, when the dictionary approach works reasonably well, but I likes me some Intellisense!
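In the meantime, here’s roughly what a hand-written (or AddIn-generated) named type for this one pattern might look like. TypePatternMatch and its Parse method are hypothetical; the point is just that once a class like this exists, you get Intellisense and compile-time checking on Type and Pattern. It’s also exactly the kind of boilerplate I’d want generated rather than typed.

```vbnet
Imports System.Text.RegularExpressions

'hypothetical: the kind of class a code generator could emit for the
'pattern (?'type'\w+):(?'pattern'.+), written out by hand here
Public Class TypePatternMatch
    Private Shared ReadOnly sRegex As New Regex("(?'type'\w+):(?'pattern'.+)")

    Private ReadOnly mType As String
    Private ReadOnly mPattern As String

    Private Sub New(ByVal sType As String, ByVal sPattern As String)
        mType = sType
        mPattern = sPattern
    End Sub

    'one strongly typed property per named capture
    Public ReadOnly Property Type() As String
        Get
            Return mType
        End Get
    End Property

    Public ReadOnly Property Pattern() As String
        Get
            Return mPattern
        End Get
    End Property

    'returns Nothing when the input doesn't match the pattern
    Public Shared Function Parse(ByVal sInput As String) As TypePatternMatch
        Dim m As Match = sRegex.Match(sInput)
        If Not m.Success Then Return Nothing
        Return New TypePatternMatch(m.Groups("type").Value, m.Groups("pattern").Value)
    End Function
End Class
```

Usage would be along the lines of Dim tm = TypePatternMatch.Parse(sInput) followed by DoSomethingWith(tm.Type).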

B#?

I heartily agreed with this comment on Paul Vick’s blog:

An update on VBx…

Vick was talking about what they’re thinking of for VB10, which he called VBx (X being the Roman numeral for 10), but people don’t like that because it reminds them of the VBX control, back in the VB3 days. Anyway, this commenter suggested calling it B#. I think he was joking, but it’s actually a great idea. For one thing, it has that sexy C#, F# feel to it. I know I’d feel more like a grownup saying I code in B#. We may as well drop the “Visual”, since VB is no more visual than C# is now, and in fact I often use VB for Windows and Web Services, which aren’t visual at all. There already does seem to be a language called B#, but maybe Microsoft and those people can work something out.

I think this is such a great idea that I may send an email seconding it. And if all zero of my readers do too, we’ll have a movement.

Treating a file as an IEnumerable(Of String)

I love generic methods and LINQ! This is my first post on the subject, but I’ll probably be talking a lot about this in the future.

Dealing with a list of strings is a pretty common thing to do, and when I have to do this, I try to specify that list as an IEnumerable(Of String), because that’s a pretty fundamental unit of currency in the LINQ world. So I have a lot of methods like this one:

Sub DoSomething(ByVal listOfStrings As IEnumerable(Of String))
    'for each string in the list
    For Each s In listOfStrings
        'do something with the string
    Next
End Sub

It’s also pretty common to get that list of strings from a file. For years, I’d had a helper method that just loaded all the lines in the file and returned them in a list, but this approach recently stopped working so well, because I had some massive files and I couldn’t load them all into memory. So I wrote up a class that handles this in an interesting way, allowing me to write code like this:

Dim file As New FileObject("c:\temp\myfile.txt")

DoSomething(file.EnumerateLines)

FileObject is another helper class I wrote that enhances the .NET System.IO FileInfo class to make it more useful, about which more later. Once I have one, though, I can just write file.EnumerateLines and get back an IEnumerable(Of String) over the lines in the file. The interesting thing here is that, in keeping with the LINQ lazy execution model, that method doesn’t load all the lines in the file. Instead, it creates an object that implements IEnumerable(Of String) and reads the next line of the file only when asked. The next bit of the logic is this:

Public Module FileLineEnumeratorExtensions
    <System.Runtime.CompilerServices.Extension()> _
    Function EnumerateLines(ByVal file As FileObject) As FileLineEnumerator
        Return New FileLineEnumerator(file)
    End Function
End Module

This is just an extension method on FileObject to create the object that does the real work, FileLineEnumerator. I’m going to post that whole class below. Basically, it implements IEnumerable(Of String) such that it opens the file, and each time the caller does the usual enumerating thing and calls MoveNext, it reads the next line of the file. So you can pass one of these to anything that expects IEnumerable(Of String) and it’ll work. Anyway, I hope someone finds this useful. If you need any of the related stuff, like my FileObject class, let me know and I’ll post it and send it to you. Happy coding.

'needs Imports System.IO for StreamReader
Public Class FileLineEnumerator
    Implements IEnumerable(Of String)
    Implements IEnumerator(Of String)

    'the file we're enumerating
    Private mFile As FileObject
    'a reader on that file (opened lazily, on the first MoveNext)
    Private mReader As StreamReader
    'the current line that we've read from the file
    Private msCurrentLine$

    'constructor: just remember the file; don't open it until someone enumerates
    Sub New(ByVal file As FileObject)
        mFile = file
    End Sub

#Region "IEnumerable implementation"
    'note: returning Me makes this object its own enumerator, so two
    'simultaneous For Each loops over the same instance would share one reader
    Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of String) Implements System.Collections.Generic.IEnumerable(Of String).GetEnumerator
        Return Me
    End Function

    Public Function GetEnumerator1() As System.Collections.IEnumerator Implements System.Collections.IEnumerable.GetEnumerator
        Return Me
    End Function
#End Region

#Region "IEnumerator implementation"
    Public ReadOnly Property Current() As String Implements System.Collections.Generic.IEnumerator(Of String).Current
        Get
            Return msCurrentLine
        End Get
    End Property

    Public ReadOnly Property Current1() As Object Implements System.Collections.IEnumerator.Current
        Get
            Return msCurrentLine
        End Get
    End Property

    Public Function MoveNext() As Boolean Implements System.Collections.IEnumerator.MoveNext
        prEnsureOpen()

        'read the next line
        msCurrentLine = mReader.ReadLine

        'Nothing (not merely a blank line) means end of file
        If msCurrentLine Is Nothing Then
            'close the file
            prEnsureClosed()

            'finished enumerating
            Return False
        End If

        Return True
    End Function

    Public Sub Reset() Implements System.Collections.IEnumerator.Reset
        'close the file; the next MoveNext will reopen it from the top
        prEnsureClosed()
    End Sub
#End Region

    'make sure the file is open
    Private Sub prEnsureOpen()
        'if the reader is nothing
        If mReader Is Nothing Then
            'open the file
            mReader = New StreamReader(mFile.FullName)
        End If
    End Sub

    'make sure the file is closed
    Private Sub prEnsureClosed()
        'if the reader exists
        If mReader IsNot Nothing Then
            'close it
            mReader.Close()
            'clean up
            mReader = Nothing
        End If
    End Sub

    'for proper cleanup
    Public Sub Dispose() Implements IDisposable.Dispose
        'make sure we close the file
        prEnsureClosed()

        'we're done
        GC.SuppressFinalize(Me)
    End Sub
End Class
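For comparison, VB 11 (Visual Studio 2012) added Iterator functions, which let the compiler build this kind of lazy enumerator for you, and .NET 4 added File.ReadLines, which does the same job out of the box. Here’s a sketch of the iterator version, assuming a plain file path instead of my FileObject class (LazyLineReader is a made-up module name). One nice side effect: each For Each gets its own fresh enumerator, so two loops over the same sequence don’t share a reader.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Public Module LazyLineReader

    'yields one line at a time; the file is opened when enumeration starts
    'and closed when it finishes or the caller stops early (Using + Dispose)
    Public Iterator Function EnumerateLines(ByVal sPath As String) As IEnumerable(Of String)
        Using reader As New StreamReader(sPath)
            Dim sLine As String = reader.ReadLine()
            While sLine IsNot Nothing
                Yield sLine
                sLine = reader.ReadLine()
            End While
        End Using
    End Function

End Module
```

Because lines are read on demand, something like EnumerateLines(sPath).Take(10) only reads the first ten lines of the file (Take needs Imports System.Linq).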

Why I like VB: Part I

Eventually, I’ll post something with code samples, and you’ll see that I use VB. I’ve been using VB more or less exclusively as a primary development language since 1995, and have been mildly embarrassed about it the whole time. But somewhere along the way I learned to stop feeling lame and love the language. Here follow some of my thoughts on the matter. And I want to stress that these are just personal feelings, not God-like pronouncements. With zero readers and all, I’m not expecting a flame war, but there’s such a thing as precedent.

Way back in the last century, I was a C programmer, and I got an otherwise great job at a place that was using VB3, for which I had great disdain. My reasons were many, but mostly, having waded through Kernighan and Ritchie and finally gotten my head around things like *dst++ = *src++, I felt I could look down on a language that had keywords in English. And of course it was interpreted and slow. And used to make pretty forms (well, not that pretty, but our standards were lower then).

At that time, of course, I had never actually built anything very big, whereas I had written something to solve a differential equation. I don’t do much math now, but I have built large systems, and by far the most important thing for me is not saving clock cycles but being able to develop and maintain code quickly and visually. So I’ve come to appreciate those English keywords. Sure, if I used C# every day I’d remember the colon thingy for inheritance, but I kind of like the Inherits keyword. I also like the Class XXX...End Class stuff, because when I scan a C# file, it always ends with a thicket of }’s, and I have to think harder to figure out what’s going on.

Of course, the performance differences (with C#, anyway) have been erased now that both compile to IL and run on the same CLR.

Again, just personal preferences, reinforced by daily usage. I’ll share more as I think of them.

More on my regular expression DSL

In my last post, I talked about a simple DSL for regular expressions, to address a few limitations and issues that I have with regular expressions out of the box. My DSL looked like this:


match:PATTERN [e.g., \d{3}-\d{2}-\d{4}]
match:PATTERN
donotmatch:PATTERN
donotmatch:PATTERN

The idea is that you can break the match conditions into separate lines, which simplifies things, and you can have match conditions that exclude strings. You can do some excluding with plain vanilla regex, but I don’t think you can say that a string must match one pattern AND not match a different one; even if it is possible, the result would likely be messy and hard to maintain.

The next place I plan to take this is to match or not match based on data that we find in named captures. It would look something like this:

pattern:(?'area'\d{3})-\d{2}-\d{4}
match:area:12\d
donotmatch:area:123

Here, I’m saying to parse the input string according to the pattern, and use named captures to make a field called “area”. Then, include the input if this area field matches some pattern and exclude if it matches some other pattern. In my application, at least, this will be a much simpler way of seeing what’s going on than the only alternative I can see, which is to embed the matching logic in the original regex:

match:12\d-\d{2}-\d{4}
donotmatch:123-\d{2}-\d{4}

In particular, note that I have to repeat the whole pattern, even though the interesting part is only the first section. This violates the DRY principle in a particularly bad way, because regexes are so hard to maintain anyway. The capture-based version also puts the emphasis on the meaning of the data (namely, that there is an area field), not on the string junk where we happen to find the data.
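Here’s a rough sketch of how an evaluator for this capture-based form might work. Everything in it is an assumption on my part: the class name CaptureRuleSet, the rule that a field only needs to hit one match line, and the choice to anchor each field pattern so it has to match the whole captured value (so 12\d means exactly three characters starting with 12, not “contains 12 followed by a digit”).

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

'hypothetical evaluator for the capture-based DSL:
'  pattern:REGEX           defines the named captures (one line)
'  match:FIELD:REGEX       the field must match at least one of these
'  donotmatch:FIELD:REGEX  the field must match none of these
Public Class CaptureRuleSet
    Private mPattern As Regex
    Private ReadOnly mMatch As New List(Of KeyValuePair(Of String, Regex))
    Private ReadOnly mDoNotMatch As New List(Of KeyValuePair(Of String, Regex))

    Public Sub New(ByVal lines As IEnumerable(Of String))
        For Each sLine As String In lines
            'the keyword runs up to the first colon
            Dim iColon As Integer = sLine.IndexOf(":"c)
            If iColon < 0 Then Continue For
            Dim sKeyword As String = sLine.Substring(0, iColon).Trim().ToLowerInvariant()
            Dim sRest As String = sLine.Substring(iColon + 1)

            If sKeyword = "pattern" Then
                mPattern = New Regex(sRest)
            ElseIf sKeyword = "match" OrElse sKeyword = "donotmatch" Then
                'FIELD:REGEX; split on the first colon only, so the regex may contain colons
                Dim iField As Integer = sRest.IndexOf(":"c)
                If iField < 0 Then Continue For
                'anchor so the field pattern must match the whole captured value
                Dim rgx As New Regex("^(?:" & sRest.Substring(iField + 1) & ")$")
                Dim rule As New KeyValuePair(Of String, Regex)(sRest.Substring(0, iField), rgx)
                If sKeyword = "match" Then mMatch.Add(rule) Else mDoNotMatch.Add(rule)
            End If
        Next
    End Sub

    Public Function IsMatch(ByVal sInput As String) As Boolean
        'the input has to fit the overall pattern first
        If mPattern Is Nothing Then Return False
        Dim m As Match = mPattern.Match(sInput)
        If Not m.Success Then Return False

        'any donotmatch hit excludes the input
        For Each rule As KeyValuePair(Of String, Regex) In mDoNotMatch
            If rule.Value.IsMatch(m.Groups(rule.Key).Value) Then Return False
        Next

        'with no match lines, fitting the pattern is enough
        If mMatch.Count = 0 Then Return True
        For Each rule As KeyValuePair(Of String, Regex) In mMatch
            If rule.Value.IsMatch(m.Groups(rule.Key).Value) Then Return True
        Next
        Return False
    End Function
End Class
```

Fed the three lines from the example above, this accepts 124-45-6789 but rejects 123-45-6789 (excluded) and 555-45-6789 (no match line hit).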

A DSL for a regular expression extender

In one of those serendipitous design moments, I came across this article (Language Workbenches: The Killer-App for Domain Specific Languages?) a few weeks ago just as I was wrestling with a problem. The problem is one of matching text strings that is slightly beyond what you can do with a plain vanilla regex (or at least, I can’t see how to do it). Specifically, I need to be able to match strings if they match any of various criteria. That could be done by just OR’ing multiple patterns together. But I also need to exclude strings if they match other patterns. This is the part that seems impossible; and even if it’s possible, I’m pretty sure the resulting pattern would be a hideous beast that would be impossible to follow. In fact, this is a general problem that I have with regexes.

Reading that article suggested to me a way to address this. I constructed a very simple DSL that looks like this:

match:PATTERN [e.g., \d{3}-\d{2}-\d{4}]
match:PATTERN
donotmatch:PATTERN
donotmatch:PATTERN

and a corresponding bit of code that reads this and creates a regular expression object for each line. Then, when you ask THAT object to match a string, it simply matches if the string matches any of the match lines and doesn’t match any of the donotmatch lines. This is basically trivial, but it was kind of revelatory for me, because it at once dramatically simplifies the building and maintenance of patterns, in my application at least, and allows exclusions. I do a lot of this, and it’s already making my life much easier.
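A minimal sketch of such an evaluator, assuming the “any of various criteria” reading from the first paragraph: a string passes if it matches at least one match line and none of the donotmatch lines. RegexRuleSet is a made-up name, and lines without a recognized keyword are simply ignored.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

'hypothetical evaluator for the basic DSL:
'  match:PATTERN       the string must match at least one of these
'  donotmatch:PATTERN  the string must match none of these
Public Class RegexRuleSet
    Private ReadOnly mMatch As New List(Of Regex)
    Private ReadOnly mDoNotMatch As New List(Of Regex)

    Public Sub New(ByVal lines As IEnumerable(Of String))
        For Each sLine As String In lines
            'split on the first colon only, so patterns may contain colons
            Dim iColon As Integer = sLine.IndexOf(":"c)
            If iColon < 0 Then Continue For
            Dim sKeyword As String = sLine.Substring(0, iColon).Trim().ToLowerInvariant()
            Dim sPattern As String = sLine.Substring(iColon + 1)

            Select Case sKeyword
                Case "match"
                    mMatch.Add(New Regex(sPattern))
                Case "donotmatch"
                    mDoNotMatch.Add(New Regex(sPattern))
            End Select
        Next
    End Sub

    Public Function IsMatch(ByVal s As String) As Boolean
        'any donotmatch hit excludes the string outright
        For Each rgx As Regex In mDoNotMatch
            If rgx.IsMatch(s) Then Return False
        Next
        'otherwise, one match line is enough
        For Each rgx As Regex In mMatch
            If rgx.IsMatch(s) Then Return True
        Next
        Return False
    End Function
End Class
```

Checking the exclusions first keeps the “AND not” part of the logic in one obvious place, which is most of what makes this easier to maintain than a single combined pattern.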