Wednesday 11 April 2012

Object Sorting

I have already indicated that recipe information will be Base96 encoded, which means a recipe for the combination of two objects will have the Ids of those two objects stored in a single string.

When we combine two objects, that combination will be stored in a similarly encoded string.

In both cases, the order in which the objects are stored in the string could be one of two permutations
  • Object A then Object B
  • Object B then Object A
But what if the objects in the recipe are in a different order to the ones being combined?

To check if the combined objects match the recipe objects we have to check each combined object against each recipe object. which is four comparisons.
  • first object combined against first recipe object 
  • first object combined against second recipe object 
  • second object combined against first recipe object 
  • second object combined against second recipe object 
Which would have to be done for each recipe.

The reason I chose to use Base96 encoding is that by combining the objects, we could cut this down to one check
  • Encoded combination string against encoded recipe string
But for this to work, we need to make sure the objects are always in the same order in the encoded string.

We could just do a check and encode them in the right order, which would be simple enough.

However, if we decide to use the same encoding to handle group memberships - and we probably will, there could be more than two objects encoded in a string - in which case it would be nice to have them come out in order.

We can solve all of this by having a routine to sort Base96 strings.  Which is the last of our Base96 Functions.

But I get to that,  I want to look at the task of sorting, for which there are a multitude of methods.

Perhaps the simplest, and certainly the first one I learnt was the bubble sort (http://en.wikipedia.org/wiki/Bubble_sort).  This checks two adjacent objects and if they are in the wrong order swaps them.  It then goes on to the next two and the next.

This has the effect that the higher of the first two values checked "bubbles" to the top, you then go back to the start and do it all again, stopping short of the top one as that is already in place.

It does the job, but is not the quickest.

The fastest are the binary sorts (http://en.wikipedia.org/wiki/Quicksort), which split the list in two and compare objects either side of the split.  These are certainly faster, but tend to use a lot of recursion, splitting the list over and over again - each time moving objects to the correct side of the split.

They are fast, but a bit much for what we need.

I am going to use the selection sort (http://en.wikipedia.org/wiki/Selection_sort) - which is like the bubble sort, but only does one move per pass of the list.  It's not fast, but its also not complex.

Coding wise it's very easy to follow, it uses two nested loops, the first represents the position we are going to fill.  this starts at position 1 and goes to 1 short of the last position.

The second (inner) loop is the value being checked.  this goes from just above the position of the outer loop to the end of the list, looking for the lowest value.  When it has gone through the list, the lowest value it has found is moved to the position of the outer loop.

The actual sort will be a separate function, so can be upgraded later if required.

No comments:

Post a Comment