Friday, 26 July 2013

Adventures in MySQL

About a year ago, I wrote a particularly geeky post about an experiment I did using MySQL for something for which a database was probably never intended. As it happens, I was reminded today about a similar abuse of MySQL I and a colleague tried several years earlier just to see if it were possible. Having re-discovered it, and having not written a blog post for a long time, I thought I would share it with a wider audience. Who knows? Someone may even take it and make it into something useful.

The plan this time was to write a text adventure game in MySQL and, as with the Mandelbrot Set generator, I wanted it to be a genuine use of the database rather than just an exercise in translating algorithms that could just as easily be written in any other language. Ideally, I wanted the commands that the players uses to be standard MySQL commands but that became too complicated so everything is done using stored procedures. However all data, state and information governing the interaction with the computer controlled character are stored in the tables and it should be possible to replace all of this and get a completely new game. Well, that is the theory.

Anyway, the source code to the text adventure as well as the Mandelbrot set are now on github for anyone to fork and do with as they wish.

What else can be forced into a database? Chess?

Friday, 16 November 2012

Spoiler alert!

It was 3 weeks ago that I wrote a comment on the Police and Crime Commissioner election that has just taken place, expressing my concern that it would just turn into another political pawn. As it turned out, the whole this was a bit of a farce, with a record low turnout and an amusing episode of an empty ballot box. There are a number of reasons for people not turning up to vote, ranging from lack of information, with many people not even realising that the the election was on, to the time of year, the lamest excuse in my opinion.

Apparently, all the information we needed was on the Internet with a free website available and these days campaigning is done more and more online. However, anyone looking online, in particular following the #PCC hashtag on Twitter, would have seen lots of comments along the lines of "I'm not going to vote", "vote for anyone, but please just vote" and "spoil your ballot paper!" This last one appears to have had some effect as there was an unusually high proportion of spoilt papers. I wanted to find the percentage of ballot papers spoilt in each vote but curiously, the figure for 'turnout' in the results "does not include spoilt ballot papers". This adjustment of the figures only happened some time after the first results went up, so I was able to find that out of 81447 votes cast in the Wiltshire election, 2683 or 3.3% were spoilt. I understand that this figure was similar around the country but it is now very hard to get hold of the numbers.

The spoilt ballot papers are not a minor issue. In any election there are a certain number of invalid voting papers because someone doesn't understand the instruction or has not marked it clearly, but this was clearly not the case here. Of the few people who turned up to vote, around 3.5% of them went to the trouble of visiting their polling station to deliberately deface the ballot paper. When it is so hard to get people into the voting booth surely it means something if so many people will willingly go specifically to not vote. David Cameron says that they are listening so it will be interesting to see what he makes of this.

On a more positive point, I am pleasantly surprised that my initial fear was not entirely realised. Eleven of the forty one commissioners are independent, which is eleven more than I expected. It is true that many of these do have political allegiances and some even tried and failed to be the official party candidate but, not having taken any party funding, they are now under no obligation to toe a party line. However sincerely each new PCC swears the oath of impartiality in the eyes of public they will still be seen as representing their parties and I am certain that, with very little other information for the electorate to go on, most of those eleven owe their election to the word 'Independent' under their names.

Living in Manchester my new police commissioner is, predictably, Labour but he has expressed his opinion that the independence of the Police in terms of operational independence is needed and he has no intention of bringing political values into force. I have no reason to doubt this but time will tell.

Tuesday, 13 November 2012

I've got a little list.

Last night I send an email to each of the candidates for the Greater Manchester Police and Crime Commissioner election next Thursday, not really expecting much of a reply. The email to four of the five candidates ran thus:

Dear $NAME
According to the Home Office web page, the Police and Crime Commissioner is required to "serve the people, not a political party or any one section of their electorate" (http://www.homeoffice.gov.uk/police/police-crime-commissioners/public/what-is-pcc/). You, in common with all but one of your opponents, are representing a political party and this pattern is true for the entire country with only 30% of candidates being independent.
I am very concerned that this new post will be just another political tool so could you please explain to me how you, as the $PARTY candidate, intend to fulfil your responsibilities while maintaining independence from your party.

The fifth candidate is independent so the email I sent him was more along the lines of what his policies would be. However, to date I have not had a reply to this.

As it happened, I got replies from the Conservative, Labour and Liberal Democrat candidates in that order and so have decided to reproduce them here, verbatim and without comment. I have replied to the final email from each but, to stick to my decision not to add comment, I will not publish anything that has not been responded to.

Firstly, I received the following within an hour from Michael Winstanley, the Conservative candidate:

Thanks for your email. If elected I will swear on oath of impartiality to do my job without fear or favour. Once in office I will not taken any instructions from the party machine in the execution of my duties. The only people I will answer to is the people of Greater Manchester. I will also stand up against any Government if I think that they are wrong or in order to get a better deal for Greater Manchester.

I replied thus, waiting until the following morning to give the last sentence some justification:

Thank you for your quick response to my question. I am aware of the oath that all new Police and Crime Commissioners will be required to swear on taking office and I have no reason to doubt your integrity. However, considering that most candidates are backed by political parties and the very existence of this election has come as a surprise to many people, it is understandable that people are cynical. Most people I have spoken to are distrustful of anyone who is not standing as an independent, and independents are few and far between. Unfortunately, I believe that the turnout on Thursday will be low and the new PCCs will be elected based on the political motivations of the minority of the electorate.
I am sorry to be so negative about this but I am quite disappointed about the way this is going. I still do not know what I will do on Thursday but you might be interested to know that, so far, you are the only candidate who has replied to my email.

to which he replied

I totally appreciate the situation that you are in. I also understand that people are cynical about the process. But if you can give me your support on Thursday I would appreciate it. It will also be an opportunity for me to prove that I will do what I have pledged to do. If at the end of the 3.5 years then you can get rid of me.
The only thing that I would say on the Independent issue and I am not inferring anything about the candidate who is standing. But how independent is anyone? What are their political views? Because I could stand as an Independent but have an extreme view either left or right. At least with someone standing under a Party Banner you have a fair idea where they are coming from.

Mid afternoon I received a reply from Tony Lloyd

The independence of the Police in terms of operational independence is needed and I have no intention of bringing either political values or control in to the force. In terms of politics more generally it is absolutely legitimate that the Commissioner operates independent for example of the national Government so that without fear or favour they are prepared to raise privately and publicly issues affecting the security of the people of Greater Manchester. My record in Parliament shows that I did vote against and speak out against the intentions of the Labour Government as a Labour MP. It would though be unreal for me to claim not to be a Labour politician as I have been one for over 30 years. The values of the Labour Party inform me about what I believe to be right for our society.

Finally, late in the evening I received my longest reply from Matt Gallagher.

Thanks for getting in touch with me. I recently posted the following item on my website (Matt4PCC.org) to explain how I ended up as a Lib Dem sponsored candidate. To be frank, the ludicrous rule about putting up a £5k deposit would see off all but the wealthiest independents, and without the Lib Dems agreeing to put up the deposit (in the ardent hope that they will get it back), I couldn't have run.
As the article says, I wrote my own manifesto. I have received no money from anyone, not even the Lib Dems. I have paid £700 from my own savings (and got a receipt) to pay for some leaflets to insert in the Lib Dem ones being put out for the Manchester Central by-election, but the party won't help PCC candidates because they oppose political involvement in the police. Their manifesto calls for an elected Police Authority, but they didn't get support for that from Labour or the Tories, so we are stuck with elected PCC's.
I'm the protest vote, giving the electorate an alternative to politicians (the independent didn't declare until 15 minutes before the close of nominations, by which time I was committed). If elected, and its a big if as I am faced with a ruthless, well-organised Labour machine that has the local press in its pocket and controls 8 of the 10 boroughs, I will be taking an oath of impartiality. I have also told the Lib Dems that there can be no political interference in the running of the police, indeed they are happy with that. My Consultative Committee will contain 7 Assistant Commissioners, appointed by the groups they represent, not me (see my manifesto). I have no idea who they will be, or their political persuasion, because it is important that they represent their constituency, not my views and opinions.
I suspect that the other candidates, if elected, will not be falling over themselves to appoint staff from beyond their own political parties. We shall see;
People have been asking me ... (linked article reproduced in email)

I await replies from Stephen Woolfe and Roy Warren but I will put them up if and when I get them.

Saturday, 27 October 2012

When constabulary duty's to be done

In a couple of weeks time the whole of the UK will go to the polls to vote for regional Police and Crime Commissioners, an office that I had never heard of until quite recently when I started seeing advertisements going up all over the place telling me that the local hoodlums really do not want me to go out and vote on 15th November.

It turns out that the PCC is a new post created by our wonderful coalition government and was, conveniently, in both the respective parties' manifestos at the last election. It certainly has the air of a Con-Lib agreement, with an emphasis on cracking down on crime (Tory) while at the same time insisting that it should be chosen by a democratic vote (Lib Dem).

According to the Home Office website,

Police and crime commissioners (PCCs) will ensure the policing needs of their communities are met as effectively as possible, bringing communities closer to the police, building confidence in the system and restoring trust.

It then goes on to explain how the PCC will make decisions about law enforcement and provide a voice for the public regarding the way policing is done. Of particular interest is the section on impartiality in the description of what the role of the PCC will be, which states that "PCCs are there to serve the people, not a political party or any one section of their electorate." I believe that this last point is essential, particularly at a time when the police are often viewed as being on the side of authority against the people.

This being the case, I was disappointed to see in the local rag a few days ago that the candidates for which I will be able to vote include one conservative, one labour, one liberal democrat and one UKIP. Looking further afield I see that this political spectrum is typical in most areas. So much for not serving a political party or any one section of the electorate. There is also one independent who, in addition to having knowledge and experience of the law and justice system, appears also to have the same concern as I do that the position must be genuinely independent. However, his electoral statement gives no indication of what his plans are should he win and, while his independence makes him the candidate for whom I am most inclined to vote, I am left wondering why I should bother.

There will be a debate with the five candidates for Greater Manchester next Tuesday and perhaps this will make clearer the positions of each. Some people may question whether we need this new office at all but this is not the question at hand. On Friday 16th November there will be new Police and Crime Commissioners up and down the country who will have real power over the running of the police forces. Call me naive but I would like to think that we still have a chance to prevent these from becoming yet more pieces in the grand political game.

Wednesday, 27 June 2012

Have you got anything without Spam?

Facebook decided yesterday that what everyone really needs is an @facebook email address while the rest of the world were of a different mind. I remember when Facebook started offering email addresses and I declined then as I decided that one more address to add the the two I already have is the last thing I want. I was a bit annoyed, therefore, to find that I now have one that can be hidden but not removed from my profile, like an embarrassing ink-stain on the pocket of a formal shirt. It is slightly amusing that anyone who has not claimed a www.facebook.com/personal-name for their profile page now has "pseudo-random-number"@facebook.com

Normally I would not be too worried about Facebook springing changes like this without notice, as they have done numerous times in the past. My main concern this time, however, is that emails can be sent from anywhere to this address and they appear identical to Facebook messages yet there is no spam filter nor anti-virus. Previously you would at least need to be logged in to the account from which you are sending the message making it hard, though not impossible, to spoof. It would appear that it is only possible to send emails from addresses that are already known by Facebook (doesn't this cripple email somewhat?) but with 900 million Facebook users it wouldn't be too difficult to forge addresses from the numerous spam mailing lists that are doing the rounds.

I have already checked that it is possible to send emails with attachments and links - still an effective way to spread trojans - although I haven't tried to send carefully crafted javascript that makes you a 'Fan' of Justin Bieber as soon as you open it (I sincerely hope this is not possible). Maybe Facebook has thought about this but I cannot find anything on the help system. Perhaps it was mentioned in the press release they forgot to send.

Tuesday, 12 June 2012

... now with even more SQL

A couple of days ago I wrote a post about creating a Mandelbrot set in SQL in which I mentioned that MySQL doesn't have any means of creating an image. The idea of this would be as outlandish as the flight simulator file browser they put in Jurassic Park to make computers look interesting - there is no way that such a thing would ever exist in real life. I was surprised, therefore, when a colleague suggested using Netpbm, a 'lowest common denominator' image format that can be created as a text file. This, together with MySQL's ability to write the result of a query to a file, means that the whole thing can be done via SQL without resorting to any perl. The only downside is that the file is created with the permissions of the system user running the database server and, as a security measure, it will not write to a file that already exists. This means that the whole file must be created by a single query, superuser privilege is required if you want to delete the file and the file must be deleted before attempting to create it again. The solution looks like:

SELECT @xmax:=COUNT(c_re) INTO @xmax FROM points GROUP BY c_im LIMIT 1;
SELECT @ymax:=COUNT(c_im) INTO @ymax FROM points GROUP BY c_re LIMIT 1;
SET group_concat_max_len=11*@xmax*@ymax;
SELECT
  'P3', @xmax, @ymax, 200,
  GROUP_CONCAT(
    CONCAT(
      IF( active=1, 0, 55+MOD(steps, 200) ), ' ',
      IF( active=1, 0, 55+MOD(POWER(steps,3), 200) ), ' ',
      IF( active=1, 0, 55+MOD(POWER(steps,2), 200) ) )
    ORDER BY c_im ASC, c_re ASC SEPARATOR ' ' )
    INTO OUTFILE '/tmp/image.ppm'
  FROM points;

The output filename may need to be changed depending on operating system. The output image looks like the one to the right although this image has been converted into a png file as Picasa doesn't like Netpbm files and, furthermore, the png file is about 1% of the size. Theoretically it should be possible to create the ppm file in the slightly smaller binary format with the code below but for some reason it doesn't work for me:

SELECT @xmax:=COUNT(c_re) INTO @xmax FROM points GROUP BY c_im LIMIT 1;
SELECT @ymax:=COUNT(c_im) INTO @ymax FROM points GROUP BY c_re LIMIT 1;
SET group_concat_max_len=3*@xmax*@ymax;
SELECT
  'P6', @xmax, @ymax, 200,
  GROUP_CONCAT(
    CONCAT(
      CHAR(IF( active=1, 0, 55+MOD(steps, 200) ) ),
      CHAR(IF( active=1, 0, 55+MOD(POWER(steps,3), 200) ) ),
      CHAR(IF( active=1, 0, 55+MOD(POWER(steps,2), 200) ) ) )
    ORDER BY c_im ASC, c_re ASC SEPARATOR '' )
    INTO OUTFILE '/tmp/image.ppm'
  FROM points;

I would be interested to know if anyone can see the problem.

Sunday, 10 June 2012

Fractsql

I was thinking about the Mandelbrot set the other day - I cannot remember exactly why but this is the sort of thing that happens to someone who is a mathematician by nature. While I was growing up computers were beginning to become commonplace in people's homes and the Mandelbrot set, being a pretty way of showing off the capability of these new toys, was in fashion and I knew what it looked like before I had any idea what it meant.

Most people think of the Mandelbrot set as an intricate pattern made up of lots of coloured shapes somewhere between circles and triangles (see left). However, as I discovered, the set is in fact the black bit in the middle and all the coloured bits are precisely what the set is not. In fact, the colour of the points shows how quickly they can be discarded.

As well as being a mathematician I also work a lot with databases and my recent thoughts about the Mandelbrot set centered around the possibility of creating it in a database, specifically a MySQL database.

There are numerous explanations of the Mandelbrot set out on the Internet but it would make sense to have a brief introduction here. If you cast your mind back to mathematics at school you may recall that there are methods of solving equations like z2-z+c=0, where c is a constant. For some values of c it would be possible to factorise this equation to find integer solutions while for other values a formula can be used. Alternatively you could note that the equation can be re-written as z=z2+c and then use a method of iteration by guessing a solution and, by repeatedly evaluating the right-hand-side of the equation get better guesses. For example, in the case of z=z2-0.5 we might guess the solution to be zero and then, by following this iteration, get the following sequence:

Guessed zz2-0.5 = next guess
0-0.5
-0.5-0.25
-0.25-0.4375
-0.4375-0.3086
-0.3086-0.4048
(etc)(etc)
-0.3661-0.3660
-0.3660-0.3660

and -0.3660 turns out to be the solution, to four decimal places. A computer is very good at this sort of thing and, when the equation is more complicated, this method is extremely useful for finding an estimate of the solution. It is quite satisfying when this works out but, unfortunately, this isn't always the case. Take z=z2-12, which has the nice, easy solutions of 4 and -3.

Guessed zz2-12 = next guess
0-12
-12132
13217412
(etc)(etc)
lotseven more
even moregiving up yet?

The Mandelbrot set is essentially the set of all numbers c for which the iterative method z=z2+c, starting with z=0, remains bounded. However, it is a bit more complex than that (did you see what I did there? Eh? Oh, never mind) because this description simply results in the set of numbers from -2 to 0.5 and certainly not the intricate multi-coloured pictures that everyone remembers. We are actually concerned with complex numbers, which may be represented in two dimensions. For those unfamiliar with complex numbers, they can be though of as Cartesian (x, y) coordinates in a plane and the only mathematics we need to do with them for the purpose of this is:

  • adding two complex numbers together, which is (x, y) + (a, b) = (x+a, y+b)
  • squaring a complex number, which is (x, y)2 = (x2-y2, 2xy)

With these two operations we can do everything we need with the equation z=z2+c. However, instead of x and y I will use Real (or Re) and Imaginary (Im).

So what has this got to do with databases? The usual way in which programs were written to generate a Mandelbrot set was to go through each point on the plane in turn, taking that value as c, and iterate over the function z=z2+c until it could be decided whether or not the value would stay within a boundary. After doing this the point is drawn as black if z remained within the boundary or a colour determined by how quickly it grew. The algorithm for this looks like:

  Start of main loop
    Choose next point, c
    Set z=0
    Start of iteration loop
      Calculate z_new = z^2+c
      If z_new is too large
        c is not in the set
      Else if we have tried enough iterations
        c is in the set
      Else
        Go back to the start of the iteration loop with z=z_new
    End of iteration loop
    If there are more points to try
      Go back to the start of the main loop

This algorithm is very simple and has probably been written for most conventional programming languages. I did check to see if it had been written in SQL and indeed it had in a few of places, for T-SQL, Postgres and MySQL but in all these cases the algorithm above had been replicated in each flavour of SQL to 'plot' the graph in ASCII art. This is a nice trick but completely ignores the fact that working with a large set of data, performing the same operation on every point, is exactly what SQL is good for. If the points are stored in a database then one step of the iteration can be performed on every point at the same time with one query rather than laboriously stepping through each one at a time. The algorithm then becomes:

  Populate the database with all the values of c
  Set z=0 for every point
  Loop
    Update all rows in the database setting z to be z^2+c
    Mark all rows where z is too large as not in the set
    Repeat the loop until bored

For each point I needed the number c and the current value of z at a particular point in the iteration. Both of these are complex numbers and so have a real and imaginary part. I also need a flag to say whether or not the point is known not to be in the set - I call this 'active' and set to 1 if the point is still to be iterated. Finally, I need a count for the number of iterations that have been performed on the point, so that the picture can be coloured correctly. The table therefore looks like:

Real part of cc_re DOUBLE
Imaginary part of cc_im DOUBLE
Real part of zz_re DOUBLE DEFAULT 0
Imaginary part of zz_im DOUBLE DEFAULT 0
Iterative steps countsteps INT DEFAULT 0
In set (1=yes)active CHAR DEFAULT 1

With the table set up like this my first attempt, which contains an error (can you spot it?), was to perform the iteration with:

      UPDATE points SET
          z_re=POWER(z_re,2)-POWER(z_im,2)+c_re,
          z_im=2*z_re*z_im+c_im,
          steps=steps+1
        WHERE active=1;

This would perform one iteration on every point in the plane that had not already been determined as not in the set. The next step is to determine which points can be discounted.

      UPDATE points SET active=0
        WHERE POWER(z_re,2)+POWER(z_im,2)>4;

The value of 4 as the bound comes from noting that the entire Mandelbrot set falls within a circle of radius 2 centred at the origin. These two queries, repeated multiple times, should be all that is required to generate the set from database populated with the sample data. The only thing that MySQL cannot do is to plot the result so I had to resort to perl for this (Update:Yes it can). It was at this point I realised something was not quite right when it came out looking more like a spreadeagled frog (see right) than the picture I expected.

It took me a while to realise that the problem was that I had been naive in the first UPDATE; when it set z_im=2*z_re*z_im+c_im it was using the value of z_re after it had already been updated. There are ways of getting round this with temporary tables but this is not really practical with a dataset of tens (or hundreds) of thousands of lines. The solution was to add extra columns for a temporary value of z and do the iteration in two steps, making the two queries:

      UPDATE points SET
          znew_re=POWER(z_re,2)-POWER(z_im,2)+c_re,
          znew_im=2*z_re*z_im+c_im,
          steps=steps+1
        WHERE active=1;
      UPDATE points SET
          z_re=znew_re,
          z_im=znew_im,
          active=IF(POWER(z_re,2)+POWER(z_im,2)>4,0,1)
        WHERE active=1;

And there you have it - the Mandelbrot set in two SQL queries. Not only is this simpler than the usual algorithm that calculates each point in turn, it makes it very easy to produce animations like the one on the right. If anyone wants a copy to play around with the source code is here.