name | contents -----------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Welcome to Platypus Innovation | Hello and welcome! Test post #test #wednesday #engelberthumperdinck thisisnotatag | \r : Yet Another test post #test #wednesday #engelberthumperdinck thisisnotatag | Nothing to see here, move along.\r : \r : Test of footer | This is a test, and has nothing whatsoever to do with tents.\r : Error posting to platypusinnovation.com | Only registered users of platypusinnovation.com may post via email.\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. 
And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error message test | This should show up in the error message.\r : Error posting to platypusinnovation.com | Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error posting to platypusinnovation.com | Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error posting to platypusinnovation.com | The group mcguffins does not exist.\r : \r : The message concerned was\r : \r : Only registered users of platypusinnovation.com may post via email.\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error posting to platypusinnovation.com | The group mcguffins does not exist.\r : \r : The message concerned was\r : \r : Only registered users of platypusinnovation.com may post via email.\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. 
The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Winterwell becomes carbon neutral | \r : \r : Climate change may be the greatest threat facing mankind. We all have a duty to do our part in combating it. Therefore Winterwell have decided to become carbon neutral. We will use offsets to cancel out the emissions generated by our business activities.\r : \r : Thank you to Gary of ecometrica for performing the footprint calculation. Java memory options: big up your JVM | If you're running hefty Java programs, you may find you get out-of-memory exceptions. Even without these, your system may be underperforming. You can increase the memory allocation via the -Xmx option.\r : \r : You can find out about the extra JVM options by running java -X. The memory related options are:\r : \r : \r : -Xms set initial Java heap size\r : -Xmx set maximum Java heap size\r : -Xss set java thread stack size\r : \r : \r : A typical setup might be java -Xmx512m (assigns 512 megabytes of heap memory)\r : \r : If your programs need gigabytes of memory, you might use `java -Xmx1g` (assigns 1 gigabyte of heap memory)\r : \r : The -X options do come with a warning: they are "non-standard and subject to change without notice." That said, the memory options have stayed constant for as long as I can remember[^1].\r : \r : Further reading: \r : \r : [^1]: A long time, young grasshopper, a long time. Error posting to platypusinnovation.com | You tried to post to the group mcguffins, but you are not a member.\r : \r : The message concerned was\r : \r : Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. 
And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : ThinkTank Mathematics liquidates | ThinkTank Mathematics is in liquidation. I was a founder and director of ThinkTank Mathematics, and its end is a sad event. However I have since been part of creating Winterwell Associates, which has been a rewarding new venture. I understand some of my ex-colleagues from ThinkTank Mathematics have also created a new company.\r : \r : ThinkTank Mathematics set up shop back in January 2005, founded by myself, Hannu Rajaniemi and Sam Halliday:\r : \r : > "We're a small Edinburgh-based mathematical consulting group and we'd like to know if you have any problems which can be solved via mathematical modeling. You may not think so at present, but we intend to convince you."\r : \r : Please note: ThinkTank Maths Ltd. (company number SC343621) is a different legal entity from ThinkTank Mathematics Ltd. (company number SC295336). ThinkTank Maths Ltd have asserted copyright ownership for the ThinkTank Mathematics logo used above. Their lawyers have requested its removal, although its usage is entirely legitimate under fair dealing. I have temporarily obliged as a gesture of goodwill.\r : Error posting to platypusinnovation.com | You tried to post to the group mcguffins, but you are not a member.\r : \r : The message concerned was\r : \r : Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Twestival ideas | Joe's notes from our meeting on 22/7/09. When's the actual event? -MRG\r : \r : Notes from yesterday because my brain is a sieve:\r : \r : - Quiz format\r : - Cryptic crossword clues? 
(not requiring specialist crossworder\r : knowledge) Possibly the answers are connected e.g. to a literary quote\r : or riddle\r : - Tough (ungoogleable) questions: like that school quiz or Round Britain quiz\r : \r : - Extreme abridging (Project Twutenburg)\r : - Works of literature/poetry in 140 chars\r : - Radio 4 themed? "The afternoon play", "Woman's Hour serial"\r : \r : - Call my bluff/The meaning of Liff\r : - Something boggley?\r : Using a webcam with Java and JMyron | I'm working on a digital art project involving a webcam (more details soon...). As is often the case, the hardest part of the code was what should have been the simplest, and was certainly the least interesting: connecting to the webcam. This post shows you how to do that using Java and the JMyron cross-platform library. The JMyron documentation is not great, but the library itself is very useful.\r : \r : ## Installing JMyron and capturing video from your webcam\r : \r : 1) Download JMyron from http://webcamxtra.sourceforge.net/download.shtml (you want the file linked to under the heading Processing).\r : \r : 2) If you're using Windows, make sure the dlls are somewhere the program will find them. E.g. in the program working directory or in C:\Windows\System32\r : \r : 3) Start JMyron:\r : \r : JMyron jmyron = new JMyron();\r : // start a capture with the given width and height\r : // - though you may not get these settings!\r : jmyron.start(320, 240);\r : // disable the (not terribly good) blob detection to speed up frame rate\r : jmyron.findGlobs(0);\r : \r : \r : 4) To take a snapshot:\r : \r : jmyron.update();\r : int[] pixels = jmyron.image();\r : // You then access pixels like this:\r : int pixel = pixels[x + y*width];\r : \r : \r : There are a couple of ways to use the pixel data. You can use new Color(pixel) to interpret a pixel int. 
Or you can use BufferedImage.setRGB(x, y, pixel) to build up an Image from pixel ints.\r : \r : ## Troubleshooting\r : \r : Problem: The image is split into several stripy images. \r : Solution: The webcam is not sampling at the width/height dimensions you requested. I do not know the cause of this. The workaround is simple. Check `jmyron.getForcedWidth()` and `jmyron.getForcedHeight()` to see what width and height the webcam is actually using. Use these settings instead. Painless Java Profiling using JIP | The first rule about optimising your code is don't. Much of the time it'll be fast enough anyway. By trying to optimise you'll only slow down development and write uglier code that'll be harder to maintain.\r : \r : The second rule about optimising your code is that if you have to do it, use a profiler. It is really the only reliable way to find the bottlenecks. Otherwise your hard work is liable to deliver, say, a ten-fold speed up on a method that only uses 5% of your CPU time, which isn't really worth diddly-squat.\r : \r : Last time I did some optimisation, using a profiler paid off: I got a 30% speed-up with a simple edit when the profiler identified an unsuspected bottleneck.\r : \r : There are many profiling tools available. Some are free, some commercial. They offer different ranges of features, IDE integration, etc. I use Andrew Wilcox's JIP profiler. It's small, simple, easy to install and works for any computer setup. On the other hand you don't get all the options, or as nice a GUI, but I found that these things were outweighed by the fact that it does the core job well, and it took maybe 15 minutes to download, install, configure and run.\r : \r : Setting up JIP is refreshingly easy:\r : \r : * Download from http://jiprof.sourceforge.net/.\r : * There's a config file called profile.properties which you'll probably want to edit with options such as not logging calls to certain packages. 
It's documented and pretty straightforward.\r : * Run your program passing 2 extra arguments to the JVM:\r : `-javaagent:DIR\profile.jar -Dprofile.properties=DIR\profile.properties` \r : where DIR is the absolute path for the directory containing profile.jar and profile.properties. You'll find them in the JIP download in the sub-directory profile.\r : * If you use Eclipse, I recommend defining a new JRE with the JIP arguments. You can do this via Preferences->Java->Installed JREs. You can then select to use the profiler by switching JRE in the Run... dialog.\r : * The output comes out as a file profile.txt or profile.xml (or both) depending on the config settings.\r : * You can examine the txt file by hand, but it's easier to use JIP's GUI jipViewer to examine the xml file. The GUI's not mentioned in the documentation, but it comes included in the download (in the profile sub-directory). It isn't beautiful, but it does the job. You launch the GUI after a profile session by running java -jar jipViewer.jar profile.xml.\r : * Now the hard part: find the bottlenecks and fix them. It's important to keep copies of the old profiling runs so that you can measure what effect your edits are having.\r : \r : JIP is available from \r : \r : NB - JIP stands for Java Interactive Profiler, which is an odd name because it isn't particularly interactive.\r : Test of Rongbuk stuff | This is a test, do not be alarmed.\r : \r : -- \r : Just to give you some more background on Evanescense, should you need\r : it. They are the shittest most horrible bunch of fucking fuck cunts ever\r : to take on Ben Affleck in a competition for the prize of most hateful\r : bunch of whining fuckers that ever walked this poor, molested planet. To\r : listen to them is an offence to mortal ears. Even just hearing their\r : name makes me angry. 
There really is no excuse for their existence.\r : \r : I'm going to go and have a shower now to wash away the unclean feeling\r : after discussing /that band/.\r : -- Ben Godfrey\r : trash versus delete | What are the intended semantics of DBObject.trash()? I get that it marks : things as trash and removes their slugs so they're fractionally harder : to find, but is there some other garbage-collection process that : actually deletes trashed objects? : : This is somewhat related to a fix I've just made to the generic servlet: : there was a hidden "action=save+edit" field which was preventing the : buttons from doing anything else. I've removed it: the Delete button now : trashes objects, but doesn't delete them. Trashing and not deleting is : its documented behaviour, so I've left it like that. : : I've also added an "edit" button to the list servlet. Turn on "show : extra details" to see it. Do you trust your cornflakes? | The Reader's Digest conduct an annual survey in trusted brands. This is a media-friendly piece of market research - but flaws in the design mean that it has little value as market research. Key insights this year: Kelloggs is more trusted than Volkswagen.\r : \r : The main problem with the survey is that the questionnaire is badly constructed. People are asked to name a trusted brand. This conflates two factors: brand awareness (i.e. size), and trust. For example, Dell won as the most trusted computer manufacturer. Few of the respondents were likely to pick Sun - but that doesn't mean they think Sun are up to no good.\r : \r : This mess could be sorted out by establishing a Bayesian prior which would allow us to separate trust from awareness. 
The prior in this case could be to ask people to name any brand for that category.\r : \r : Another approach would be to also collect data on un-trusted brands, and look at the trust/dis-trust ratio for each brand (with confidence interval analysis to handle brands with low numbers of responses).\r : \r : These measures would be in the Reader's Digest interest: there is little cost to them and it would give the survey a good deal more credibility. Elsewhere there is a flaw which is unlikely to ever be patched - a piece of bad statistics promoting the survey itself.\r : \r : Apparently 34 percent of Europeans recognise the Readers Digest Trusted Brands logo. This figure seems highly dubious. I'd never heard of the thing before, and even Google doesn't pick up many links. Are you aware of the Readers Digest Trusted Brands logo?\r : \r : What was the methodology here? People were shown the Trusted Brands Logo and asked to tick a box if they knew what it meant.\r : \r : Imagine if maths exams were conducted that way. "Here is the formula for integration by parts. Tick the box below if you think you might know what it means. 5 points"\r : \r : Plus this used a biased sample: asking Readers Digest readers to recognise the Readers Digest Trusted Brands Logo.\r : A drop of Guice: Replacing factories with Google | Strawberry and Watermelon Juice cc SiFu Renka on Flickr\r : \r : Everyone's familiar with using factories and singletons. Google's Guice library offers an alternative approach based on dependency injection. Their pitch is (paraphrasing) "Do you use factories and singletons? We can offer you a better approach! All you have to do is rewrite your entire codebase with magic annotations." That's a heavy pitch. Like a stranger proposing marriage and a move to Denmark. You of course will say no, even if they do look hot and you're really into Lego. This post shows a very simple use of Guice. 
Think of it as a first date with no strings attached.\r : \r : Suppose you have a class or interface Monkey, and you currently use a factory MonkeyFactory to generate monkey objects in a flexible project-dependent manner. How would that work in Guice?\r : \r : First, create a module:\r : \r : \r : class GuiceModule implements Module {\r : public void configure(Binder binder) {\r : // Produce Chimpanzees\r : binder.bind(Monkey.class).to(Chimpanzee.class);\r : }\r : }\r : \r : \r : Then create a Guice Injector. This should be a singleton which you reuse throughout your project.\r : \r : public static Injector injector = Guice.createInjector(new GuiceModule());\r : \r : \r : Now you can use it to generate instances:\r : \r : Monkey newMonkey = injector.getInstance(Monkey.class);\r : \r : \r : Suppose Monkey should be a singleton, but you want to set it in a flexible project-dependent manner. How would that work in Guice? All we need to change is the binding in the module:\r : \r : class GuiceModule implements Module {\r : public void configure(Binder binder) {\r : Monkey singleton = new KingKong();\r : binder.bind(Monkey.class).toInstance(singleton);\r : }\r : }\r : \r : \r : You can find out more about Guice and download the jar files from \r : At the Bundyfest | \r : \r : Professor Alan Bundy, a leading figure in automated reasoning and my Ph.D. supervisor, is 60. In his honour, the Division of Informatics organised a conference this week of Alan's alumni and fans. Here are my rambling notes...\r : \r : There were several talks on the status and future of AI. Fausto argued that strong/general AI had failed whereas weak/narrow AI was a success. Others felt that weak/narrow AI was the best route to strong/general AI, and so there was no failure, merely a steady advance.\r : \r : The general consensus was that AI is in rude health. It has gone from a quirky academic subject to being a multi-billion industry, powering many sectors - such as search (e.g. 
Google), eLearning, medical imaging... As several people observed/complained, when an AI technology starts working, people no longer call it AI - it becomes computer science.\r : \r : Fausto raised some good points about the anthropomorphic conception of AI as "computers that think like people". AIs will not be human-like (to quote Marvin Minsky "the first AI will be crazy."), and there will not even be a clear boundary between the AI and the world. We humans think in terms of inside (the brain, the machine) versus outside (the world), but this is not the case for modern computer systems, where much of the intelligence may reside in the environment.\r : \r : Jorg Siekmann gave a presentation on eLearning and how sophisticated it is becoming. New lecturing tools are going to transform the humble blackboard. For example, with handwriting recognition for formulae, which can then be plotted on the blackboard. In medicine, there is work on projecting a 3D model of a beating heart onto the blackboard, which the lecturer can then open up with their hands. Incredible stuff.\r : \r : Jorg considers AI as a key tool in delivering sophisticated personalised computing. Diversity is not a defect to be absorbed by design, but a feature to be maintained, encouraged and exploited through run-time adaptation.\r : \r : He also talked about his conversion to AI. He came to Britain from Germany believing that machines could do calculations but not real mathematical reasoning. He was shown a theorem prover, and was converted on the spot. He wired his fiance and family in Berlin to say "This is too deep. I'm staying in Sussex." 
and never looked back.\r : \r : Elsewhere, Alan Smaill gave a retrospective on Alan Turing, Mateja talked on diagrammatic reasoning, but that will require a separate post as I may have quite a bit to say, and Chris Mellish - who apparently was Alan's first PhD student - talked on the links between language and logic in NLP.\r : tags: group:frontpage ai edinburgh Is Ruby on Rails better than Java? | Someone just emailed me the following Ruby on Rails propaganda. The claim is that Ruby will deliver faster & cheaper than Java. You may have heard this or similar claims before.\r : \r : > So to change a database schema, say to add a column, in Java would be, say, a week, in PHP would be 2 - 3 days. In RoR it's probably a minute.\r : \r : Hm... adding a column to a database schema takes me about a minute in Java too - using modern Java tools to be sure (e.g. JPA+Hibernate - just add a field, and possibly an annotation, and the job's done). And the compile testing, unit-testing and debugging tools which Java gives you means that overall, a good Java coder should be able to knock out reliable apps faster than a Ruby on Rails coder.\r : \r : I am not at all convinced by the claims for Ruby. Too much bluster and attacking straw men. Certainly there are slow Java coders. The solution is to find better coders. Keyword spam | \r : \r : \r : There's a nice funny article in today's Guardian about the use of popular but irrelevant keywords to attract eyeballs, the resulting decline of journalism, and the concomitant end of the civilised world:\r : \r : ["Online POKER marketing could spell the NAKED end of VIAGRA journalism as we LOHAN know it"](http://www.guardian.co.uk/commentisfree/2008/jul/21/charliebrooker.pressandpublishing) by a hungover but nevertheless very articulate Charlie Brooker.\r : \r : He ends with a rant which I completely agree with:\r : \r : > And for the consumer, it's just one more layer of distracting crud - the bane of the 21st century. 
Distracting crud comes in countless forms - from the onscreen clutter of 24-hour news stations to the winking, blinking ads on every other web page. These days, each separate square inch of everything is simultaneously vying for your attention, and the overall effect is to leave you feeling bewildered, distanced, feverish and slightly insane. Or maybe that's just me, today.\r : Error posting to platypusinnovation.com | Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error posting to platypusinnovation.com | The group mcguffins does not exist.\r : \r : The message concerned was\r : \r : Only registered users of platypusinnovation.com may post via email.\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error posting to platypusinnovation.com | The group mcguffins does not exist.\r : \r : The message concerned was\r : \r : Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. 
And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Error posting to platypusinnovation.com | The group mcguffins does not exist.\r : \r : The message concerned was\r : \r : Thread 87262 does not exist\r : \r : The message concerned was\r : \r : This comment ought to show up in the error message.\r : \r : -- \r : The progression of a Lisp programmer: The newbie realizes that the\r : difference between code and data is trivial. The expert realizes that\r : all code is data. And the true master realizes that all data is code.\r : -- Sriram Krishnan\r : Could concrete examples improve mathematical syntax? | \r : \r : \r : You do not need to work very long in fields such as statistics or logic before realising that the notation can get pretty impenetrable. Recursive structures, such as distributions over distributions, and the need to carry around context information quickly lead to confusingly similar variables and a thicket of indices.\r : \r : This is more than just a nuisance. It is actually something of a block to understanding and communication. But what can be done?\r : \r : One possibility is to reason with concrete examples. Mateja Jamnik's work on diagrammatic reasoning looks at this, as does my own research. Diagrams typically have concrete features - things such as size and position which are unavoidably specified by the act of drawing. Yet (as we have shown) it is possible to perform rigorous reasoning with them, even to the extent of producing formal theorem provers.\r : \r : Could the use of "concrete syntax" be extended to algebra - and thus to more general domains? \r : \r : Concrete syntax may also be the key to programming by example (something I have mused on elsewhere). 
Sig stripper test | There should be no sig here.\r : \r : -- \r : Programs that write programs are the happiest programs in the world.\r : -- Andrew Hume\r : Lost soul | \r : \r : by Daniel Winterstein published 15 March 2009\r : \r : I have been thinking on the issue of how to make an ARG that is fun and meaningful. Bolting casual games onto a TV-style narrative is missing the value of the genre. But on the other hand, if the puppet-masters follow the "This Is Not A Game" aesthetic too closely, then the difference between game and lunatic conspiracy theory becomes a matter of the player's perspective.\r : \r : On that note, here is an extract from a lost soul on a forum:\r : \r : I spent a large portion of 2008 looking into conspiracy theories - I had a lot of time off, so I decided to indulge my curiosity...\r : \r : It probably wasn't the best move for someone having mental health issues but the more I looked, the more convinced I became and the more compelled I was to continue delving deeper.\r : \r : I suddenly found myself involved with an activist group called the Earth Defense Federation... I then got sucked into what at first seemed a bit of an online game...\r : \r : They stated they came from an unmapped secret country in South America and were interested in conscripting me into their Imperial Military, talked about WW3 and various other conspiracies and archaic cultural references.\r : \r : My point is, I can see how human beings want something far more spectacular than the humdrum routine of work, sleep, eat, that we have set ourselves up for... 
For a moment I was seriously drawn in and believed the hype but then I had to pinch myself.\r : \r : I'm still very confused about it all, but now I try to take it with just a bit more than a pinch of salt.\r : \r : Having looked at the source material, it's not clear whether this man found a grassroots ARG or a collection of nutters.\r : \r : Certainly no responsible game maker would let things be so unclear. Puppet-masters have a responsibility to act with care. Failure to do so is unethical.\r : \r : But beyond ethical responsibilities, there is also good game design at stake. Game designers have to communicate effectively with the players, making clear the nature and rules of the game. Failure to do so leads to a lousy game, with players potentially becoming obsessive but mostly just dropping out.\r : \r : Lines should be established, respected, and broken only in precise and careful ways.\r : tags: games arg this-is-not-a-game The World is Your Playground (EIF talk on ARGs) | \r : \r : by Daniel Winterstein published 27 November 2008\r : \r : Daniel Ilabaca cc Jon Lucas\r : \r : Somewhat belatedly, here are the slides from a presentation I gave at the Edinburgh Interactive Festival on Alternate Reality Games (ARGs) past, present, and future:\r : \r : http://www.platypusinnovation.com/static/eif/\r : \r : The short version: if a movie is a world that the audience visits, then an ARG is a world that visits the audience.\r : tags: group:frontpage arg games presentation Java: Still the best language for work | Java: Still the best language for work\r : \r : \r : \r : Like many hackers, I am a coding polyglot.\r : \r : Java Logo\r : \r : The first language I learned was BASIC. It came already installed on my parents' Amstrad 128 (the 128 refers proudly to the machine's 128k memory). That wonderful and unexpected toy is to blame for a lot! After that I graduated to C for writing games which I sold by shareware (posting the disks out - this was pre-internet). 
I also wrote some assembly code for speed-ups.\r : \r : As an undergraduate, I used Pascal and C++. As a web-developer, I came to appreciate the power of Perl for quickly getting things done, and started experimenting with Javascript. As a postgraduate I learned Lisp and Prolog. I would go on to teach Prolog and Javascript, and to use Lisp for my Ph.D.\r : \r : I also use C# at work, and I can code in Python, PHP, Matlab and Erlang. Once you've learned one programming language, it's usually easy enough to pick up others. The same core concepts tend to recur in different guises.\r : \r : All these languages have good points and bad points. But if I have a choice of coding language, I will without question choose Java. Using Java, I can write both simple and complex programs faster and with fewer bugs.\r : \r : Why? Partly because Java made some good design choices. Partly because there are good libraries for many tasks (with some notable exceptions). Mostly because the Java IDEs are superb.\r : \r : I am reminded of this because lately I have been coding in C# using Visual Studio. Now C# is very close to Java - so much so that Microsoft have produced an automatic Java-to-C# converter (though it only handles the old Java 1.4). You might expect the C# coding experience to be similar to the Java experience. But it is much slower and clunkier - and this is entirely due to shortcomings of Visual Studio.\r : \r : I don't mean to knock Visual Studio. It is a relatively good IDE. It is one of the best IDEs available - if you ignore the Java IDEs (and Allegro's Lisp IDE, which was a decade ahead of its time).\r : \r : Eclipse is packed with powerful tools that work with you. Three areas stand out as being head and shoulders above Visual Studio:\r : \r : 1. Code-completion tools\r : 2. Continuous integration support (i.e. continuous compilation with error feedback, and good built in testing support)\r : 3. 
Navigation tools\r : \r : At a rough estimate, I would say I am 3-4 times more productive in Java/Eclipse than C#/Visual Studio.\r : Why the error? | Miles\r : \r : -- \r : I'm trying to cut down on my caffeine, so when I get up, I have one cup\r : of coffee, then I have another cup of coffee with my breakfast, then on\r : the way to work, I like to get a cup of coffee. Then I have a cup of\r : coffee, you know, the kind you get with a donut, except I never have the\r : donut, I just have the cup of coffee. Then when I get to work, I have a\r : cup of coffee, because I like to have coffee when I talk on the phone,\r : except it usually goes cold, so I get another cup of coffee, and then\r : it's lunch, so I have an espresso.\r : -- Jim's Big Ego, "Stress"\r : Tips for writing FaceBook applications in Java | Tips for writing FaceBook applications in Java\r : \r : by Daniel Winterstein published 15 July 2008\r : \r : This tutorial is now a bit out of date. I intend to write an update soon...\r : \r : So you want to write a FaceBook application using Java? You've added the Developer application to your FaceBook account and downloaded the Java client library. And now you're kind of stuck. Where are the tutorials, examples and proper documentation? Frustration turns to anger, which as we know leads to the dark side.\r : \r : "I find the lack of good Java\r : documentation disturbing."\r : \r : Hopefully someone will write a good Java/FaceBook tutorial soon. Perhaps we will if you'd only send us some nice chocolates - though to be honest we're still figuring it out ourselves. Meanwhile, here are some tips to help get you started. It's not a tutorial, but it should set you on the right track. 8 tips for building Java/FaceBook apps\r : \r : 1. Use Java 5.0 or higher. That's just general advice for your health and sanity. Because we care.\r : \r : 2. Ignore the java library example. \r : The example that ships with the java library is for a desktop app. 
You probably want to write a web application, so you'll have to change a great deal. For a start you'll need some form of web-application server. E.g. you might use TomCat. FaceBook web-applications have a slightly strange usage pattern. Most of your pages will be served through FaceBook. The user will request a page from FaceBook, who will then request its main contents from your server. Your contents will be adapted before being sent back to the user. Mostly you don't need to worry about this - it works nicely. But be aware that JavaScript is not allowed. That means if you're using an AJAX platform, it won't work within FaceBook. If you need AJAX - and FaceBook's mock-ajax won't do - then use your FaceBook pages to direct users off FaceBook onto normal web-pages.\r : \r : 3. The settings for your application in FaceBook \r : Don't forget to fill in the settings for your application in the Facebook Developer application. You should be setting:- \r : \r : 4. The callback URL for your application.\r : \r : 5. The application name - this allows you to make pages within FaceBook (i.e. pages which are framed by the FaceBook navbar and can use FBML). Once set, urls such as http://apps.facebook.com/yourappname/yourpagename will generate a request from facebook to your server.\r : \r : 6. A URL for new users. This is where you get to welcome them to your application and ask them to spread it on.\r : \r : 7. Using FacebookRestClient \r : The most important class in the client library is FacebookRestClient. This contains a host of methods which call the FaceBook server and cover most of what you'll want to do. Unfortunately FacebookRestClient isn't the friendliest of classes.\r : \r : 8. Almost all the actions require a FacebookRestClient that was constructed with a session key. If the user is logged in, you can get the session key from the CGI variables (look for FacebookParam.SESSION_KEY.toString()). Otherwise, you need to send them to a login page. 
Try the following:-\r : \r : \r : // Create a session-less FacebookRestClient\r : FacebookRestClient client = new FacebookRestClient(YOURAPIKEY, YOURSECRETKEY);\r : String token = client.authcreateToken();\r : String loginURL = "http://www.facebook.com/login.php?v=1.0&apikey="\r : +YOURAPIKEY+"&auth_token="+token; \r : // Now redirect the user to loginURL\r : // When they come back, they should have a session key\r : \r : \r : The method `FacebookRestClient.auth_getSession()` is - as far as I can tell - unnecessary. It converts a session-less client into one with a session. I found it easier just to make a new client, pulling the session information from the CGI variables.\r : \r : * Having got a FacebookRestClient with a session key, you can now call the various FaceBook-editing methods that FacebookRestClient provides. These methods make calling FaceBook easy enough. Unfortunately the methods return unprocessed XML Documents, which isn't terribly helpful. E.g. friends_get() returns something like this:-\r : \r : \r : \r : 1\r : 2\r : 3\r : \r : \r : \r : You'll probably want to wrap the methods you use with code to extract the information. E.g. to actually use friends_get(), try this:-\r : Document d = client.friends_get();\r : NodeList userIDNodes = d.getElementsByTagName("uid");\r : int fcount = userIDNodes.getLength();\r : List friends = new ArrayList();\r : for(int i=0; i "I like to roll in the hay, unless you mean slang for sexual intercourse."\r : \r : If a normal person says that, it just sounds weird. But I can imagine Dougal, the idiot-priest from [Father Ted](http://www.channel4.com/programmes/father-ted) saying that - and it being very funny. \r : \r : > "I hate this ducking fouchebag"\r : \r : > "I like ovens, shall we have a bun in the oven?"\r : \r : > "I like daisies. Let's push up daisies."\r : \r : You can see the intention, though it doesn't really work. 
But change the delivery and we have a credible (if not brilliant) joke: \r : \r : > "You like daisies, right, you old bat? Why don't you go push up some daisies."\r : \r : Similarly, "cat house got your tongue?" isn't bad - provided you know a *cat house* is a brothel. But it really wants to be used in an appropriate setting. Delivered as a retort by an angry wife to a husband who has just been caught using brothels.\r : \r : Jonas also wants to make a farting robot. The man is a genius. What is reality? Mathematics and cosmology | \r : \r : *Originally published in The Scotsman newspaper 6th October 2008 and © The Scotsman*\r : \r : This week marked the end of Sir Michael Atiyah's presidency at the Royal Society of Edinburgh (RSE). Michael Atiyah is one of the world's most eminent mathematicians. His work has been influential on modern theoretical physics. He used his leaving speech at the RSE to pose deep questions: What is reality? Is the universe built using the "fantastically intricate mathematics" of String Theory? Or do we await a new Newton or Einstein to simplify the picture?\r : \r : Although 79, Sir Michael Atiyah remains full of energy. Born in England to a Lebanese father and Scottish mother, he now lives in Scotland where he holds an honorary professorship at the University of Edinburgh. Atiyah made profound connections between research in advanced geometry and the physics of Quantum Mechanics. He is a mathematician rather than a physicist, but has worked closely with leading physicists in Quantum Mechanics and String Theory. He has received numerous awards and honours, including the Fields Medal in 1966 - the mathematics equivalent of a Nobel prize - a knighthood in 1986 and the prestigious Order of Merit in 1992.\r : \r : The main issue in modern physics is how to resolve two conflicting theories: the Standard Model of Quantum Mechanics, and Einstein's theory of General Relativity. 
Quantum Mechanics describes the very small - an exotic zoo of sub-atomic particles. These can be in several states at once, leading to the strange quandary of Schrodinger's cat who is both alive and dead at the same time. Meanwhile, General Relativity describes the very big - with the force of gravity acting as a distortion of time and space. Both theories have been thoroughly tested in experiments, but attempts to combine them are problematic. This has set physicists since Einstein searching for the elusive "Theory of Everything", a theory which will explain both Quantum Mechanics and Relativity as two views of one underlying truth. String Theory is the best candidate for that.\r : \r : String Theory considers all particles to be made up of tiny vibrating strings. The length and vibrations of these strings determine their behaviour, like notes interacting in a guitar chord. Bizarrely, the theory suggests that space has more dimensions than we commonly observe. We are aware of 3 dimensions of space, and time is counted as the fourth dimension. But attempts to apply String Theory in 4 dimensions lead to problems. So if String Theory is correct, there are other dimensions that must be invisible to us. Scientists believe the extra dimensions exist as tiny subatomic tubes.\r : \r : The problem with String Theory is that it is not one clean theory, but a family of possible theories. There are no experiments to test which of these are correct - if any. The new giant collider at CERN may find evidence that lends weight to String Theory, but it is not expected to settle the question. However progress is being made. Led by American professor Edward Witten, physicists and mathematicians are discovering that what they thought were completely different theories, are in fact different ways of looking at the same theory.\r : \r : Atiyah pointed out that there can be many "correct" ways to view the world. As an example, consider a simple stone. 
We see a stone as a lump of solid rock. To a chemist, it is a collection of atoms. To a physicist, those atoms are mostly empty space criss-crossed by force-fields. To a mathematician, those fields are a set of equations. Perhaps, Sir Michael suggested, the equations are the true reality.\r : \r : Though he helped develop modern theories, Atiyah does not believe they are the final answer. We may be making the same mistake as the ancient Greeks, who thought the planets moved in circles. Since this did not match observations, the Greek astronomer Ptolemy added extra circles rolling on the first circles (epicycles), and then extra circles again were added - wheels within wheels. The result explained the experimental data of the time, but it was overly complicated - and wrong. That model was eventually blown away by Johannes Kepler and Isaac Newton's insights, which revealed a simpler more elegant world.\r : \r : When Atiyah was a young man, the great logician Kurt Godel complained to him that physicists had given up on understanding, and were now focused only on description. Michael Atiyah has striven for understanding, following Poincare's view that science is no more a collection of facts and proofs, than a house is a collection of bricks. Atiyah is still hopeful that behind the complexity of current String Theory there may be revealed a universe we can understand.\r : tags: physics maths Url patterns in Java servlets | \r : \r : This technical mini-tutorial covers how to set url-patterns in a J2EE config. This lets you pass variables in your path, which can produce more meaningful URLs.\r : \r : The url-pattern element is a vital part of your `web.xml` file. It's also a tricky one.\r : The pattern begins from your web-app context. So `/myservlet` will match `http://myserver.com/mywebapp/myservlet`\r : \r : The example pattern above, /myservlet, is exact. It will not match /myservlet/foobar. It will match /myservlet?a=b though (i.e. 
variables passed by GET are OK).\r : \r : If you want flexible matching you use *, but the usage is a little strange:\r : \r : * `/myservlet*` will perform an exact match (why? what were they smoking when they decided that?)\r : * `/myservlet/*` is what you want. This will match /myservlet, /myservlet/, /myservlet/foobar, etc.\r : * You can also match by filetype using patterns such as *.html\r : * All other uses of * will not work - they are interpreted as the character * itself.\r : \r : It's a unique pattern globbing system, providing neither power nor flexibility. Still it does the job.\r : \r : There are more details on url-pattern here: . An alternative approach to producing elegant meaningful urls is to use Apache's rewrite system. Test File Upload By Email | Who can tag what? | \r : Currently:\r : You need write access to an object in order to tag it.\r : So user A cannot, in general, tag user B\r : \r : Common use case:\r : User A wants to tag user B.\r : User A wants to tag user B's tweet.\r : \r : Not sure how to fix this!\r : Ideas?\r : \r : Note: currently, shadows are owned by whoever creates them, which works for RtE 'cos one user does everything.\r : \r : \r : The Scottish Government backs technology to take video games into the real world | \r : \r : Choices cc Slava Kozlov\r : \r : Edinburgh-based firm Winterwell Associates has been awarded a SMART award to design artificial intelligence (AI) tools for the creative industries. The SMART:Scotland programme gives grants to small and medium sized enterprises to support R&D projects representing a significant technological advance for UK industry.\r : \r : Founded by computer experts, Dr Joe Halliwell and Dr Daniel Winterstein, Winterwell provides research, development and consultancy on new media, web services, mathematical modelling and data mining. 
The SMART award will enable the company to develop intelligent technology for a new type of media: pervasive games - also known as alternate reality games (ARG), interactive drama, or more simply: adventures.\r : \r : Like video games, pervasive games are highly interactive and story driven. But there's a key difference. "In a computer game or a TV show, you visit another world. In a pervasive game, another world comes to you!" says Winterstein who like Halliwell holds a PhD degree from the University of Edinburgh. "New technology allows us to take the real world as the arena. Our characters will have Facebook pages, they'll text and email players - even meet with some of them." The result is to bring a drama or advertising campaign very much to life.\r : \r : Though thrilling for players, game creators have to work round-the-clock to breathe life into their characters. "Time costs and scalability are key problems, and that's where software can help," explains Halliwell. "Our systems will understand plot devices and characters. And they'll take the drudgery out of talking with players across different media platforms. Scotland is a world leader in video games. We believe it can become an international center for pervasive gaming too."\r : \r : The creative and digital content industries are worth £5 billion to the Scottish economy and seen by many as crucial to the nation's future development. In a report published last week, innovation quango Nesta estimates they could create 150,000 jobs across the UK and deliver an £85 billion boost to the economy by 2013.\r : \r : *The photo above is (c) Slava Kozlov used under creative commons.* UTF-8 and multibyte characters | Currently, multibyte characters (as downloaded from Twitter, say) : are transmitted as question marks. This isn't a font issue, they're : transmitted across the wire as ASCII question marks. 
: : Changing the `encoding` field in APluginServlet to "UTF-8" means we can : display and edit CJK characters in TwitterSearchServlet. However, : inputting wide chars as a blog post or comment doesn't work: it looks : like the characters aren't entered into the database correctly. Pound : signs are also stored incorrectly. Comments in AField.java suggest this : may be a GET/POST issue: TwitterSearchServlet uses GET, TextEdit uses : POST. : : Changing the "encoding" field in PageBuilder doesn't seem to do : anything. Changing the "encoding" in the default header template : likewise does nothing. : : I'll revert the change to APluginServlet. : : Miles SoDash Demo | Here are the things I would like to be in place prior to demoing. These are roughly in order of priority.\r : \r : - Smallish presentation tweaks\r : - Suppress cryptic tags\r : - Highlight translation\r : - Fix colourscheme\r : - Link from twitterSearch to twitterTrends\r : - Ajaxify search operations and show a nice splash screen (ideally giving user feedback about what's going on)\r : \r : - Fix charting\r : - Use meaningful buckets\r : - Classify retrospectively?\r : - Make it look useful\r : - (Maybe: add some stats e.g. number of tweets in period, classification split)\r : \r : - Improve (?) classifier\r : - Train it on the happy/sad corpus\r : Javascript in PuppetStrings | Javascript embedded in string literals is pretty awful.\r : \r : I've added an addScript() method to RequestState which allows you to add to an ordered set of javascripts that get squirted into the page by APluginPageServlet. See comments there for some more musings.\r : \r : Javascripts then just live off in an appropriate subdirectory of /web/static/code:\r : \r : /web/static/code/widgets/widgetCamelCase.js\r : /web/static/code/servlets/servletCamelCase.js\r : \r : Examples of use in code: APlayerInfoWidget.java, DashboardServlet.java What does Britain do for money? 
UK exports 2007 | \r : \r : After the financial turmoil of last year, 2009 opens with bleak prospects for the UK economy. In the face of massive losses it behoves us to ask, how does Britain make money? I.e. if you viewed Britain as a business, where does its revenue come from? This is a big question, and I can only present a brief answer here - an overview of British exports.\r : \r : It is easier to find charts of GDP. But GDP combines actual income with internal sales, so it reveals what the British do for money, but not what the UK itself does for money.\r : \r : This chart shows gross UK exports of both goods and services, plus net income from investments - that is, it shows everything that leads to money entering the UK. It is based on 2007 figures from the Office for National Statistics, published in the 2008 "Pink Book". It doesn't consider imported inputs, which has the effect of inflating the apparent size of some sectors.\r : \r : Overall we sell more goods than services. That said, we import a lot more goods than we export - we have a modest trade surplus in services, and a worrying trade deficit in goods (not shown here - see the Pink Book for details).\r : \r : Manufacturing is perhaps more important than you might realise. The UK's manufacturing sector has certainly shrunk dramatically over the last decade/century, but it still constitutes the largest sector of the country's income. (Note that the importance of manufacturing will have been inflated somewhat here by the lack of input-output analysis.)\r : \r : After manufacturing, a worryingly large slice of the pie is made up of financial services. Worrying because this is likely to take a pounding in the coming years.\r : \r : The surprise for me was that chemical production is a very large part of the basis of our economy. 
The chemical industry doesn't receive as much press attention as manufacturing or finance - perhaps because it isn't going down the toilet except when intended.\r : \r : A quick analysis of the Pink Book shows that UK PLC is not in great shape: we have been running a trade deficit during the boom period, which will be unsustainable in the coming recession.\r : \r : \r : UK Exports 2007 £million\r : Misc. manufacturing 81532\r : Financial 43055\r : Misc. business services 40613\r : Chemicals 38865\r : Capital goods 29825\r : Oil 22749\r : Travel 18826\r : Transportation 16447\r : Car manufacturing 14284\r : Food, drink & tobacco 11759\r : Income: net return on investment 8606\r : Ships & aircraft 8291\r : Royalties & licence fees 7401\r : Computers & IT 6834\r : Insurance 5529\r : Basic materials 5519\r : Precious stones & silver 4768\r : Communications 3884\r : Government services 2084\r : Cultural & recreational 2054\r : Coal, gas & electricity 1942\r : Misc Commodities 1169\r : Construction 907\r : \r : \r : \r : Group page not showing emails | Emails to e.g. group-code-monkeys@pi.com are being received and filed correctly, but are not shown on the group page. (Nor are they causing mails to be sent.) Unpublish mechanism on textEdit uses user-specified service instead of original service #bug | The unpublish mechanism on textEdit should unpublish via the original\r : service of the text (not the service specified by the service\r : dropdown). The Bachinator goes online | *from 21 November 2008, salvaged from a previous blog installation*\r : \r : \r : \r : Winterwell announce the release of the online version of HMM Bach, developed for the University of Edinburgh. HMM Bach is a demo in computer creativity, i.e. computers using artificial intelligence to be creative. HMM Bach works with a human. The human gives it a melody, and it will compose a chorale in the style of J.S.Bach.\r : \r : The system has analysed Bach's chorales for statistical patterns. 
Its knowledge of music is captured in two Hidden Markov Models (HMM). Using these models, it can now perform Bach-like harmonisation.\r : \r : Try it out: [HMM Bach](http://demos.inf.ed.ac.uk:8188/web/)\r : \r : HMM Bach is a nice example of data-analysis for an innovative purpose - which is what Winterwell do as AI and statistical consultants.\r : \r : \r : Request For Ideas: Classifiers | : Hello Joe & Miles, : : This email is somewhere between a brain dump, a set of questions, and a : rough plan of action. : : Automation is a key selling point for Sodash. That breaks down into two : types: rule-driven automation on the back of the event model, and : automatic classification ("proper AI"). Tagging is core - it forms the : basis for classification, assignment, and triggering many rules. : : So how to do classification? Below are my rough notes (not necc either : intelligent or intelligible). : : : Goals: : : - Flexibility on tags: users should be able to change the tags and : tag-sets that they use. : - We want to combine tags. E.g. happy + spam is a valid tagging, as is : angry + @support : - Some tags are mutually exclusive though (happy/sad) : - We want to both detect and use conversational context, eg treat : replies differently to initiating messages. : - User-specific learning (e.g. what does tag X mean for this user?) : - Handle disparity in the size of training sets. The 500k messages in : happy/sad should not swamp new tags. : : : Resources: : : - The happy/sad corpus : - rte spam : - User-created tags (we may wish to hire a student as a test user to : increase these). These divide in two: tags applied in the current : subdomain, and equivalent tags applied by other users. : : : Suggested approach: : : - Data model: Tags are namespaced by group/subdomain. There are : mutually exclusive tags, which are defined by tagsets (see the new : sketch class DBTagSet). Users/subdomains can have several of these. : A tagset implicitly includes "none of the above". 
: A tag T which is not in a tagset is implicitly in a tagset of {T, not-T} : : - When to do classification: currently it's done in TwitterSearchProbe : in sodash, and in handlers in RtE. : This is OK for now, but ties the process to a particular workflow. We : should look to do it in a more general manner. : We could do classification within store() - but the need for speed there : might limit the types of classifier we can use. : Or we could have a classification thread which has unclassified texts : added to it by store(). : Or we could use an event-handler. : : - Collecting user data: certain actions should prompt us to mark a tag : as user-applied (via the DBTag.tagger field). E.g. If the user chooses : the tag, but also if the user writes a reply (which can be taken as : tacit acceptance of the automatic tag). DBTag should probably have an : extra Boolean field called trainingData. : : - Classifiers are updated periodically, e.g. daily or weekly. : : - We should probably experiment with a range of classifiers. To that : end, we should improve our ability to test the performance of : classifiers. This has to be done in the context of user data. : Should test running and reporting be added to ClassifierServlet? Or : should we work by taking database snapshots then running local : performance tests? : : Let's move to classifiers that output a distribution of tags, e.g. that : output {spam:70% ham:30%} rather than simply "spam" or "ham". This will : make combining classifiers easier, plus give us more insight when : debugging. Of course, we only apply a whole 100% of a tag - but DBTag does : have a confidence value for logging how confident the classifier was. : : - Speaking of confidence, we can't take the classifier's output at : face value. E.g. Naive Bayes has a tendency to be over-confident (it can : effectively double count evidence due to the false-but-convenient : assumption of independence between words). 
But we can also measure the : historical accuracy of a classifier, then judge confidence based on a : combination of both. I'm not sure what the formula for this is, but we : can certainly cobble something plausible together. : : - It would also be good to have live debugging info. E.g. when we (as : debuggers) are looking at a page of tweets, we see info about why the : tags were applied. E.g. "@spoon Buy some Viagra" {spam:70%} : : - We want to combine classifiers: e.g. happy/sad + general-users + : specific-user. This suggests a classifier which works by taking a : weighted sum of the output from other classifiers. : : Re. how to calculate the weightings... a simple 1st pass is to weight : each classifier by the % of tags it's guessed correctly historically, : with some pseudo-counts added to avoid 0%/100%. : : - For conversations, we can segment the training data by the tags : applied to the previous message. We need to judge when there's enough : data to use a segmented approach. : : - In terms of improved classifiers... I'd like to try some Markov : models. However I think the best immediate improvement we can deliver is : a classifier that follows links. E.g. given the tweet "Read this : http://bit/ly/whatever", it will follow the link, annotate the tweet : with the extracted text, then classify on the basis of both tweet text : and webpage text. The tricky thing here could be extracting the right : chunk of text from a webpage. : : Please add your thoughts and comments to this rambling thread! Then lets : have a chat about short medium and long term goals for tagging. : : Best, etc. : - Dan Re: Request For Ideas: Classifiers | On Mon, Oct 26, 2009 at 08:49:46AM +0000, Daniel Winterstein wrote: : > - In terms of improved classifiers... I'd like to try some Markov : > models. However I think the best immediate improvement we can deliver is : > a classifier that follows links. E.g. given the tweet "Read this : > http://bit/ly/whatever", it will follow the link, annotate the tweet : > with the extracted text, then classify on the basis of both tweet text : > and webpage text. The tricky thing here could be extracting the right : > chunk of text from a webpage. : : Better: download the page (and possibly strip HTML from it) and store it : as a DBText, findable by URL: then we avoid having to re-download and : re-classify the same page over and over again when all of stephenfry's : followers re-tweet it. : : Miles Object Oriented Statistics: Death to Indices! 
| Object Oriented Statistics\r : \r : Statistical notation is not ideal.\r : \r : You do not need to work very long in statistics before realising that this is more than just a nuisance. It is actually a block to understanding and communication.\r : \r : \r : P(X_{it}=x | X_{jt}=x', S_{XYt} = s_{xy})\r : \r : \r : ## Meaningful variable names\r : Lift ideas from software engineering. \r : The first being to use meaningful variable names.\r : E.g.\r : P(X.location=x | )\r : \r : Going beyond this, we can cast the problem to a specific\r : Sort of like skolemisation.\r : \r : assign names to our objects\r : P(alice.location | bob.location, alice )\r : \r : This is all very well, but the downside is that a complex\r : formula will fill several lines.\r : \r : So we drop bits of it.\r : \r : Contrary to Einstein's scheme for making physics equations\r : prettier, we probably want to keep the summation signs.\r : Probability theory has as many product signs as sums, so\r : repeated indices would not be such a clear indicator of a sum.\r : \r : Allow a context to be built up through a paper.\r : If we've said \forall x \in \R once, it doesn't need to be\r : repeated. \r : This is normal practice anyway.\r : \r : ## Priors, posteriors and marginals\r : \r : A particulat\r : \r : # Death to indices [Fwd: #bug Sodash outbox contains messages from other people] | : : -------- Original Message -------- : Message-ID: <4B0D808E.60005@winterwell.com> : Date: Wed, 25 Nov 2009 19:07:58 +0000 : From: Daniel Winterstein : Reply-To: daniel@winterwell.com : User-Agent: Thunderbird 2.0.0.23 (X11/20090817) : MIME-Version: 1.0 : To: group-code-monkeys@platypusinnovation.com : Subject: #bug Sodash outbox contains messages from other people : Content-Type: text/plain; charset=ISO-8859-1; format=flowed : Content-Transfer-Encoding: 7bit : : : : I.e. my outbox shows messages possibly written by admin This is another test. 
| Miles Visions of the New Office | \r : \r : Miles' suggestion of WAAFs pushing markers across table-top displays. *"Incoming data analysis? Vector 2, 0, 1..."*\r : \r :
\r : \r : \r : The office on a bad day, as envisaged by Hieronymus Bosch.\r : \r :
\r : \r : \r : The office, as envisaged by Dali.\r : \r :
\r : \r : Winterwell catches the Google Wave | \r : \r : The Winterwell office has caught Google Wave. We've signed up on the beta-testing program to ride this newfangled would-be email-killer. \r : \r : If you've not heard of Google Wave, it's a cross between an email system, and a document editor for groups of people (e.g. like a wiki - or a Google Docs document). Plus it has some extra features. Notably it's open to other companies building on it, or even running their own compatible systems.\r : \r : Now what to *do* with it?\r : \r : We're experimenting with it as a collaborative working tool. It looks shiny. How will it compare in practice with our established setup of email + shared version-controlled files?\r : \r : We'll also be looking into the API to see what applications and widgets we can build with it. If you're interested in developing a Google Wave app or robot with us, please get in touch, Java: A new approach to Equals() | Custom equals() methods are handy but involve writing a lot of ugly boilerplate code - and there are booby-traps for the unwary. In this post, I present a way of over-riding equals() and hashCode() methods - based on annotations - which makes them a doddle to implement and maintain. The resulting code is available open-source on request in a shrink-wrapped jar.\r : \r : To recap, `a == b` tests whether `a` and `b` are the same object. `a.equals(b)` tests whether they are equivalent objects. `equals()` is a very useful method, and you will no doubt have found yourself over-riding the default version in several of your classes, e.g. for use with sets and maps.\r : \r : Over-riding `equals()` is not quite as straightforward as it seems. You must also override `hashCode()` in an equivalent manner. Otherwise HashSet, HashMap and Hashtable will exhibit strange bugs. I've made this mistake in the past and can confirm it's confusing as hell to debug. 
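To make the pitfall concrete, here is a minimal sketch of what goes wrong. The class names `BrokenPoint`, `FixedPoint` and `HashCodeDemo` are made up for illustration and are not part of the code discussed in this post. `BrokenPoint` over-rides `equals()` only, so two equal points will almost always get different identity hash codes and a `HashSet` will fail to find one via the other.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: equals() is over-ridden, hashCode() is NOT.
class BrokenPoint {
    final int x, y;

    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof BrokenPoint)) return false;
        BrokenPoint other = (BrokenPoint) obj;
        return x == other.x && y == other.y;
    }
    // hashCode() is inherited from Object, i.e. based on object identity.
}

// Hypothetical class: the same equals(), plus a matching hashCode().
class FixedPoint {
    final int x, y;

    FixedPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof FixedPoint)) return false;
        FixedPoint other = (FixedPoint) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return 31 * x + y;
    }
}

class HashCodeDemo {
    // Almost always false: the set looks in the wrong hash bucket.
    static boolean brokenLookup() {
        Set<BrokenPoint> points = new HashSet<>();
        points.add(new BrokenPoint(1, 2));
        return points.contains(new BrokenPoint(1, 2));
    }

    // Always true: equal objects have equal hash codes.
    static boolean fixedLookup() {
        Set<FixedPoint> points = new HashSet<>();
        points.add(new FixedPoint(1, 2));
        return points.contains(new FixedPoint(1, 2));
    }
}
```

The nasty part is that `new BrokenPoint(1, 2).equals(new BrokenPoint(1, 2))` is true, so a quick check of `equals()` alone gives no hint that the set lookup will fail.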
Josh Bloch's Effective Java provides more examples of how equals can go wrong.\r : \r : Also `equals()` and `hashCode()` methods contain a lot of boilerplate. Here's an example of `equals()` for a class - and this is from a simple class with only 2 fields:\r : \r : \r : public boolean equals(Object obj) {\r : if (this == obj)\r : return true;\r : if (obj == null)\r : return false;\r : if (getClass() != obj.getClass())\r : return false;\r : final ToyStoryState other = (ToyStoryState) obj;\r : if (leftBank == null) {\r : if (other.leftBank != null)\r : return false;\r : } else if (!leftBank.equals(other.leftBank))\r : return false;\r : if (torch == null) {\r : if (other.torch != null)\r : return false;\r : } else if (!torch.equals(other.torch))\r : return false;\r : return true;\r : }\r : \r : \r : Yuck! Ugly, barely readable and a great nesting place for bugs. To be fair to Java, equality is an inherently tricky issue. One Ruby developer scoffed at this, and presented a "nicer" Ruby example -- which was much shorter, but it was also buggy.\r : One reason why the code above is ugly is that it properly covers all the possibilities. Also, it was auto-generated by Eclipse rather than painfully written by hand.\r : \r : ## Annotations to the rescue!\r : \r : There is a nice solution to this issue using annotations. We'll start with an example, then look behind the scenes at how it works.\r : \r : ## Example\r : \r : \r : class MyObject {\r : \r : @Equals\r : public String name;\r : \r : private int id;\r : \r : @Equals\r : public int getId() {\r : return id;\r : }\r : \r : public boolean equals(Object obj) {\r : return Equality.equalsByAnnotation(MyObject.class, this, obj);\r : }\r : \r : public int hashCode() {\r : return Equality.hashCodeByAnnotation(MyObject.class, this);\r : }\r : \r : }\r : \r : \r : With the example above, two MyObjects are considered equal if they have the same `name` and `id`, as returned by `getId()`. 
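To give a flavour of what could sit behind calls like these, here is a minimal reflection-based sketch. It is an illustration under stated assumptions, not the actual `Equality` class: the names `EqualsSketch`, `EqualitySketch` and `Person` are made up, annotated methods are assumed to take no arguments, and the class-equality check and the way hash codes are combined are my own choices.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Objects;

// Hypothetical stand-in for the @Equals annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.FIELD, ElementType.METHOD })
@interface EqualsSketch { }

// Hypothetical stand-in for the Equality support class.
final class EqualitySketch {
    private EqualitySketch() { }

    // Compare every annotated field and zero-argument method declared on klass.
    static <T> boolean equalsByAnnotation(Class<T> klass, T a, Object b) {
        if (a == b) return true;
        if (b == null || a.getClass() != b.getClass()) return false;
        try {
            for (Field f : klass.getDeclaredFields()) {
                if (!f.isAnnotationPresent(EqualsSketch.class)) continue;
                f.setAccessible(true); // allow access to non-public fields
                if (!Objects.equals(f.get(a), f.get(b))) return false;
            }
            for (Method m : klass.getDeclaredMethods()) {
                if (!m.isAnnotationPresent(EqualsSketch.class)) continue;
                m.setAccessible(true); // allow access to non-public methods
                if (!Objects.equals(m.invoke(a), m.invoke(b))) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sum the hash codes of the same properties, so the result does not
    // depend on the order in which reflection returns them.
    static <T> int hashCodeByAnnotation(Class<T> klass, T a) {
        int hash = 0;
        try {
            for (Field f : klass.getDeclaredFields()) {
                if (!f.isAnnotationPresent(EqualsSketch.class)) continue;
                f.setAccessible(true);
                hash += Objects.hashCode(f.get(a));
            }
            for (Method m : klass.getDeclaredMethods()) {
                if (!m.isAnnotationPresent(EqualsSketch.class)) continue;
                m.setAccessible(true);
                hash += Objects.hashCode(m.invoke(a));
            }
            return hash;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical class using the sketch, mirroring the MyObject example.
class Person {
    @EqualsSketch
    public String name;

    private int id;

    public String nickname; // not annotated, so ignored by equals()

    Person(String name, int id, String nickname) {
        this.name = name;
        this.id = id;
        this.nickname = nickname;
    }

    @EqualsSketch
    public int getId() { return id; }

    @Override
    public boolean equals(Object obj) {
        return EqualitySketch.equalsByAnnotation(Person.class, this, obj);
    }

    @Override
    public int hashCode() {
        return EqualitySketch.hashCodeByAnnotation(Person.class, this);
    }
}
```

With this sketch, two `Person` objects with the same `name` and `id` but different `nickname` values compare as equal and share a hash code, because only annotated properties are consulted.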
Equals is our custom annotation, and Equality is a support class with static methods...\r : \r : ## Making it work\r : \r : The key to `equals()` is to know which properties are important. Properties can be fields or zero-argument methods, such as getters. This can be specified in a natural way by annotating fields and methods. First we define an annotation for properties that should be tested: `@Equals` indicates a property that should be compared using `equals()`. Defining an annotation is similar to defining an interface, except you use the keyword `@interface`, and you'll usually want to add some meta-annotations.\r : \r : \r : @Retention(RetentionPolicy.RUNTIME) // Meta-annotation for "Don't throw this away during compilation"\r : @Target( { ElementType.FIELD, ElementType.METHOD }) // Meta-annotation for "Only allowed on fields and methods"\r : public @interface Equals { }\r : \r : \r : Now we can annotate properties like in the example - how do we test them? We need to make use of the reflection API. Given a Class object, we can get its Field and Method objects via `Class.getDeclaredFields/Methods()`. Given those, we test for the presence of annotations using `Field/Method.isAnnotationPresent()`. We use a little known Java feature to get access to non-public fields. I put the code for all this in a class Equality under the static methods `equalsByAnnotation()` and `hashCodeByAnnotation()`.\r : \r : ## Checking the superclass\r : \r : Note that the `equalsByAnnotation()` method does not look at properties belonging to ancestor classes. Similarly, `hashCodeByAnnotation()` does not incorporate properties belonging to ancestor classes. If this is necessary, the user must call `super.equals()` and `super.hashCode()`. E.g. like this:\r : \r : \r : if ( ! 
super.equals(obj)) return false;\r : return Equality.equalsByAnnotation(MyObject.class, this, obj);\r : \r : and\r : \r : return 17 * super.hashCode() + Equality.hashCodeByAnnotation(MyObject.class, this);\r : \r : \r : This is a deliberate design decision. If the parent class has over-ridden `equals()`, then its method (which may or may not use annotations) must be checked. If the parent class hasn't over-ridden `equals()`, then its method should be ignored, or you're back at object identity. I've left this as the user's responsibility - partly so they keep control, partly because the code to check for an ancestor equals method would have been ugly. If you think it should be handled automatically, feel free to send me your code suggestions...\r : \r : ## Advantages\r : \r : Shorter code, cleaner code, God kills fewer kittens, etc. Having `@Equals` attached to fields and methods makes it clear what is and isn't being tested. This helps in maintaining correct behaviour when the class is edited. And it isn't restrictive - you can still write your own custom code if you need to.\r : \r : ## The Disadvantage: Speed\r : \r : So the code is a lot nicer, but naturally you lose a bit of speed for using runtime lookups on fields. I did some time trials, and the annotation-based method came out as 3x slower. That's fine during development and if `equals()` / `hashCode()` are not bottlenecks. Note that `equals()` and `hashCode()` can be bottlenecks, e.g. when making intensive use of Maps. So you may not want to use this in some production systems.\r : \r : It has been suggested that I use bytecode editing (e.g. using ASM) to give a fast system. A very good idea, but sadly I'll have to leave that as an exercise for the reader for now.\r : \r : *The code for this article is licensed as open source under the LGPL, and comes with javadocs and examples. Please let me know if you find this code useful.*\r : \r : \r : \r : From Samurai to Jedi | \r : \r : I recently caught a rare showing of Akira Kurosawa's film Hidden Fortress. 
The film follows a pair of bickering peasants - fugitives from the losing side in a war - who get caught up in rescuing a feisty princess. Akira Kurosawa is a legendary Japanese director, perhaps better known in the west for his influence on our directors. George Lucas was one of his fans, and helped fund one of Kurosawa's later films. He also used Hidden Fortress as the model for Star Wars.\r : \r : \r : \r : The peasants become robots, and the samurai become Jedi. The princess remains a princess. As well as the idea of telling the story from the points of view of the film's lowliest characters (Tahei and Matashichi / R2-D2 and C-3PO), Lucas also copied some of the aesthetics and a couple of scenes. For someone raised on Star Wars, it is somehow uncanny to see elements of it appearing in an old black-and-white movie - in some cases, copied right down to the scene-wipes and sound-effects.\r : \r : "Mediocre artists borrow", said Picasso, "Great artists steal." Star Wars is in no way a remake of Hidden Fortress. The plots are radically different. George Lucas also stole from Joseph Campbell's Hero's Journey theory. It is Campbell's theory that gives us the hero, the mentor, and the "dark father" who the hero must face. From the dark father stemmed the character of Darth Vader, one of film's most memorable villains. Though Vader's costume was inspired by Samurai armour, his role is from Joseph Campbell - Darth Vader is often said to be a corruption of "dark father" via Dutch.\r : \r : It is perhaps the use of Campbell's mythic structure which makes Star Wars so epic. Although not that alone - plenty of films since have copied the structure without the greatness. Excellent acting, script and special effects also play their part.\r : \r : Hidden Fortress feels a little flat in comparison with its stellar offspring. It's still a great film though, with hints of deeper themes. Morally grey areas and darker notes show through Kurosawa's black and white cinematography. 
Even though this is an action/adventure flick, he shows "people as they really are" (to quote Princess Yuki), "their beauty and their ugliness".\r : \r : Bluetooth tracking: is there a danger to civil liberty? | \r : \r : Last week the newspapers revealed that the residents of Bath were being surreptitiously tracked via Bluetooth and their mobile phones. The scheme's operators have defended themselves by saying that Bluetooth data is anonymous: it isn't linked with you in any reliable way. So is there a problem?\r : \r : Yes - but not a terribly serious one. Although there is no fool-proof way to link Bluetooth data with an individual, there are plenty of cases where that link can be made:\r : \r : * If you set your phone's name to your own name. Admittedly, such people probably know they're broadcasting this data and are happy to do so.\r : * By statistically linking the phone's movements with those of a person. If they matched lists of employees against home addresses... there is a reasonable chance they can identify individuals.\r : \r : Simon Davies, of human rights watchdog Privacy International, said: 'This could become the CCTV of the mobile industry... It would not take much to make this a surveillance infrastructure over which we have no control.'\r : \r : This is an over-reaction. One good thing about Bluetooth is that you do have control of it. You can switch it off, or switch it to be private. In fact unless you use a Bluetooth headset, you should switch it off: Bluetooth drains the battery power from your phone.\r : \r : Now the thing is that you are already being tracked. If you live in a city, your mobile phone can be tracked to ~500m by looking at which transmitters it talks to. This data is kept by the phone companies and occasionally used by the government, e.g. in police investigations.\r : \r : ## F.A.Q.\r : \r : > "What can they find out about me?"\r : \r : The Bath system can only detect your position. 
It cannot detect anything about you if your Bluetooth is switched off.\r : \r : The system uses a set of detectors. When you pass by a detector, it notes the time and your Bluetooth ID. This gives a list of "sightings" which can be used to plot a trail of your movements. Bluetooth detectors have a range of about 10m, cost ~£200 and require a power supply to run. They can't be scattered just anywhere, and the system can only find you if you're near one.\r : \r : > "Do I have Bluetooth on my phone?"\r : \r : Probably - most mobile phones do, especially if you got it within the last few years.\r : \r : > "Is my Bluetooth on?"\r : \r : Bluetooth is often on by default. The phones I'm familiar with have a blue light which flashes periodically to indicate when Bluetooth is on. To be sure, find the Bluetooth settings in your phone's menu.\r : tags: group:frontpage bluetooth tracking civil-liberties profiling Midi and MP3 generation in Java for online playback | \r : \r : As part of the Robo-Bach project, I got into making midi and mp3 files from Java. This post describes how I did this without too much pain.\r : \r : ## Making MIDI\r : \r : Java provides classes for working with Midi. These are fairly unpleasant. I used JFugue instead. Here is the code to play a simple tune:\r : \r : \r : Player player = new Player();\r : player.play("C D E F G A B");\r : \r : \r : Nice, no? And the only slightly more complicated code to make a two-voiced tune and save it as a midi file:\r : \r : \r : Player player = new Player(false);\r : String tune = "V0 C D E C V1 R R C D E C"; // The opening to Frere Jacques\r : player.saveMidi(tune, new File("mytune.mid"));\r : \r : \r : ## Delivering music online (with Javascript controls)\r : \r : Now suppose you want to play your midi file as part of a webpage. 
This is not as easy as you might expect.\r : \r : You can direct the user's browser to the midi file - which might trigger it to play, or might not, and certainly won't let you control playback.\r : \r : You could try an applet. But (a) the Java applet is dead. Attitudes towards this range from mild indifference to complete disinterest. And (b) Java applets don't support midi.\r : \r : So we need to use Flash. Flash also doesn't support midi, but it does support mp3. I embedded an invisible Flash mp3 player using the SoundManager javascript library.\r : \r : ## Converting MIDI to MP3\r : \r : To do this I relied on a call to a Unix process. I used timidity piped through lame, called from Java via Process and a shell script.\r : \r : The shell command to convert midi to mp3 is:\r : \r : timidity -Ow -o - mytune.mid | lame - mytune.mp3\r : Innovation vs procurement: inevitable enemies? | \r : innovation requires risk\r : good procurement strives to avoid (86 rows)