A recent file rearrangement on my server resulted in all the links for “Snake Wrangling for Kids” breaking.
The problem is fixed now, and the zip files should be accessible again.

by Jason R Briggs
In a moment of either inspiration or madness (I’m pinning my hopes on the fact that there’s a thin line between madness and genius), I decided to start work on a version of SWFK for the OLPC.
There are a couple of reasons why this is more of a challenge than the other versions of the book:
The first point is easy to solve. I’ve been running OLPC on VirtualBox (occasionally) for a few months now, so at least I have a vague idea of how to interact with the Sugar interface.
The second point is slightly more problematic. As far as I’m aware, the OLPC runs GTK, and so the pygtk module is available — but I have yet to come across a turtle implementation for gtk (ignoring, for the moment, the TurtleArt activity already installed on the OLPC, which I don’t think is particularly useful for my purposes).
It seems like quite a fun project — implementing the turtle module using pygtk — until you consider the arcane pygtk API, the (IMHO) lack of reasonable documentation (I’m not particularly impressed by the pygtk reference manual… particularly the lack of index), and simple examples to expand from.
Technical issues aside, a week later, I’m considering the semantics of an OLPC edition of the book. SWFK for Windows, Mac and Linux all target the same fundamental audience. An OLPC edition has a completely different potential audience. The Western+English market is, no doubt, vanishingly small, so while I’ve had over 6500 downloads to date, I’m guessing an OLPC edition might add no more than 1-3% to that. The real market would be translated versions (assuming the interest in translating actually results in translations), but that raises another question: will kids in non-Western countries actually understand some of the references? Is talk of DVD players, in-car computers, Nintendos, etc. (i.e. some of the references in chapter one, for example) at all meaningful in the developing world? I’m doubtful.
Which leads me to posit the question, is it worth the not-so-insignificant effort?
What do you think?
Genius or madness?
Long awaited by… well a couple of people at least… I’ve recently been working on splitting out Snake Wrangling for Kids into 3 separate editions: one for Windows, one for Mac and one for Linux.
This proved rather more challenging than expected (rather characteristic of LaTeX as a whole actually), and I haven’t fully proofed the final result yet. Those interested can check out the new editions here:
SWFK - Linux Edition
SWFK - Mac Edition
SWFK - Windows Edition
The Mercurial repository (here) containing the LaTeX source has also been updated with the latest changes.
I’ve had a few requests to release the “source” to Snake Wrangling for Kids, by people interested in translating the text into another language.
SWFK is still a work in progress (although that progress has been rather slow since we moved to the UK), but I can’t see any reason why I should put off releasing the LaTeX source until some sort of mythical completion date, particularly not when there are willing participants out there.
So for those who are interested in translating SWFK to another language, the LaTeX “source” can be found in the following Mercurial repository:
http://www.briggs.net.nz/hg/swfk
Note that it isn’t “buildable” in its current state. I haven’t added the image files yet because some of them are rather large (the cover alone is over 2MB) — the wonders of the EPS format. Mercurial doesn’t seem to handle excessively large image files that well (at least not on my web host it doesn’t), so if anyone has ideas on that front, let me know.
I’ve just fixed (rather hurriedly) an annoying bug in domyinvoice.com. Invoices generated for a project with a daily rate (rather than hourly) were calculating the days incorrectly — basing the calculation on the number of tasks rather than the distinct list of days. Easy to fix, but since I haven’t been doing any Python for the last couple of months, it took a few hours longer than it should have.
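The fix boils down to counting distinct dates rather than tasks. A minimal sketch of that idea follows; the task tuples, the rate and the function name are made up for illustration, and this is not the actual domyinvoice.com code:

```python
from datetime import date


def billable_days(tasks):
    """Count each calendar day once, however many tasks were logged on it."""
    return len({task_date for task_date, _description in tasks})


# Hypothetical task log: two tasks on the 3rd, one on the 4th.
tasks = [
    (date(2008, 3, 3), "design review"),
    (date(2008, 3, 3), "bug fixing"),
    (date(2008, 3, 4), "deployment"),
]

daily_rate = 400  # assumed rate, purely illustrative
days = billable_days(tasks)
total = days * daily_rate
```

Counting `len(tasks)` here would bill three days; the set of dates gives two.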
Do My Invoice, my web-based, RESTful invoicing solution, has been live for a couple of months now. Admittedly there hasn’t yet been a large number of registrations, probably because I haven’t really promoted it other than through this low-traffic blog, and because I hope that word-of-mouth may eventually kick in and result in a few more paying customers.
Interestingly though, I’ve somehow made the second page of Google results (currently sitting at position 16) for the keywords “web based invoicing”.
However, for the term “web based invoice”, the site is either so far down the list it can’t be found, or it doesn’t appear at all.
Also interesting: 3 weeks ago, DMI was at position 83 for the search term “invoices”, yet I can’t find it anywhere in the first 100 pages now — obviously we hit a broken rung in the Google ladder and fell down a few hundred feet.
Completely off-topic, but one thing this has highlighted as a gap in Google’s toolset is “Search Within”. It would be really useful to enter a few search terms, then find exactly where another term sat in those results. In other words being able to enter something like: “domyinvoice within: web based invoicing“, which gave you the results found within a set of results, along with the index positions. Seems like a killer Google feature for web developers, marketers and the like.
I’ve just uploaded the latest version of SWFK. This fixes a few issues with continuity (such as referring to functions before they were actually explained), adds a basic explanation about the use of brackets and order of operations (which I think was a serious omission given the context), and other minor grammatical and code fixes.
The only major change is to add a few exercises to the end of some of the chapters, plus a new appendix for the answers. This is a work in progress. Exercises are currently missing from Chapter 9, and possibly aren’t as detailed (nor fun) as they should be.
But they’re a start…
The latest version of “Snake Wrangling for Kids” has been uploaded, and is now available on the main page. This is the LaTeX conversion I mentioned in a previous post — but I’ve now applied a Python code checker to 99% of the example source code. Hence I’ve fixed up a few omissions, bugs, and so on.
Automating the testing of example code proved more challenging than expected. In the end, I failed miserably to get doctest working the way I wanted, and had to roll my own testing code, which works for a majority of the examples. It’s not perfect, but it picked up a few problems, so I’m reasonably happy with the end result… even if it is a complete hackjob.
I started learning C programming in, or around, 1991. Up until that point, I had been exposed to Basic, Pascal, Modula-2, and (I think) Prolog. C was… somewhat of an eye-opener. Anyone who has had to come to grips with pointers and memory management in C, will know what I mean (and thank goodness for the Borland C IDE help system - unparalleled as a learning tool).
At University, we had papers that required learning a functional language and assembler, but neither of these were like the screaming brick wall that I hit in my second year when I decided to learn C.
Actually, since then, I can’t think of a single technology where I’ve experienced a similarly steep learning curve.
Until LaTeX.
I knew what LaTeX was, of course, but had neither the need nor, in fact, the interest to learn it before. But after deciding I needed to move SWFK out of the atrocious (and, from an automation perspective, useless) word processing format it’s in at the moment, and after a few aborted attempts to get some reasonable output from various docbook toolsets, I’ve been provided with just the incentive.
But the learning curve is excruciating.
Get one thing working, and the next thing stops. Change this and affect that. It was all me, of course. There’s nothing wrong with LaTeX after you get used to how it works. But, oh that learning curve. I haven’t felt such a need to scream and yell at my computer in a decade.
This is not to say things are perfect now. I still can’t get the front cover to look centered (advice from a LaTeX guru would be greatly appreciated) and, at times, LaTeX seems to think it knows, better than I do, where to put figures. Working out how to put visible spaces in a \verbatim block is also proving a little challenging. But the overall result does look, in my opinion, much nicer.
Anyone interested in proofreading the new version, please let me know.
Some time ago I decided I needed a text editor with minimal distractions. I can’t stand the sheer volume of buttons on the average word processor. Even AbiWord, which is skeletal in comparison with OpenOffice or Word, is still too busy at times (I miss Q&A Write actually — the Word Processor I used to use back in the days of MS-DOS).
A google search at the time turned up WriteRoom, which is Mac-only. But not much else.
So, of course, I developed my own alternative.
It didn’t quite fulfill the vision, so I dumped it in the “deprecated projects” list after it languished, unused. But I’ve been coming back to the idea lately, and finally did a bit more research on the various buttons you can push in PyGTK.
So, the Vanilla text editor is now -slightly- more supported than it was before (i.e. I’m probably going to use it myself), and is a bit closer to the vision. It’s completely configurable (background and foreground colours, font, key map, etc are all set from an ini file), and now runs full screen (as it should’ve from the beginning).
Only the source is available at the moment (Python2.5 and PyGTK2.0 required), but I’ll eventually make a commercially supported package available if there’s any interest.
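For anyone curious what ini-driven configuration of this kind can look like, here is a rough sketch using the standard configparser module. The section name, keys and default values are assumptions for illustration only, not the editor's real settings file:

```python
import configparser

# Hypothetical defaults; the real editor's keys and values may differ.
DEFAULTS = {
    "background": "#000000",
    "foreground": "#00ff00",
    "font": "Monospace 12",
    "fullscreen": "true",
}


def load_settings(ini_text):
    """Merge the user's ini text over the defaults and return a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_dict({"editor": DEFAULTS})
    parser.read_string(ini_text)
    section = parser["editor"]
    return {
        "background": section["background"],
        "foreground": section["foreground"],
        "font": section["font"],
        "fullscreen": section.getboolean("fullscreen"),
    }


settings = load_settings("[editor]\nbackground = #ffffff\nfont = Courier 14\n")
```

Anything the user leaves out of the file falls back to the default, so a one-line ini file is enough to change a single colour.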
While I have pretty much all my code in source control of some kind (originally subversion, now mainly mercurial), the only repos that’s externally accessible is YAK for WordPress (currently a subversion repos hosted at Sourceforge). For something like my quick (read hack-job) attempt at the Stomp protocol, that’s somewhat less than optimal, since I do occasionally (i.e. like today) get updates — so either I have to upload multiple versions, or just overwrite the current version. Which isn’t much use if someone wants to track back to previous revisions.
So, after coming across Bill de HÓra’s “Mercurial First Impressions” and subsequent “Setting up Mercurial on TextDrive” a few weeks ago, it seemed a good time to do something about the situation.
Installation went reasonably well, apart from the obligatory, and rather obscure, Apache error: “Premature end of script headers” — where it took me an embarrassingly long amount of time to figure out that the hg directory needed to be chmodded to 755 as well as the hgwebdir.cgi file (despite the fact I’d checked the file and directory permissions a couple of times during the process… sigh).
The URL for the stomp.py Mercurial repository is now:
http://www.briggs.net.nz/hg/stomp.py/
To check out the latest copy you can use the following command:
hg clone http://www.briggs.net.nz/hg/stomp.py stomppy
After which hg pull -u (pull the new changesets, then update the working copy) will keep you in sync with the (all too infrequent) changes.
Now have to decide whether I want to move YAK off sourceforge and into its own Mercurial repository. The control-freak, tech-geek part of me, gleefully says yes. The lazy, I’ve-got-too-much-other-stuff-to-do part says “pass”.
Time will tell… hopefully before I start arguing with myself.
I’ve been playing with Mercurial lately, after having previously tried Git out for a few weeks.
I must say, I prefer Mercurial, though I would be hard-pressed to quantify that into some kind of this-feature-is-better-than-that-feature list (apart from the most obvious advantage that it’s cross platform, of course). For some reason, it just feels right. Which is weird, I know.
This is not to say Mercurial is perfect. Because there are a few niggles. The one that jumps immediately to mind is the fact that no matter where you are in a repository tree, hg status presents the list of changed files from the repository root — not the current context directory. So if you then copy and paste a file name (with path) from that status list, to commit a single change, you get an error. Because subdir/filename does not exist when you’re already in subdir. Small niggle, but there it is.
Second problem I’ve had recently: hg rm file, where file is the last in the directory. Mercurial appears to remove the directory. If you then try to commit, it throws its hands up in horror. Again, only a minor annoyance.
Apart from that, I like Mercurial enough to migrate all my other personal projects off subversion. I’d also be really interested to see how it handled a really large enterprise project, since subversion (at work) is groaning under the strain of handling that project. Chances of that happening are minimal, and would require replicating almost 2 years worth of commits, to really test it, but it would be interesting to see the results…..
The worst thing about Python is that when you spend a lot of time working with other languages (in my case, Java), and you come back to work on a Python app, occasionally you forget that there’s usually a better way to do something. Case in point: creating a rather simplistic framework for initialising an application (db connections, messaging, etc), only to realise a few days later that none of your dynamic loading of initialisation scripts is necessary in the slightest, when a module’s __init__ provides as much of that capability as might be needed.
Sigh.
On a completely unrelated note, a special thanks to the various people who’ve recently linked to this site (specifically to the YAK page). I was never that bothered by an abysmally low technorati ranking, but now find myself absurdly pleased when I’ve climbed up by about half-a-million.
Well… I’m still in the 500,000s anyway, it’s an improvement. A-list? Can you say Z-list?
One of my earliest posts (back when I was using blogger.com) was about a reverse logging proxy: a simple Python script to transparently log requests between a client and a server. Very useful for debugging web service calls, to see exactly what is being sent back and forth.
I’ve just added logproxy.py to my projects page, after updating it to handle virtual hosts (in other words, when using Apache to serve more than one domain).
This is not quite as impressive a change as it might sound. It basically means correctly setting the Host HTTP header, rather than defaulting to localhost…
Anyway, a few people appear to have found it useful, judging from my access logs, so this update might help someone else (it certainly helped me figure out why my wsgi app was failing to properly handle multipart form content).
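The virtual host change can be pictured roughly like this: when the proxy forwards a request, it replaces whatever Host header the client sent with the name of the site it is really talking to. This is only a sketch of the idea, with a hypothetical function name, and is not the code in logproxy.py:

```python
def forward_headers(client_headers, upstream_host, upstream_port=80):
    """Copy the client's headers, replacing Host with the upstream server's name."""
    # Drop any Host header the client sent (header names are case-insensitive).
    headers = {k: v for k, v in client_headers.items() if k.lower() != "host"}
    if upstream_port == 80:
        headers["Host"] = upstream_host
    else:
        headers["Host"] = "%s:%d" % (upstream_host, upstream_port)
    return headers


# The client talks to the proxy on localhost; Apache needs the real site name.
incoming = {"host": "localhost:8080", "Accept": "*/*"}
outgoing = forward_headers(incoming, "www.example.com")
```

Without the rewrite, Apache would see Host: localhost:8080 and serve its default site rather than the virtual host being debugged.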
I’ve documented my quest for templating nirvana before (here, here, here and here… oh and also here), and thought I’d come to a reasonably satisfactory conclusion in some of my latter experiments.
My premise was that none of the templating engines currently available (you name the language/platform), come close to what I feel should be the fundamental goal of xml/xhtml templating — complete separation of markup from the code that populates it with dynamic data. There are a few templating systems that come close (Tapestry’s templating in the Java world and HTMLTemplate in Python, for example), but none that pushed all the buttons I thought needed to be pushed.
So, to this lofty ideal, I offered the idea of using a simple ID on elements and then putting all complexity in code to handle repeating of child elements, setting of attributes, setting node values, hiding elements, etc.
All in all it seemed to work for the (admittedly limited) cases I tried, but that’s the problem with grandiose schemes that are mainly based in the theoretical. Come time to apply the ideas in a practical sense, and you suddenly discover you haven’t thought it through well enough. Or to be more precise, I suddenly discovered I hadn’t thought it through well enough.
Case in point: I believe that in some situations it should be easy to use a different template without having to make modifications to the generating code. Which is where my simple IDs fall over in a quivering heap. Here’s an example (contrived) which immediately breaks it:
Example 1:
<users>
<user name="Joe Bloggs">
<address>10 Test St</address>
<city>somewheresville</city>
</user>
</users>
Example 2:
<html>
<body>
<ul>
<li>Joe Bloggs
<ul>
<li>10 Test St</li>
<li>somewheresville</li>
</ul>
</li>
</ul>
</body>
</html>
The xml in example 1 uses an attribute for the user’s name. The html in example 2 uses a textual element for the name, followed by another unordered list for the address elements. I could make the examples more complicated to show even more disparities, but you hopefully get the idea: if we’re just using an ID to distinguish elements, how do you say “set the attribute name to this value” for example 1 and “set the element value to this” for example 2, without having massively complicated code that either allows for both cases (and uses some kind of logic to figure out which should be applied), or has different code for each?
It’s naff. To put it politely. Not to mention that littering a template with unnecessary IDs is just as ugly as littering it with code.
So, in no way have I managed to reach templating nirvana. I have, however, demonstrated that if you’re not using this stuff in anger, there’s no way you’re going to come up with a good solution. Certainly some of the templating systems I’ve used (JSF springing immediately to mind) tend to suggest that whoever came up with the concepts didn’t follow through and use them either. So at least I’m in good company.
Enter the latest vain attempt.
My current thinking is that the template needs 3 ID attributes, one for repeating an element (rid), one for setting attributes (aid), and one for setting node values (eid). This gives a reasonable amount of flexibility and allows for the same code to populate different templates.
This code is very much a work in progress… and very much a completely inelegant mess. But it does satisfy a couple of requirements: 1) I am actually using it, and 2) the same code can be used to populate a variety of templates without requiring code changes.
It uses 4suite’s xpath to set values, and the use of xpath no doubt means performance will be adversely affected, but it’s a start.
In terms of the above examples, you might use this engine as follows:
<users rid="user-list">
<user aid="user-name" name="">
<address eid="user-address"></address>
<city eid="user-city"></city>
</user>
</users>
<html>
<body>
<ul>
<li rid="user-list" eid="user-name">
<ul>
<li eid="user-address"></li>
<li eid="user-city"></li>
</ul>
</li>
</ul>
</body>
</html>
tmp = Templates()['users.xml']
tmp.repeat('user-list', 2)
tmp.setelement('user-name', 'Joe Bloggs', 1)
tmp.setattribute('user-name', 'name', 'Joe Bloggs', 1)
tmp.setelement('user-address', '10 Test St', 1)
...etc
In the case of the first template, the attribute name is set with the value “Joe Bloggs”, and for the second template, the element value is used. The idea is, don’t use the relevant ID (rid, eid, aid) if you don’t need to apply that value.
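To make the idea concrete, here is a small stand-alone sketch of how the same populate code can drive both templates. It uses minidom rather than 4suite's xpath, leaves out repeating (rid) entirely, and every function name in it is made up for illustration, so treat it as an approximation of the concept rather than the engine itself:

```python
from xml.dom import minidom


def find_marked(doc, marker, value):
    """Return every element whose marker attribute (eid or aid) equals value."""
    return [el for el in doc.getElementsByTagName("*")
            if el.getAttribute(marker) == value]


def set_element(doc, eid, text):
    """Put text at the start of each element tagged with this eid."""
    for el in find_marked(doc, "eid", eid):
        # Insert rather than replace, so nested markup (the inner <ul>) survives.
        el.insertBefore(doc.createTextNode(text), el.firstChild)


def set_attribute(doc, aid, name, value):
    """Set an attribute on each element tagged with this aid."""
    for el in find_marked(doc, "aid", aid):
        el.setAttribute(name, value)


def populate(template_source):
    """Run the same calls against any template; unmatched IDs are ignored."""
    doc = minidom.parseString(template_source)
    set_element(doc, "user-name", "Joe Bloggs")
    set_attribute(doc, "user-name", "name", "Joe Bloggs")
    set_element(doc, "user-address", "10 Test St")
    set_element(doc, "user-city", "somewheresville")
    return doc.documentElement.toxml()


XML_TEMPLATE = (
    '<users><user aid="user-name" name="">'
    '<address eid="user-address"></address>'
    '<city eid="user-city"></city>'
    '</user></users>'
)
HTML_TEMPLATE = (
    '<html><body><ul><li eid="user-name"><ul>'
    '<li eid="user-address"></li><li eid="user-city"></li>'
    '</ul></li></ul></body></html>'
)

xml_output = populate(XML_TEMPLATE)
html_output = populate(HTML_TEMPLATE)
```

The XML template has no eid="user-name", so only the attribute call takes effect there; the HTML template has no aid, so only the element call does.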
More to come as I iron out the wrinkles…
Note: the template engine uses the Borg pattern.
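For anyone who hasn't met it, the Borg pattern lets every instance of a class share one state dictionary, so it behaves like a singleton without being one. A minimal sketch follows; the TemplateCache class and its loaded attribute are hypothetical and are not taken from the engine:

```python
class Borg:
    """Every instance shares the same attribute dictionary."""
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state


class TemplateCache(Borg):
    """Hypothetical cache of parsed templates, shared by all instances."""

    def __init__(self):
        super().__init__()
        # Only create the cache the first time any instance is built.
        if "loaded" not in self.__dict__:
            self.loaded = {}


first = TemplateCache()
second = TemplateCache()
first.loaded["users.xml"] = "<users/>"
```

The two objects are distinct, but second.loaded already contains the entry added through first, which is why a call like Templates()['users.xml'] can be made from anywhere without passing an instance around.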
I’ve just discovered that my implementation of a Digest Authentication middleware for WSGI, which had been working perfectly with Firefox, fails miserably when I try it with IE.
A bit of googling finds the following…
http://www.eweek.com/article2/0%2C1895%2C1500432%2C00.asp
http://www.extremetech.com/article2/0,1697,20373,00.asp
…making me think that I’m basically stuffed, despite the fact that I followed the RFC. Excellent work Microsoft! (Sarcasm alert!)
Should’ve done some reading first I think.
So now a decision. Persevere with digest auth and upgrade (irreversibly?) to IE7 in the vain hope that it works, or roll back to Basic auth which will obviously support more browsers.
Much as I’d like to stick with digest (and see if my implementation works in IE7), I think Basic auth is the safer option (at least coupled with SSL), particularly when you look at browser market share.
UPDATE: Of course, the easier alternative, is just to upgrade to Kubuntu Feisty…
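For comparison, Basic auth as WSGI middleware is short enough to sketch in full. This is an illustrative version written for Python 3, with a made-up class name and a caller-supplied password check. It is not the middleware discussed above, and it only makes sense over SSL, since the credentials are merely base64 encoded:

```python
import base64


class BasicAuthMiddleware:
    """Wrap a WSGI app and demand HTTP Basic credentials before calling it."""

    def __init__(self, app, check_password, realm="Restricted"):
        self.app = app
        self.check_password = check_password  # callable(username, password) -> bool
        self.realm = realm

    def __call__(self, environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, encoded = header.partition(" ")
        if scheme.lower() == "basic":
            try:
                decoded = base64.b64decode(encoded).decode("utf-8")
            except (ValueError, UnicodeDecodeError):
                decoded = ""
            username, separator, password = decoded.partition(":")
            if separator and self.check_password(username, password):
                environ["REMOTE_USER"] = username
                return self.app(environ, start_response)
        # Missing or bad credentials: ask the browser to prompt for them.
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="%s"' % self.realm),
        ])
        return [b"Authentication required\n"]
```

Wrapping is a one-liner, for example BasicAuthMiddleware(app, lambda user, pw: (user, pw) == ("joe", "secret")), though a real check would look the user up somewhere sensible.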
More for my own reference than anything else….
To get mod_python working with python2.5 on kubuntu:
1. Install apache2, if you haven’t already.
2. Install python2.5:
sudo apt-get install python2.5
sudo apt-get install python2.5-dev
Don’t get rid of python2.4, since it’s still used by a number of things.
3. Change the symlink for /usr/bin/python to point at the new version:
sudo rm /usr/bin/python
sudo ln -s /usr/bin/python2.5 /usr/bin/python
4. Install apache apxs2:
sudo apt-get install apache2-threaded-dev
5. Download and extract the latest dist of mod_python. cd to that directory, configure and build:
./configure
make
6. Possibly a controversial step: install modpython as normal (apt-get), then replace the shared-objects with the .so’s you’ve just created:
sudo apt-get install libapache2-mod-python
From the downloaded modpython directory (i.e. where you ran configure & make), copy the shared-object files:
sudo cp src/mod_python.so /usr/lib/apache2/modules/
sudo cp dist/build/lib.linux-x86_64-2.5/mod_python/_psp.so /usr/lib/python2.4/site-packages/mod_python/
Copy the mod-python directory from python2.4 site-packages to 2.5:
sudo cp -R /usr/lib/python2.4/site-packages/mod_python/ /usr/lib/python2.5/site-packages/
Restart apache and use modpython as normal…
My latest article, Getting Started with WSGI, (one of the numerous reasons I’ve been too busy to post lately) has just been published in O’Reilly’s ONLamp.
Actually, it’s not the main reason for not posting, since I wrote the piece back in July, but I’m still using it as an excuse until the next couple of projects come to fruition…
After a bit of googling, I have yet to find a command-line tool that simply handles HTTP requests. There’s wget for retrieving files, curl for PUT and POST, but what to use for DELETE (or other HTTP methods)? For that matter, I’d prefer to have a single tool, rather than 2 or 3.
I wrote a simple python script a while back — the last time I couldn’t find such a tool — and so now that I need to use it again, it seems like a good time to polish it up and release it (here). Not before chopping out a bunch of specific hacks that really need tidying up before being included in released code, of course.
So if you need a basic tool for handling HTTP requests (GET, PUT, POST, DELETE) you may find it useful… rather than having to write the few lines of code yourself.
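The "few lines of code" look roughly like the following in a current Python 3, using only the standard library. This is a sketch with hypothetical function names, not the released script:

```python
import urllib.request


def build_request(method, url, body=None, headers=None):
    """Build a request for any HTTP method (GET, PUT, POST, DELETE, ...)."""
    data = body.encode("utf-8") if isinstance(body, str) else body
    return urllib.request.Request(
        url, data=data, headers=headers or {}, method=method.upper())


def send(method, url, body=None, headers=None):
    """Send the request and return (status code, response body as bytes)."""
    request = build_request(method, url, body, headers)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()
```

Something like send("DELETE", "http://localhost/test/item/1") would then issue the DELETE; that URL is just an example. Note that urlopen raises an HTTPError for 4xx and 5xx responses, so a real tool would want to catch that and print the status anyway.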
Just noticed on Jon Udell’s feed here, a story about WriteRoom — an editor which removes all distractions from the desktop to allow you to focus on the task at hand.
I think it’s a brilliant idea — weird thing is, I had the same idea independently, a year or so ago; having decided that there were way too many distractions in the average editor/word processor and that I’d get a lot more done if they were just gone.
As a consequence, I’ve been using my own ‘distraction-free’ editor (written in Python) for the last 6 months, always meaning to release the code, but never quite getting around to finishing it. Not sure if the idea actually works, but I do prefer writing in that environment rather than any of the alternatives, so I guess it does.
Anyway, in honour of WriteRoom, which looks a lot more polished than my completely unfinished effort, I thought I might as well release my own version (ued-0.5.zip, ued-0.5.tar.bz2) — making no warranties for its suitability for any task, stability, or anything else (i.e. use at your own risk… I am).
You’ll need python2.4 and pygtk2.0 or later to run it.
Usage:
One final note, it has its own basic versioning built-in - each time you save a file a backup is made (a DOT [.] file) in the same directory.
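The backup-on-save idea is simple enough to sketch. The version below is written for Python 3, and the numbered dot-file naming scheme is an assumption made for illustration; the editor's own naming may well differ:

```python
import os
import shutil


def save_with_backup(path, text):
    """Write text to path, first copying any existing file to a hidden backup."""
    directory, name = os.path.split(os.path.abspath(path))
    if os.path.exists(path):
        # Find the first unused backup name: .name.1, .name.2, ...
        number = 1
        while os.path.exists(os.path.join(directory, ".%s.%d" % (name, number))):
            number += 1
        shutil.copy2(path, os.path.join(directory, ".%s.%d" % (name, number)))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

The first save of a new file creates no backup; every later save leaves the previous contents behind in the next numbered dot file.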
Every time I sit down to do something with mod_python, I have problems. Usually they can be traced to forgetting to RTFM.
It doesn’t help that I tend to put experimental code in my apache www dir on occasion — and forget to back it up — so that after a system reinstall, I then have to dredge the dim, dark recesses of my memory to remember how to do simple stuff.
So for my own future reference…
<Directory /var/www/test>
PythonHandler wsgi_handler
PythonOption WSGI.Application hello::simple_app
AddHandler python-program .py
</Directory>
Where:
In the case of my kubuntu laptop, I just created a mod_python.conf file containing the above in /etc/apache2/mods-enabled
def simple_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']
I’ve also added an extension to make wsgi_handler slightly more useful for my purposes; the ability to serve more than one ‘app’ from the same script. The extended version can be found here.
The apache configuration can then be modified to:
<Directory /var/www/test>
PythonHandler wsgi_handler
PythonOption WSGI.Application hello
AddHandler python-program .py
</Directory>
And hello.py now looks like this:
def simple_app(environ, start_response):
    status = '200 OK'
    print str(environ)
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['Hello WSGI world!\n']

def test(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-Type','text/plain')]
    start_response(status, response_headers)
    return ['test test test\n']
Meaning that you can now point your browser to:
http://localhost/test/test.py/simple_app
or:
http://localhost/test/test.py/test
Which is useful for experimentation.
My experiment with web templating is finally complete. You can see the resulting output of running this script against these templates (1, 2, 3, 4 and 5).
Yes, the code to inject content into the template is rather ugly, but that’s a byproduct of the fact that I’m hard coding these values rather than delivering them from a data model of some kind (and I’ve made no attempts to tidy up that code anyway).
The example below is particularly unattractive:
template['footer:col'] = template2.repeater(3)
template['footer:col:1:footer:heading'] = 'test1'
template['footer:col:1:footer:text'] = 'blah1 blah blah blah blah'
template['footer:col:2:footer:heading'] = 'test2'
template['footer:col:2:footer:text'] = 'blah2 blah blah blah blah'
template['footer:col:3:footer:heading'] = 'test3'
template['footer:col:3:footer:text'] = 'blah3 blah blah blah blah'
When a template fragment is imported into another, the IDs of elements in that fragment need to be prefixed (with the ID of the containing element) in order for all IDs in the resulting DOM to remain unique. When elements are repeated, they need to have a counter appended to their IDs, again to ensure uniqueness.
In this case, I’ve imported a fragment into a div with the ID “footer”, meaning that the elements inside are prefixed with “footer:” — then I use a repeater on a column element in the import. The algorithm which re-works the IDs isn’t particularly intelligent, hence the rather hideous 'footer:col:[x]:footer:[actual-id]' structures.
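A rough reconstruction of that renaming scheme, using minidom, shows where the doubled prefix comes from. The function names and the tiny fragment are invented for the example, and the script itself may do this differently:

```python
from xml.dom import minidom


def elements(node):
    """Yield node (if it is an element) and every element below it, in order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from elements(child)


def find_by_id(doc, element_id):
    return next(el for el in elements(doc.documentElement)
                if el.getAttribute("id") == element_id)


def prefix_ids(node, prefix):
    """Prefix every id at or below node so it stays unique in the target DOM."""
    for el in elements(node):
        if el.hasAttribute("id"):
            el.setAttribute("id", "%s:%s" % (prefix, el.getAttribute("id")))


def import_fragment(doc, container_id, fragment_source):
    """Copy a fragment into the container, prefixing its ids with the container's."""
    fragment = minidom.parseString(fragment_source).documentElement
    imported = doc.importNode(fragment, True)
    prefix_ids(imported, container_id)
    find_by_id(doc, container_id).appendChild(imported)


def repeat(doc, element_id, count):
    """Replace an element with numbered copies; children get the copy's id as prefix."""
    original = find_by_id(doc, element_id)
    parent = original.parentNode
    for number in range(1, count + 1):
        copy = original.cloneNode(True)
        new_id = "%s:%d" % (element_id, number)
        copy.setAttribute("id", new_id)
        for child in copy.childNodes:
            prefix_ids(child, new_id)
        parent.insertBefore(copy, original)
    parent.removeChild(original)


page = minidom.parseString('<html><body><div id="footer"></div></body></html>')
import_fragment(page, "footer",
                '<div id="col"><h3 id="heading"></h3><p id="text"></p></div>')
repeat(page, "footer:col", 3)
ids = [el.getAttribute("id") for el in elements(page.documentElement)
       if el.hasAttribute("id")]
```

After the import the heading is footer:heading; repeating the column then prefixes it again with the copy's own id, giving footer:col:1:footer:heading and so on.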
I’ve done a little more work on ‘my’ approach to templating. One thing I’ve only mentioned in passing, is that there are other templating engines out there that work in this fashion — both in Python and Java. HTMLTemplate, is a python module I’ve already mentioned previously. Tapestry is a Java framework, which I haven’t personally used, but seem to recall uses an attribute language for its templates. And there are others, so what I’m experimenting with is hardly anything new. But I have yet to see one that I think goes far enough with the removal of markup from view components. Not to harp on about it too much (not half).
I finally took an hour to revisit the templating code mentioned back here, using minidom instead of rparsexml. As expected, refactoring produces better code, but ‘better’ is a relative term — it’s still a major hack-job. However, in this case, I’m not bothered as long as it proves a point with as little effort as possible.
So, to recap my prior thoughts:
An electric charge applied to a few single-celled organisms contained in a small bucket of slime could design something better.
My intention is, therefore, to hack together a quick prototype — hopefully coming up with something a little more elegant in the process. And since (I hope) I have a few brain cells to rub together, I might just provide reasonable competition to the bucket of slime.
One of the problems developing with more than one language, is that the boundaries between them become a bit hazy at times. Case in point, this morning Tim Bray was talking about multicasting in Java, and wondered about the availability of similarly useful libraries in other languages (including Python). My immediate thought was (overconfidently), “Of course, it’s just as easy in Python…”
….only to realise that I was thinking about multicast in Java myself (and the last time I used that part of the API was at least 2 or 3 years ago). Now that I think about it, I don’t think I’ve ever tried multicast in Python. Nor standard datagrams. Haven’t had the need as yet.
But surely it’s easy right?
An hour or so later, and I have to admit, the Python documentation around the socket and SocketServer modules is really a little light on useful examples when you get into it. Setting up a simple server and client to talk directly to each other (SOCK_STREAM) is easy. Figuring out what changes are necessary to talk via SOCK_DGRAM isn’t immediately obvious from the module docs (thanks, effbot, for a working example).
After you get over that hurdle, you kind-of expect changing to multicast to be perhaps relatively straightforward. But that proves to be just as difficult.
The Python wiki saves the day. There are a couple of multicast examples there, but I could only get one of them to work, which I’ve adapted slightly below.
Read the rest of this entry »
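For reference, the usual shape of multicast with the standard socket module is sketched below for Python 3. This is not the adapted wiki example from the full post; the group address and port are arbitrary choices for illustration:

```python
import socket
import struct

GROUP = "224.1.1.1"  # assumed multicast group, chosen arbitrarily
PORT = 5007          # assumed port


def membership_request(group):
    """Pack an ip_mreq: the group address, then INADDR_ANY for the interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def make_receiver(group=GROUP, port=PORT):
    """Create a UDP socket bound to port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def make_sender(ttl=1):
    """Create a UDP socket whose multicast packets stay on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

In one process, make_receiver().recvfrom(1024) waits for a datagram; in another, make_sender().sendto(b"hello", (GROUP, PORT)) delivers it to every receiver that has joined the group. Compared with plain SOCK_DGRAM, the only real additions are the membership request on the receiving side and the TTL on the sending side.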
One good reason for improving python’s visibility is you might hope that more visibility equates to more jobs. Which had me wondering about the current python job situation.
The answer to that question (in NZ at least) is easy: zero to minimal.
More interesting is the situation as reported by Jobserve — my favourite site when looking for work in the UK.
Figures in Jobserve have to be taken with a grain of salt. At any particular time, there will be a certain (fairly large) percentage of adverts used for CV fishing. There’s a percentage (also large) of double-listings, where potential employers list roles with multiple agencies. There are a number of other practices (some dodgy) which reduce the number of valid adverts even further. That said, the final figure is still a reasonable indication of market interest, even if not particularly scientific.
The figures are only of interest when compared to other platforms/languages, so here’s a basic rundown of contract listings as of now, for the last 7 days…
<table border="1">
<tbody><tr>
<th>Language</th>
<th>No. of Listings</th>
<th>% of total</th>
</tr>
<tr>
<td>C/C++</td>
<td>761</td>
<td>27.42</td>
</tr>
<tr>
<td>.Net/C#</td>
<td>750</td>
<td>27.03</td>
</tr>
<tr>
<td>Java</td>
<td>975</td>
<td>35.14</td>
</tr>
<tr>
<td>Perl</td>
<td>197</td>
<td>7.1</td>
</tr>
<tr>
<td>PHP</td>
<td>67</td>
<td>2.41</td>
</tr>
<tr>
<td>Python</td>
<td>21</td>
<td>0.76</td>
</tr>
<tr>
<td>Ruby</td>
<td>4</td>
<td>0.14</td>
</tr>
</tbody></table>
I haven’t exactly been rigorous in my search criteria (checking for exclusivity or anything like that), but the results are still quite interesting, and a little different from what I might have expected.
In any case, Python job listings are no higher than they were 6 months ago (when I last looked) — which doesn’t necessarily mean anything as I don’t have figures to compare for the other platforms. It will be interesting to compare numbers in another 3 months or so (although perhaps the numbers would be more useful if I made the effort to write a script to sanitize the data more thoroughly and do a proper comparison)…
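As a starting point for that kind of script, the "% of total" column can be reproduced from the raw counts. This is only a sketch: the counts are copied from the table above, the function name is made up, and "total" here means the sum of these seven searches rather than every advert on the site.

```python
# Contract listings per language, copied from the table above.
listings = {
    'C/C++': 761,
    '.Net/C#': 750,
    'Java': 975,
    'Perl': 197,
    'PHP': 67,
    'Python': 21,
    'Ruby': 4,
}


def shares(counts):
    """Return each count as a percentage of the summed counts (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


if __name__ == '__main__':
    for name, pct in shares(listings).items():
        print('%-8s %6.2f' % (name, pct))
```

Saving a dictionary like this each time would at least make the three-month comparison possible, even without cleaning out the duplicate listings.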
Marketing Python sounds like a brilliant idea, but I believe any marketing here in NZ would be, very much, a rearguard action. The phrase “two horse race” comes immediately to mind.
Not to say there’s nothing else happening in the market, but from a development perspective, if you look at the work advertised on the web, it’s hard to find anything other than Java and .Net (although I heard of a reasonably-sized Perl project a while back).
It would be nice to see Python getting some visibility.
A quick install of the Eclipse-SDK proved more painful than expected, purely because my system-vm was set to Java 1.5, which is problematic (at the moment) on Gentoo. java-config (the tool used to set the vm for the system, or for a user) appears to be semi-borked on my machine (or I don’t know how to use it properly), because it kept switching back to 1.5 despite the fact that I set it explicitly to 1.4. So after removing 1.5, and recompiling (re-emerging) a mass of dependencies, eclipse and the pydev plugin finally installed.
First impressions, after not having looked at the Eclipse 3 platform for a while:
1. NetBeans panelling is -way- more slick than Eclipse. Eclipse seems clunky by comparison — perhaps there is a way to set a “FastView” panel to open on mouse-over like NetBeans, but it wasn’t immediately obvious…?
2. Also because the panelling isn’t as good, the Eclipse screen layout doesn’t work as well unless you have a large screen, I think. I’m running a small widescreen on a laptop (1280 x 800), and the editor interior seems vertically challenged unless you minimise the bottom panel — which wouldn’t be an issue, if it wasn’t for the fact that the UI is designed by touchpad bigots (a majority of window managers are able to put the close, minimise and maximise buttons in a group on a window title-bar, so why does Eclipse have to be different?), so it seems as if more touchpad use is required to perform the same action on Eclipse when compared with NetBeans.
3. The method to add new file filters isn’t obvious to me either.
4. Fonts are a lot (in large flashing letters on a billboard somewhere) nicer on Eclipse than on NetBeans. NetBeans is hard to look at after Eclipse. JEdit is generally ghastly as well from the font perspective — both suffering from Java’s traditionally wonderful font management (that’s sarcasm in case it’s not apparent).
5. It seems that the blame for the chunky behaviour of WSAD/RAD (+Linux) should fall squarely on the heads of the IBM teams in charge of corrupting (I mean) enhancing Eclipse, because alone, it’s a lot more responsive, and memory usage is considerably better than NetBeans. Doing nothing, NetBeans sits at about 17-19% memory usage. Eclipse is currently (doing nothing) sitting on 9.7%.
6. I still don’t like the way that most (all?) IDEs force you to work to their project formats — i.e. create a web project, create a simple java project, etc. A simple text editor with plugins you can twist around, to your own liking, seems much more attractive — but I’m probably in the minority.
Are you using web.py? Let us know!
Us? As in the Royal “We”?
Anyway, webpy looks interesting, and its lightweight nature fits my current thinking around web development — but I still think the database stuff should have been split out into a separate module…
The problem:
I want to create a window with effectively 2 panels, separated by a horizontal line. Panel 1 takes up (approximately) 90% of the height. Panel 2 takes up the remaining 10%. Let’s just ignore the height of the line for the moment.
For example, something like:
|-------------------------|
|                         |
|                         |
|                         |
|-------------------------|
|                         |
|-------------------------|
However with my bounding gtk.VBox, I was ending up with 3 boxes of equal height, containing panel 1, the horizontal line and panel 2.
What I somehow kept missing in the documentation were the expand and fill parameters. For example:
vbox.pack_start(panel1)
vbox.pack_start(hline, expand=False, fill=False)
vbox.pack_start(panel2, expand=False, fill=False)
vbox.set_size_request(-1, 30)
The first pack_start call uses the default expand and fill (True). If you don’t set expand and fill to false on the calls for hline and panel2, then you end up with my initial problem (3 boxes of equal height).
Note that for the parameters to set_size_request, -1 is a request to automatically size the width to the bounding container, and 30 is the pixel height.
Of course, now I’ve decided I don’t need this layout anyway…
Reading RFCs is an arcane art. Enough so, that I’m not entirely sure they shouldn’t be read from a dusty, moth-eaten tome, atop a twisted wooden pedestal, by the light of a sulphur-smelling candle protruding from the top of an ancient human skull.
RFC2069 is a good example. If the definition for the response field in the Authorization header was a recipe for a sponge, I’d probably end up with rock cakes.
But I think I’ve cooked sponge….
Assuming the following:
import md5
import binascii
def hex(s):
    return binascii.hexlify(md5.md5(s).digest())
Then the response field would be generated from:
hex(hex('<username>:<realm>:<password>') + ':<nonce>:' + \
hex('<httpmethod>:<uri>'))
Where <username> is the username provided by the client, <realm> is the security realm provided by the server, <password> is the password provided by the client, <nonce> is a unique value generated by the server for the request (a uuid is a good start), <httpmethod> is rather obviously the request method, and <uri> is the request uri.
For example:
hex(hex('jbloggs:name@domain.com:mypassword') + \
':01f6e3c179c05ac9a729afe4ce9107a7:' + hex('GET:/somepath'))
There’s slightly more to it than that, but it is the basics.
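The same calculation, sketched with hashlib for Python 3 (the md5 module used above only exists in Python 2). The function names are my own for illustration, and this only covers the basic RFC2069 response described here, not the extra bits alluded to above.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of `text` as a lowercase hex string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, realm, password, nonce, method, uri):
    """Compute the RFC2069-style response field.

    response = H( H(username:realm:password) : nonce : H(method:uri) )
    """
    ha1 = md5_hex('%s:%s:%s' % (username, realm, password))
    ha2 = md5_hex('%s:%s' % (method, uri))
    return md5_hex('%s:%s:%s' % (ha1, nonce, ha2))
```

A server would compute digest_response() from its own copy of the password and compare the result with the response field the client sent.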
Perhaps not quite a multi-layer sponge then….
Well I did say sort of.
Yet another version of stomp.py can be found below.
After some consideration, I thought better of the one-listener-per-subscription method, and instead added addlistener & dellistener methods to the Connection class, so all listeners receive all subscription messages. Otherwise, the question becomes how to handle acknowledgements. Does the first listener subscribe-call set the acknowledgement? In which case, what if the initial subscriber sets ack: auto and later subscribers want client? Too much complexity, and, at least to me, this way is cleaner (and by cleaner, I mean it’s easier to determine ‘functional responsibility’).
One problem this change has exposed, is a bug in unsubscribe. I don’t (yet) know if it’s a problem in my module, in activemq/stomp, or it’s just one of those you-shouldn’t-be-coding-so-late-in-the-evening issues that suddenly vanishes when you try it again the next day. But I’ve tried an unsubscribe in a direct telnet session and (annoyingly) still receive messages to that destination, so evidence begins to point toward other sources. I’ll investigate further…
UPDATE 06/12/05: confirmed on a separate machine (& instance of activemq) that unsubscribe doesn’t work as expected.
Yet another update of my stomp module.
This ‘release’ tidies the code up a bit, adds some documentation, and fixes the command-line test code.
To get help for the module:
import stomp
help(stomp)
You can run from the command line, to test with a simple client. For example:
$ python stomp.py localhost 62003
CONNECTED
session:ID:muttley-32792-1133582529150-3:0
CONNECTED
session:ID:muttley-32792-1133582529150-3:1
subscribe /queue/a
send /queue/a hello
MESSAGE
message-id:ID:muttley-32792-1133582529150-2:0
destination:/queue/a
hello
END@
Or use in another script:
import stomp
st = stomp.Connection('localhost', 62003, 'test', 'test')
st.send('/queue/b', 'test2')
st.send('/queue/a', 'test1')
st.disconnect()
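For anyone curious about what goes over the wire, a Stomp frame is just a command line, header lines of the form key:value, a blank line, the body, and a terminating NUL byte. The helper below is a hypothetical sketch of that framing (build_frame is not a function in stomp.py), written for Python 3.

```python
def build_frame(command, headers=None, body=''):
    """Assemble a Stomp frame as a string.

    Layout: COMMAND, one "key:value" line per header, a blank line,
    the body, then a NUL character marking the end of the frame.
    """
    lines = [command]
    for key, value in (headers or {}).items():
        lines.append('%s:%s' % (key, value))
    return '\n'.join(lines) + '\n\n' + body + '\x00'


if __name__ == '__main__':
    # repr() makes the newlines and the trailing NUL visible
    print(repr(build_frame('SEND', {'destination': '/queue/a'}, 'test1')))
```

Under that assumption, a call such as st.send('/queue/a', 'test1') would boil down to writing build_frame('SEND', {'destination': '/queue/a'}, 'test1') to the socket.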
UPDATE 04/12/2005: Minor bug fix, and changed the read method to buffer 1024 chars rather than read 1 by 1
Here is an update of the stomp module, adding support for transactions (begin, abort, commit and the transaction header on send).
I remain unconvinced that this is the right way to do things, but it does seem to work, at least. I’ll know for sure when I try it out in anger next week.
Footnote: the disconnect method sets the running variable to None (to break the loop in thread run()), and sends a disconnect command through each sock connection. Interestingly it worked last night, but this morning it hung the python vm with 100% cpu. Adding a socket close (which should’ve been there anyway) fixes the problem.
Long story.
Evaluating WebSphere Community Edition for a project, and discovered some rather unfortunate flaws, which may mean it won’t be useable for said project. Came across ActiveMQ which is bundled with WebSphereCE (actually bundled with Geronimo, the source of CE, but that makes it an even longer story), and began investigating that on its own, thinking that ActiveMQ plugged into a servlet container, or even Spring may fit the bill. ActiveMQ comes with a simple text-over-the-wire protocol called Stomp which languages other than Java can use to communicate with the broker, and there’s a Python module available — so I figure that’s a good way to start playing with the application. Slight problem — the py module for stomp doesn’t work properly. Rather than investigate the code and try to figure out why (actually I believe I know why… I just don’t want to fix it), with somewhat less than infinite wisdom, I figure I’ll waste an hour or so, one evening, writing my own module. 2.5 evenings later, starting with the socket module… moving to asyncore… then asynchat… then select… then poll… I’m back to a socket-version (a ‘read’ and a ‘write’ socket to be exact) with a one second timeout on the read, and a serious headache with Python’s various socket libs.
The main difficulty is the lack of examples using async[whatever] for a client-side module. 99% of the examples discuss the async modules from the point of view of a server-side application — the final 1% don’t go into enough detail to glean the right way to do things (asyncore.loop() and threading… don’t get me started).
The documentation for select and poll frankly just doesn’t cut it — and one starts to get seriously annoyed with both when they return a file-descriptor ready for read which you discover has nothing available to read.
I’m fairly positive there’s a big piece of the puzzle I’m just missing — but I’m damned if I know where I’m going wrong. And I’m also thinking: 1) I didn’t have this much trouble with Java’s socket IO, and 2) it shouldn’t be this hard.
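One common cause of a socket being reported as readable with nothing to read (not necessarily what was happening here) is that the other end has closed the connection: on a stream socket, select flags the descriptor as readable and recv then returns an empty string. A minimal Python 3 sketch of telling the three cases apart, with a made-up helper name:

```python
import select
import socket


def read_available(sock, timeout=1.0, bufsize=1024):
    """Wait up to `timeout` seconds for data on a connected stream socket.

    Returns None on timeout, b'' if the peer has closed the connection,
    otherwise the bytes that were read.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(bufsize)


if __name__ == '__main__':
    a, b = socket.socketpair()
    b.sendall(b'hello')
    print(read_available(a))        # data is waiting
    print(read_available(a, 0.1))   # nothing waiting, times out
    b.close()
    print(read_available(a))        # peer closed: readable, but empty
```

A reader thread built on this would treat the empty result as a signal to stop, rather than looping on a descriptor that will stay "ready" forever.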
So here are the beginnings of a stomp protocol module, which can (so far) be used to transmit subscribe and send messages to ActiveMQ. More to come if it doesn’t drive me nutty first…
Addendum: Use the following xml configuration with ActiveMQ to test:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//ACTIVEMQ//DTD//EN"
    "http://activemq.org/dtd/activemq.dtd">
<beans>
  <broker name="receiver">
    <connector>
      <serverTransport uri="stomp://localhost:62003"/>
    </connector>
    <persistence>
      <vmPersistence/>
    </persistence>
  </broker>
</beans>
A python version of my jython-based REST test script, built to run against the BaseHTTP… module was a little more work than circumstances might have indicated. The main issues are taking into account the differences between servlets and python http handlers — even now the tester isn’t complete because I haven’t found a -good- xml schema validator for python. There’s the w3 version (which is available online and has python source), but the download requirements were more onerous than I could be bothered with at the time.
I have yet to step back, and take a look at this code to see if it works outside of the very contrived conditions of the environment I’m currently playing with, but it is presented below as, very much, a work-in-progress:
The optimal usage of BaseHTTPRequestHandler, in BaseHTTPServer, seems a little ‘non-obvious’ to me, particularly coming from the Java world. For example, creating an http server instance:
import BaseHTTPServer
httpd = BaseHTTPServer.HTTPServer(server_address, BaseHTTPServer.BaseHTTPRequestHandler)
httpd.serve_forever()
This suggests a single handler class per server, which is the bit I find less than useful. Taking a jetty server as a comparison, you would create a context with a path, then add a number of handlers to that context. Requests (at least as I understand the process) propagate up through the chain of handlers until they have been successfully processed. Within a servlet handler again you have the concept of a handler (servlet) mapped against a path (or paths).
In any case, I’ve created a relatively simple multiple handler mechanism built on the BaseHTTPServer/RequestHandler foundation (source code is below) — however, if anyone in the ‘know’ happens to come across this perhaps you’d like to educate me about what I’ve done wrong…? (since prior experience with Python suggests, if it’s an obvious usage, then there’s probably already a simple way to do it).
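To illustrate the general idea (this is not the source mentioned above, just a minimal sketch), a single request handler class can keep a table of path prefixes and hand each request to the first function that matches. The class, method and route names are assumptions, and it is written for Python 3, where BaseHTTPServer became http.server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class DispatchingHandler(BaseHTTPRequestHandler):
    """One handler class that dispatches GET requests by path prefix."""

    routes = []  # list of (prefix, function) pairs, checked in order

    @classmethod
    def register(cls, prefix, func):
        """Map a path prefix to a function taking the path and
        returning a (status, body) pair."""
        cls.routes.append((prefix, func))

    @classmethod
    def find_route(cls, path):
        """Return the first registered function whose prefix matches."""
        for prefix, func in cls.routes:
            if path.startswith(prefix):
                return func
        return None

    def do_GET(self):
        func = self.find_route(self.path)
        if func is None:
            self.send_error(404)
            return
        status, body = func(self.path)
        payload = body.encode('utf-8')
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

After registering a few prefixes with DispatchingHandler.register(...), the server is started the usual way, with HTTPServer(('localhost', 8080), DispatchingHandler).serve_forever(). Because routes are checked in order, more specific prefixes need to be registered first.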
With the addition of xml validation and value checking, the testing scripts actually become somewhat useful:
> PUT http://localhost:8080/ws/user '' \
'username=testuser1&email_address=testuser1@test.com'
> GET http://localhost:8080/ws/user/testuser1
< assert status 200
< assert header Content-Type text/xml
< assert xmlschema ../web/user.xsd
< assert xmlbody /user/username=testuser1
< assert xmlbody /user/email_address=testuser1@test.com
> DELETE http://localhost:8080/ws/user/testuser1
Deepak Vohra’s article at ONJava shows how to perform schema validation using xerces/jaxp. Adapting his code to jython is straightforward, by adding the following imports to the test script from part 2:
from java.io import StringReader
from java.lang import System
from javax.xml.parsers import DocumentBuilderFactory
from javax.xml.parsers import DocumentBuilder
from org.xml.sax.helpers import DefaultHandler
from org.xml.sax import InputSource
System.setProperty('javax.xml.parsers.DocumentBuilderFactory', \
'org.apache.xerces.jaxp.DocumentBuilderFactoryImpl')
Deepak’s Validator becomes:
class Validator(DefaultHandler):
    def __init__(self):
        self.validationError = 0
        self.saxParseException = None
    def error(self, exception):
        self.validationError = 1
        self.saxParseException = exception
    def fatalError(self, exception):
        self.validationError = 1
        self.saxParseException = exception
    def warning(self, exception):
        pass
Finally, a new assertion method is added to the Test class:
#
# assert that an xml document validates against an xml schema
#
def assert_xmlschema(self, linearr, req, res):
    xml = res.getContent()
    factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(1)
    factory.setValidating(1)
    factory.setAttribute('http://java.sun.com/xml/jaxp/properties/schemaLanguage', \
        'http://www.w3.org/2001/XMLSchema')
    errhandler = Validator()
    factory.setAttribute('http://java.sun.com/xml/jaxp/properties/schemaSource', \
        linearr[0])
    builder = factory.newDocumentBuilder()
    builder.setErrorHandler(errhandler)
    dom = builder.parse(InputSource(StringReader(xml)))
    if errhandler.validationError == 1:
        errmsg = '%s, %s' % (str(errhandler.validationError), \
            errhandler.saxParseException.getMessage())
    else:
        errmsg = ''
    assert errhandler.validationError != 1, errmsg
Value checking is accomplished with the following method:
#
# assert that a particular xml element contains a particular value
#
def assert_xmlbody(self, linearr, req, res):
    xml = res.getContent()
    # only re-parse when the response content has changed
    if not self.xml or self.xml != xml:
        factory = DocumentBuilderFactory.newInstance()
        factory.setNamespaceAware(1)
        builder = factory.newDocumentBuilder()
        self.dom = builder.parse(InputSource(StringReader(xml)))
        self.xml = xml
    # get rid of the starting '/'
    # (could leave this out in the test, but it looks better to me)
    if linearr[0].startswith('/'):
        linearr[0] = linearr[0][1:]
    # split into xml element list and a value based on the first '=' position
    equalspos = linearr[0].find('=')
    tmp = linearr[0][0:equalspos]
    value = linearr[0][equalspos+1:]
    elements = tmp.split('/')
    # loop through the elements in the list and retrieve each named node
    node = self.dom
    for x in xrange(0, len(elements)):
        nl = node.getElementsByTagName(elements[x])
        # a NodeList is a Java object, so check its length explicitly
        if nl.getLength() == 0:
            raise AssertionError, 'sub element "%s" is missing' % elements[x]
        else:
            node = nl.item(0)
            if x == len(elements)-1:
                # 'actual' rather than 'eval', to avoid shadowing the builtin
                actual = node.getFirstChild().getNodeValue()
                assert actual == value, 'element %s (value "%s") does not match expected %s' \
                    % (elements[x], actual, value)
                return
    raise AssertionError, 'missing element %s' % tmp
Which takes an assertion line such as “assert xmlbody /user/username=testuser1” and looks up elements in the dom, where “user” is the root node, and “username” is a child node of user. At the moment this is very basic value checking which won’t work well for more complicated documents, but I’m trying to avoid complication as much as possible anyway, so it’s difficult to justify (to myself at least) going to greater lengths to make something more flexible. The main enhancement I’m looking to make initially is the ability to specify the nth child of a node. For example, something like the string “/list/user[2]/username” (in case it’s not painfully obvious, the second “user” element in “list”).
But for the moment, this will hopefully get me going.
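As an aside, in plain Python (rather than jython) the nth-child lookup comes almost for free, because ElementTree's limited XPath support understands positional predicates such as user[2]. The sketch below is a standalone Python 3 function with a made-up name, not a method of the Test class above, and it assumes the same path=value assertion format.

```python
import xml.etree.ElementTree as ET


def check_xmlbody(xml_text, assertion):
    """Check an assertion such as '/list/user[2]/username=b'.

    The text before the first '=' is an element path starting at the
    root element; the text after it is the expected element value.
    """
    path, _, expected = assertion.partition('=')
    names = path.strip('/').split('/')
    root = ET.fromstring(xml_text)
    # ElementTree paths are relative to the root element,
    # so the root name is checked separately
    assert root.tag == names[0], \
        'root element is "%s", expected "%s"' % (root.tag, names[0])
    node = root.find('/'.join(names[1:])) if len(names) > 1 else root
    assert node is not None, 'missing element %s' % path
    actual = node.text or ''
    assert actual == expected, \
        'element %s (value "%s") does not match expected %s' \
        % (names[-1], actual, expected)
```

For example, check_xmlbody(doc, '/list/user[2]/username=b') passes silently when the second user element in list has the username b, and raises an AssertionError otherwise.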
There are a number of transparent/reverse proxy applications available, but I’ve thus far been unable to find one suitable for my needs. They’re either too low level (sniffing TCP packets for example), or intended for other purposes; and as a consequence either don’t present the right info in the right format, or are just too gosh-dratted [sic] complicated for me to bother figuring out. Thus, in true lazy-programmer fashion, I’ve written my own quick-and-dirty tool for the job (actually when I say written, I really mean thrown together a few lines of code with some python modules which are doing the real work).
logproxy.py takes a hostname as its argument and redirects all requests to that host, in the meantime logging the headers and content body (if content has been sent). The response is also logged, and then sent back to the client accordingly.