johnmahugu

python - various snippets collection in a single paste

Jun 28th, 2015
Python snippets collection

This page contains a bunch of miscellaneous Python code snippets, recipes, mini-guides, links, examples, tutorials and ideas, ranging from very (very) basic things to advanced. I hope they will be useful to you. All snippets are kept in a single page so that you can easily ❶ save it for offline reading (and keep it on a USB key) ❷ search in it.
Note that scripts that do some web-scraping may not work anymore due to website changes. The web is an evolving entity :-)

* Send a file using FTP
* Queues (FIFO) and stacks (LIFO)
* A function which returns several values
* Exchanging the content of 2 variables
* Getting rid of duplicate items in a list
* Get all links in a web page (1)
* Get all links in a web page (2)
* Get all links in a web page (3)
* Get all links in a web page (4)
* Zipping/unzipping files
* Listing the content of a directory
* A webserver in 3 lines of code
* Creating and raising your own exceptions
* Scripting Microsoft SQL Server with Python
* Accessing a database with ODBC
* Accessing a database with ADO
* CGI under Windows with TinyWeb
* Creating .exe files from Python programs
* Reading Windows registry
* Measuring the performance of Python programs
* Speed up your Python programs
* Regular expressions are sometimes overkill
* Executing another Python program
* Bayesian filtering
* Tkinter and cx_Freeze
* A few Tkinter tips
* Tkinter file dialogs
* Including binaries in your sources
* Good practice: try/except non-standard import statements
* Good practice: Readable objects
* Good practice: No blank-check read()
* 1.7 is different than 1.7 ?
* Get user's home directory path
* Python's virtual machine
* SQLite - database made simple
* Dive into Python
* Creating a mutex under Windows
* urllib2 and proxies
* A proper User-agent in your HTTP requests
* Error handling with urllib2
* urllib2: What am I getting ?
* Reading (and writing) large XLS (Excel) files
* Saving the stack trace
* Filtering out warnings
* Saving an image as progressive JPEG with PIL
* Charsets and encoding
* Iterating
* Parsing the command-line
* Using AutoIt from Python
* What's in a main
* Disable all javascript in a html page
* Multiplying
* Creating and reading .tar.bz2 archives
* Enumerating
* Zip that thing
* A Tkinter widget which expands in grid
* Convert a string date to a datetime object
* Compute the difference between two dates, in seconds
* Managed attributes, read-only attributes
* First day of the month
* Fetch, read and parse a RSS 2.0 feed in 6 lines
* Get a login from BugMeNot
* Logging into a site and handling session cookies
* Searching on Google
* Building a basic GUI application step-by-step in Python with Tkinter and wxPython
* Flatten nested lists and tuples
* Efficiently iterating over large tables in databases
* A range of floats
* Converting RGB to HSL and back
* Generate a palette of rainbow-like pastel colors
* Columns to rows (and vice-versa)
* How do I create an abstract class in Python ?
* matplotlib, PIL, transparent PNG/GIF and conversions between ARGB to RGBA
* Automatically crop an image
* Counting the different words
* Quick code coverage
* Trapping exceptions to the console under wxPython
* Get a random "interesting" image from Flickr
* Why is Python a good beginner language ?
* Reading LDIF files
* Capture the output of a program
* Writing your own webserver
* SOAP clients
* Archive your whole GMail box
* Performing HTTP POST requests
* Read a file with line numbers
* Filter all but authorized characters in a string
* Writing your own webserver (using web.py)
* XML-RPC: Simple remote method call
* Signing data
* Week of the year
* Stripping HTML tags
* Decode HTML entities to Unicode characters
* Stripping accented characters
* A dictionary-like object for LARGE datasets
* Renaming .ogg files according to tags
* Reading configuration (.ini) files
* miniMusic - a minimalist music server
* FTP through a HTTP proxy
* A simple web dispatcher
* Separating GUI and processing
* Separating GUI and processing, part 2 : Accessing common resources
* Path of current script
* Get current public IP address
* Bypassing aggressive HTTP proxy-caches
* Make sure the script is run as root
* Automated screenshots via crontab
* External links
  114.  
  115.  
Send a file using FTP

Piece of cake.
import ftplib                                            # We import the FTP module
session = ftplib.FTP('myserver.com','login','password')  # Connect to the FTP server
myfile = open('toto.txt','rb')                           # Open the file to send (binary mode)
session.storbinary('STOR toto.txt', myfile)              # Send the file
myfile.close()                                           # Close the file
session.quit()                                           # Close FTP session
  125.  
  126.  
  127.  
Queues (FIFO) and stacks (LIFO)

Python makes using queues and stacks a piece of cake (Did I already say "piece of cake" ?).
No use creating a specific class: simply use list objects.

For a stack (LIFO), push with append() and pop with pop():
>>> a = [5,8,9]
>>> a.append(11)
>>> a
[5, 8, 9, 11]
>>> a.pop()
11
>>> a.pop()
9
>>> a
[5, 8]
>>>


For a queue (FIFO), enqueue with append() and dequeue with pop(0):
>>> a = [5,8,9]
>>> a.append(11)
>>> a
[5, 8, 9, 11]
>>> a.pop(0)
5
>>> a.pop(0)
8
>>> a
[9, 11]


As lists can contain any type of object, you can create queues and stacks of any type of objects !

(Note that there is also a Queue module, but it is mainly useful with threads.)
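
A minimal sketch of an alternative, written for Python 3 (an assumption, since the rest of this page uses Python 2): pop(0) on a list has to shift every remaining item, so for long queues collections.deque from the standard library may be a better fit. The variable names are illustrative only.

```python
from collections import deque

# Queue (FIFO): append on the right, remove from the left.
queue = deque([5, 8, 9])
queue.append(11)
first = queue.popleft()    # removes 5 without shifting the other items
second = queue.popleft()   # removes 8

# Stack (LIFO): a plain list is fine, append() and pop() work on the same end.
stack = [5, 8, 9]
stack.append(11)
top = stack.pop()          # removes 11

print(first, second, list(queue))  # 5 8 [9, 11]
print(top, stack)                  # 11 [5, 8, 9]
```

For short lists the difference is unlikely to matter, so the plain-list version above remains perfectly reasonable.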
  163.  
  164.  
A function which returns several values

When you're not accustomed to Python, it's easy to forget that a function can return just any type of object, including tuples.
This is a great way to create functions which return several values. This is typically the kind of thing that cannot be done in many other languages without some code overhead.
>>> def myfunction(a):
...     return (a+1,a*2,a*a)
...
>>> print myfunction(3)
(4, 6, 9)

You can also use multiple assignment:
>>> (a,b,c) = myfunction(3)
>>> print b
6
>>> print c
9

And of course your functions can return any combination/composition of objects (strings, integers, lists, tuples, dictionaries, lists of tuples, etc.).
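
When a function returns more than two or three values, remembering their order gets error-prone. A possible refinement, sketched here for Python 3, is collections.namedtuple from the standard library. The type name Result and its field names are made up for this example.

```python
from collections import namedtuple

# Hypothetical result type: the field names are chosen for this example only.
Result = namedtuple("Result", ["plus_one", "doubled", "squared"])

def myfunction(a):
    """Return three values at once, as a named tuple."""
    return Result(a + 1, a * 2, a * a)

r = myfunction(3)
print(r.doubled)                  # access a value by name
plus_one, doubled, squared = r    # unpacking still works like a plain tuple
print(squared)
```

A named tuple still compares equal to the plain tuple (4, 6, 9), so existing callers that unpack the result keep working.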
  182.  
  183.  
Exchanging the content of 2 variables

In most languages, exchanging the content of two variables involves using a temporary variable.

In Python, this can be done with multiple assignment.
>>> a=3
>>> b=7
>>> (a,b)=(b,a)
>>> print a
7
>>> print b
3

In Python, tuples, lists and dictionaries are your friends, really !

Highly recommended reading: Dive into Python (http://diveintopython.org/). The first chapter contains a nice tutorial on tuples, lists and dictionaries. And don't forget to read the rest of the book (You can download the entire book for free).
  200.  
  201.  
Getting rid of duplicate items in a list

The trick is to temporarily convert the list into a dictionary:
>>> mylist = [3,5,8,5,3,12]
>>> print dict().fromkeys(mylist).keys()
[8, 3, 12, 5]
>>>

Since Python 2.4, you can also use the built-in set type:
>>> mylist = [3,5,8,5,3,12]
>>> print list(set(mylist))
[8, 3, 12, 5]
>>>
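
As the outputs above show, both tricks lose the original order of the items. If the order matters, one possible approach (a sketch only, with a hypothetical helper name) is to remember what has already been seen while walking the list:

```python
def unique_in_order(items):
    """Return the items without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

mylist = [3, 5, 8, 5, 3, 12]
print(unique_in_order(mylist))  # [3, 5, 8, 12]
```

Like the dictionary and set versions, this only works when the items are hashable.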
  215.  
  216.  
  217.  
  218. Get all links in a web page (1)
  219.  
... or regular expression marvels.
import re, urllib
htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
linksList = re.findall('<a href=(.*?)>.*?</a>',htmlSource)
for link in linksList:
    print link
  226.  
  227.  
  228.  
Get all links in a web page (2)

You can also use the HTMLParser module.
import HTMLParser, urllib

class linkParser(HTMLParser.HTMLParser):
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag=='a':
            href = dict(attrs).get('href')  # some <a> tags have no href (named anchors)
            if href:
                self.links.append(href)

htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
p = linkParser()
p.feed(htmlSource)
for link in p.links:
    print link


For each HTML start tag encountered, the handle_starttag() method will be called.
For example <a href="http://google.com"> will trigger the method handle_starttag(self,'a',[('href','http://google.com')]) (tag names are converted to lower case).

See also all the other handle_*() methods in the Python manual.

(Note that HTMLParser is not bullet-proof: it will choke on ill-formed HTML. In this case, use the sgmllib module, go back to regular expressions or use BeautifulSoup.)
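
For readers on Python 3, where the module is named html.parser, the same idea could look like the sketch below. It parses a small hard-coded HTML sample (invented for this example) instead of a live page, so it runs without network access.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive in lower case; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:      # skip anchors that have no href
                self.links.append(href)

# Hypothetical HTML sample, used instead of a downloaded page.
sample = '<p><A HREF="http://google.com">Google</A> <a name="top">x</a> <a href="/about">About</a></p>'
parser = LinkParser()
parser.feed(sample)
print(parser.links)  # ['http://google.com', '/about']
```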
  255.  
  256.  
  257. Get all links in a web page (3)
  258.  
  259. Still hungry ?
  260.  
  261. Beautiful Soup is a Python module which is quite good at extracting data from HTML.
  262. Beautiful Soup's main advantages are its ability to handle very bad HTML code and its simplicity. Its drawback is its speed (it's slow).
  263. You can get it from http://www.crummy.com/software/BeautifulSoup/
  264.  
import urllib
import BeautifulSoup

htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
soup = BeautifulSoup.BeautifulSoup(htmlSource)
for item in soup.fetch('a'):
    print item['href']
  272.  
  273.  
  274. Get all links in a web page (4)
  275.  
  276. Still there ?
  277. Ok, here's another one:
  278.  
  279. Look ma ! No parser nor regex.
import urllib

htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
for chunk in htmlSource.lower().split('href=')[1:]:
    indexes = [i for i in [chunk.find('"',1),chunk.find('>'),chunk.find(' ')] if i>-1]
    print chunk[:min(indexes)]
  286.  
  287. Granted, this is a crude hack.
  288. But it works !
  289.  
  290.  
  291.  
Zipping/unzipping files

Zipping a file:
import zipfile
f = zipfile.ZipFile('archive.zip','w',zipfile.ZIP_DEFLATED)
f.write('file_to_add.py')
f.close()

Replace 'w' with 'a' to add files to an existing zip archive.

Unzipping all files from a zip archive:
import zipfile
zfile = zipfile.ZipFile('archive.zip','r')
for filename in zfile.namelist():
    data = zfile.read(filename)
    file = open(filename, 'w+b')
    file.write(data)
    file.close()
zfile.close()

If you want to zip all files in a directory recursively (including all subdirectories):

import os, zipfile
f = zipfile.ZipFile('archive.zip','w',zipfile.ZIP_DEFLATED)
startdir = "c:\\mydirectory"
for dirpath, dirnames, filenames in os.walk(startdir):
    for filename in filenames:
        f.write(os.path.join(dirpath,filename))
f.close()
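
The recursive version stores each file under the path it was given. If you would rather have paths relative to the zipped directory, the arcname argument of ZipFile.write() can be used. The following is a sketch for Python 3. The helper name zip_directory and the demo files are invented for the example, and everything happens in a temporary directory.

```python
import os
import tempfile
import zipfile

def zip_directory(startdir, archive_path):
    """Zip startdir recursively, storing paths relative to startdir."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(startdir):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # arcname controls the name stored inside the archive.
                archive.write(full_path, arcname=os.path.relpath(full_path, startdir))

# Demonstration on a throw-away directory (hypothetical file names).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "mydirectory")
os.makedirs(os.path.join(source, "sub"))
for relative in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(source, relative), "w") as handle:
        handle.write("hello")

archive_path = os.path.join(workdir, "archive.zip")
zip_directory(source, archive_path)
with zipfile.ZipFile(archive_path) as archive:
    names = sorted(archive.namelist())
print(names)
```

The archive is written outside the directory being zipped, which avoids the archive trying to include itself.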
  320.  
  321.  
  322.  
Listing the content of a directory

You have 4 ways of doing this, depending on your need.

The os.listdir() function returns the list of all entries (files and subdirectories) in a directory:
import os
for filename in os.listdir(r'c:\windows'):
    print filename

Note that you can use the fnmatch module to filter file names.

The glob module wraps listdir() and fnmatch into a single function:
import glob
for filename in glob.glob(r'c:\windows\*.exe'):
    print filename

And if you need to go through subdirectories, use os.path.walk():
import os.path
def processDirectory ( args, dirname, filenames ):
    print 'Directory',dirname
    for filename in filenames:
        print ' File',filename

os.path.walk(r'c:\windows', processDirectory, None )

os.path.walk() works with a callback: processDirectory() will be called for each directory encountered.
dirname will contain the path of the directory.
filenames will contain a list of filenames in this directory.

You can also use os.walk(), which needs no callback (you simply loop over its results) and is somewhat easier to understand.


import os
for dirpath, dirnames, filenames in os.walk('c:\\winnt'):
    print 'Directory', dirpath
    for filename in filenames:
        print ' File', filename
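
The pieces above can be combined. Here is a sketch for Python 3 that walks a directory tree with os.walk() and keeps only the names matching a shell-style pattern with fnmatch.filter(). The helper name find_files and the demo files are assumptions made for this example.

```python
import fnmatch
import os
import tempfile

def find_files(startdir, pattern):
    """Return the paths under startdir whose file name matches a shell-style pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(startdir):
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)

# Demonstration on a throw-away directory tree (hypothetical file names).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for relative in ("a.exe", "b.txt", os.path.join("sub", "c.exe")):
    open(os.path.join(root, relative), "w").close()

found = [os.path.relpath(path, root) for path in find_files(root, "*.exe")]
print(found)
```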
  360.  
  361.  
  362.  
A webserver in 3 lines of code

import BaseHTTPServer, SimpleHTTPServer
server = BaseHTTPServer.HTTPServer(('',80),SimpleHTTPServer.SimpleHTTPRequestHandler)
server.serve_forever()

This webserver will serve files in the current directory. You can use os.chdir() to change the directory.
This trick is handy to serve or transfer files between computers on a local network.

Note that this webserver is pretty fast, but can only serve one HTTP request at a time. It's not recommended for high-traffic servers.
If you want better performance, have a look at asynchronous sockets (asyncore, Medusa...) or multi-threaded webservers.
  374.  
  375.  
Creating and raising your own exceptions

Do not consider exceptions as nasty things which want to break your programs. Exceptions are your friends. Exceptions are a Good Thing. Exceptions are messengers which tell you that something's wrong, and what is wrong. And try/except blocks will give you the chance to handle the problem.

In your programs, you should also wrap in try/except all calls that may fail (file access, network connections...).

It's often useful to define your own exceptions to signal errors specific to your class/module.

Here's an example of defining an exception and a class (say in myclass.py):
class myexception(Exception):
    pass

class myclass:
    def __init__(self):
        pass
    def dosomething(self,i):
        if i<0:
            raise myexception, 'You made a mistake !'

(myexception is a no-brainer exception: it contains nothing. Yet, it is useful because the exception itself is a message.)

If you use the class, you could do:
import myclass
myobject = myclass.myclass()
myobject.dosomething(-2)

If you execute this program, you will get:
Traceback (most recent call last):
  File "a.py", line 3, in ?
    myobject.dosomething(-2)
  File "myclass.py", line 9, in dosomething
    raise myexception, 'You made a mistake !'
myclass.myexception: You made a mistake !

myclass tells you that you did something wrong. So you'd better use try/except, just in case there's a problem:
import myclass
myobject = myclass.myclass()
try:
    myobject.dosomething(-2)
except myclass.myexception:
    print 'oops ! myclass tells me I did something wrong.'

This is better ! You have a chance to do something if there's a problem.
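
An exception can also carry data about what went wrong. The sketch below uses Python 3 syntax (raise SomeError(...) and except ... as ...). The class name NegativeValueError and its value attribute are invented for this example.

```python
class NegativeValueError(Exception):
    """Hypothetical exception that remembers the offending value."""

    def __init__(self, value):
        super().__init__("negative value not allowed: %r" % (value,))
        self.value = value

def dosomething(i):
    """Double i, refusing negative input."""
    if i < 0:
        raise NegativeValueError(i)
    return i * 2

try:
    dosomething(-2)
except NegativeValueError as error:
    message = str(error)      # the text passed to Exception.__init__
    bad_value = error.value   # the extra data attached to the exception
    print("oops:", message)
```

The handler can then decide what to do based on error.value rather than parsing the message text.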
  419.  
  420.  
Scripting Microsoft SQL Server with Python

If you have Microsoft SQL Server, you must have encountered this situation where you tell yourself «If only I could script all those clicks in Enterprise Manager (aka the MMC) !».

You can ! It's possible to script in Python whatever you can do in the MMC.

You just need the win32all python module to access COM objects from within Python (see http://starship.python.net/crew/mhammond/win32/)
(The win32all module is also provided with ActiveState's Python distribution: http://www.activestate.com/Products/ActivePython/)

Once installed, just use the SQL-DMO objects.

For example, get the list of databases in a server:
from win32com.client import gencache
s = gencache.EnsureDispatch('SQLDMO.SQLServer')
s.Connect('servername','login','password')
for i in range(1,s.Databases.Count+1):  # items are numbered from 1 to Count
    print s.Databases.Item(i).Name

Or get the script of a table:
database = s.Databases('COMMERCE')
script = database.Tables('CLIENTS').Script()
print script

You will find the SQL-DMO documentation in MSDN:

* http://msdn.microsoft.com/library/en-us/sqldmo/dmoref_ob_s_7igk.asp
* http://msdn.microsoft.com/library/en-us/sqldmo/dmoref_ob_3tlx.asp
  448.  
  449.  
  450.  
Accessing a database with ODBC

Under Windows, ODBC provides an easy way to access almost any database. It's not very fast, but it's ok.

You need the win32all python module.

First, create a DSN (for example: 'mydsn'), then:
import dbi, odbc
conn = odbc.odbc('mydsn/login/password')
c = conn.cursor()
c.execute('select clientid, name, city from client')
print c.fetchall()

Nice and easy !
You can also use fetchone() or fetchmany(n) to fetch - respectively - one or n rows at once.

Note : On big datasets, I get quite bizarre and irregular data truncations on tables with a high number of columns. Is that a bug in ODBC, or in the SQL Server ODBC driver ? I will have to investigate...
  468.  
  469.  
Accessing a database with ADO

Under Windows, you can also use ADO (Microsoft ActiveX Data Objects) instead of ODBC to access databases. The following code uses ADO COM objects to connect to a Microsoft SQL Server database, then retrieve and display a table.
import win32com.client
connexion = win32com.client.gencache.EnsureDispatch('ADODB.Connection')
connexion.Open("Provider='SQLOLEDB';Data Source='myserver';Initial Catalog='mydatabase';User ID='mylogin';Password='mypassword';")
recordset = connexion.Execute('SELECT clientid, clientName FROM clients')[0]
while not recordset.EOF:
    print 'clientid=',recordset.Fields(0).Value,' client name=',recordset.Fields(1).Value
    recordset.MoveNext()
connexion.Close()

For ADO documentation, see MSDN: http://msdn.microsoft.com/library/en-us/ado270/htm/mdmscadoobjects.asp
  483.  
  484.  
  485.  
CGI under Windows with TinyWeb

TinyWeb is a one-file webserver for Windows (the exe is only 53 kb). It's fantastic for making instant webservers and sharing files. TinyWeb is also capable of serving CGI.

Let's have some fun and create some CGI with Python !

First, let's get and install TinyWeb:

1. Get TinyWeb from http://www.ritlabs.com/tinyweb/ (it's free, even for commercial use !) and unzip it to c:\somedirectory (or any directory you'd like).
2. Create the "www" subdirectory in this directory
3. Create index.html in the www directory:
   <html><body>Hello, world !</body></html>
4. Run the server: tiny.exe c:\somedirectory\www
   (make sure you use an absolute path)
5. Point your browser at http://localhost

If you see "Hello, world !", it means that TinyWeb is up and running.

Let's start making some CGI:

1. In the www directory, create the "cgi-bin" subdirectory.
2. In cgi-bin, create hello.py containing:
   print "Content-type: text/html"
   print
   print "Hello, this is Python talking !"
3. Make sure Windows always uses python.exe when you double-click .py files.
   (SHIFT+rightclick on a .py file, "Open with...", choose python.exe,
   check the box "Always use this program...", click Ok)
4. Point your browser at http://localhost/cgi-bin/hello.py

You should see "Hello, this is Python talking !" (and not the source code).
If it's ok, you're done !
Now you can make some nice CGI.

(If this does not work, make sure the path to python.exe is ok and that you used an absolute path in tinyweb's command line.)

Note that this will never be as fast as mod_python under Apache (because TinyWeb will spawn a new instance of the Python interpreter for each request on a Python CGI). Thus it's not appropriate for high-traffic production servers, but for a small LAN, it can be quite handy to serve CGI like this.

Refer to Python documentation for CGI tutorials and reference.

* Hint 1: Don't forget that you can also use TinySSL, which is the SSL/HTTPS enabled version of TinyWeb. That's fantastic for making secure webservers (especially to prevent LAN sniffing, when authentication is required).
* Hint 2: If you wrap your Python CGI with py2exe, you'll be able to run your CGI on computers where Python is not installed.
  Sub-hint: Compress all exe/dll/pyd with UPX, and you can take the whole webserver and its CGI on a floppy disk and run it everywhere ! (A typical "Hello, world !" CGI example and TinyWeb together weigh only 375 kb with Python 2.2 !)
* Hint 3: When serving files (not CGI), TinyWeb uses the Windows file extension to Content-type mapping (like .zip = application/x-zip-compressed). If you find that the Content-type is wrong, you can correct it using the following file: tinyweb.reg.
* Hint 4: Under Windows there is a trick to send binary files correctly in CGI: You need to change stdout mode from text mode to binary mode. This is required on Windows only:
  import sys
  if sys.platform == "win32":
      import os, msvcrt
      msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
  (code taken from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65443/ )
  536.  
  537.  
  538.  
Creating .exe files from Python programs

Like Sun's Java or Microsoft's .Net, if you want to distribute your Python programs, you need to bundle the virtual machine too.
You have several options: py2exe, cx_Freeze or pyInstaller.

py2exe
py2exe provides an easy way to gather all necessary files to distribute your Python program on computers where Python is not installed.
For example, under Windows, if you want to transform myprogram.py into myprogram.exe, create the file setup.py as follows:
from distutils.core import setup
import py2exe
setup(name="myprogram",scripts=["myprogram.py"],)

Then run:
python setup.py py2exe

py2exe will get all dependent files and write them in the \dist subdirectory. You will typically find your program as .exe, pythonXX.dll and complementary .pyd files. Your program will run on any computer even if Python is not installed. This also works for CGI.
(Note that if your program uses tkinter, there is a trick.)

Hint : Use UPX to compress all dll/exe/pyd files. This will greatly reduce file size. Use: upx --best *.dll *.exe *.pyd (Typically, python22.dll shrinks from 848 kb to 324 kb.)
Note that since version 0.6.1, py2exe is capable of creating a single EXE (pythonXX.dll and other files are integrated into the EXE).
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
from distutils.core import setup
import py2exe

setup(
    options = {"py2exe": {"compressed": 1, "optimize": 0, "bundle_files": 1, } },
    zipfile = None,
    console=["myprogram.py"]
)



cx_Freeze

You can also use cx_Freeze, which is an alternative to py2exe (This is what I used in webGobbler).
cx_Freeze\FreezePython.exe --install-dir bin --target-name=myprogram.exe myprogram.py

or even create a console-less version:

cx_Freeze\FreezePython.exe --install-dir bin --target-name=myprogram.exe --base-binary=Win32GUI.exe myprogram.py


Tip for the console-less version: If you try to print anything, you will get a nasty error window, because stdout and stderr do not exist (and the cx_Freeze Win32GUI.exe stub will display an error window).
This is a pain when you want your program to be able to run in GUI mode and command-line mode.
To safely disable console output, do as follows at the beginning of your program:

import sys
try:
    sys.stdout.write("\n")
    sys.stdout.flush()
except IOError:
    class dummyStream:
        ''' dummyStream behaves like a stream but does nothing. '''
        def __init__(self): pass
        def write(self,data): pass
        def read(self,data): pass
        def flush(self): pass
        def close(self): pass
    # and now redirect all default streams to this dummyStream:
    sys.stdout = dummyStream()
    sys.stderr = dummyStream()
    sys.stdin = dummyStream()
    sys.__stdout__ = dummyStream()
    sys.__stderr__ = dummyStream()
    sys.__stdin__ = dummyStream()

This way, if the program starts in console-less mode, it will work even if the code contains print statements.
And if run in command-line mode, it will print out as usual. (This is basically what I did in webGobbler, too.)



pyInstaller
pyInstaller is the reincarnation of McMillan Installer. It can also create one-file executables.
You can get it from http://pyinstaller.hpcf.upr.edu/cgi-bin/trac.cgi/wiki

Unzip pyInstaller in the pyinstaller_1.1 subdirectory, then do:

python pyinstaller_1.1\Configure.py
(You only have to do this once.)

Then create the .spec file for your program:

python pyinstaller_1.1\Makespec.py myprogram.py myprogram.spec

Then pack your program:

python pyinstaller_1.1\Build.py myprogram.spec

Your program will be available in the \distmyprogram subdirectory. (myprogram.exe, pythonXX.dll, MSVCR71.dll, etc.)

You have several options, such as:

* --onefile will create a single EXE file. E.g.:
  python pyinstaller_1.1\Makespec.py --onefile myprogram.py myprogram.spec
  Note that this EXE, when run, unpacks all files in a temporary directory, runs the unpacked program from there, then deletes all files when finished. You may or may not like this behaviour (I don't).
* --noconsole allows the creation of pure Windows executables (with no console window).
  python pyinstaller_1.1\Makespec.py --noconsole myprogram.py myprogram.spec
* --tk is a really nice option of pyInstaller which packs all necessary files for tkinter (tcl/tk).
  638.  
  639.  
  640.  
  641. Reading Windows registry
  642.  
  643. import _winreg
  644. key = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER, 'Software\\Microsoft\\Internet Explorer', 0, _winreg.KEY_READ)
  645. (value, valuetype) = _winreg.QueryValueEx(key, 'Download Directory')
  646. print value
  647. print valuetype
  648.  
valuetype is the type of the registry value that was read. See http://docs.python.org/lib/module--winreg.html
  650.  
  651.  
  652.  
Measuring the performance of Python programs

Python is provided with a code profiling module: profile. It's rather easy to use.

For example, if you want to profile myfunction(), instead of calling it with:
myfunction()

you just have to do:
import profile
profile.run('myfunction()','myfunction.profile')
import pstats
pstats.Stats('myfunction.profile').sort_stats('time').print_stats()

This will display a report like this:
Thu Jul 03 15:20:26 2003    myfunction.profile

         1822 function calls (1792 primitive calls) in 0.737 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.224    0.224    0.279    0.279 myprogram.py:512(compute)
       10    0.078    0.008    0.078    0.008 myprogram.py:234(first)
        1    0.077    0.077    0.502    0.502 myprogram.py:249(give_first)
        1    0.051    0.051    0.051    0.051 myprogram.py:1315(give_last)
        3    0.043    0.014    0.205    0.068 myprogram.py:107(sort)
        1    0.039    0.039    0.039    0.039 myprogram.py:55(display)
      139    0.034    0.000    0.106    0.001 myprogram.py:239(save)
      139    0.030    0.000    0.072    0.001 myprogram.py:314(load)
      ...

This report tells you, for each function/method:

* how many times it was called (ncalls)
* total time spent in the function, minus time spent in sub-functions (tottime)
* total time spent in the function, including time spent in sub-functions (cumtime)
* average time per call (percall): the first percall column is tottime/ncalls, the second is cumtime/ncalls

As you can see, the profile module displays the precise filename, line and function name. This is precious information and will help you to spot the slowest parts of your programs.

But don't try to optimize too early in the development stage. This is evil ! :-)

Note that Python is also provided with a similar module named hotshot, which is more accurate but does not work well with threads.
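
On Python 3 (an assumption, since this page targets Python 2), the usual equivalent is the cProfile module together with pstats. The sketch below profiles a made-up compute() function and writes the report into a string instead of a file.

```python
import cProfile
import io
import pstats

def compute(n):
    """Hypothetical workload, only here to have something to measure."""
    return sum(i * i for i in range(n))

# Collect timing data only around the call we are interested in.
profiler = cProfile.Profile()
profiler.enable()
result = compute(100000)
profiler.disable()

# Write the 5 most expensive entries, sorted by internal time, into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("tottime").print_stats(5)
print(report.getvalue())
```

The columns of the report (ncalls, tottime, percall, cumtime) have the same meaning as described above.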
  696.  
  697.  
Speed up your Python programs

To speed up your Python program, there's nothing like optimizing or redesigning your algorithms.

In case you think you can't do better, you can always use Psyco: Psyco is a Just-In-Time-like compiler for Python for Intel 80x86-compatible processors. It's very easy to use and can provide an instant 2x to 100x speed-up, depending on the code.

1. Download psyco for your Python version (http://psyco.sourceforge.net)
2. Unzip and copy the \psyco directory to your Python site-packages directory (should be something like c:\pythonXX\Lib\site-packages\psyco\ under Windows)

Then, put this at the beginning of your programs:
import psyco
psyco.full()

Or even better:
try:
    import psyco
    psyco.full()
except ImportError:
    pass

This way, if psyco is installed, your program will run faster.
If psyco is not available, your program will run as usual.

(And if psyco is still not enough, you can rewrite the code which is too slow in C or C++ and wrap it with SWIG (http://swig.org).)

Note: Do not use Psyco when debugging, profiling or tracing your code. You may get inaccurate results and strange behaviours.
  724. Regular expressions are sometimes overkill
  725.  
  726. I helped someone on a forum who wanted process a text file: He wanted to extract the text following "Two words" in all lines starting whith these 2 word. He had started writing a regular expression for this: r = re.compile("Two\sword\s(.*?)").
  727.  
  728. His problem was better solved with:
  729. [...]
  730. for line in file:
  731. if line.startswith("Two words "):
  732. print line[10:]
  733.  
Regular expressions are sometimes overkill. They are not always the best choice, because:

* They involve some overhead:
    o You have to compile the regular expression (re.compile()). This means parsing the regular expression and transforming it into a state machine. This consumes CPU time.
    o When using the regular expression, you run the state machine against the text, which makes the state machine change state according to many rules. This also eats CPU time.
* Regular expressions are not failsafe: they can fail on specific input. You may get a "maximum recursion limit exceeded" exception. This means that you should also enclose all match(), search() and findall() calls in try/except blocks.
* The Zen of Python (import this :-) says «Readability counts». That's a good thing. And regular expressions quickly become difficult to read, debug and change.

Besides, string methods like find(), rfind() or startswith() are very fast, much faster than regular expressions.
  743.  
  744. Do not try to use regular expressions everywhere. Often a bunch of string operations will do the job faster.
  745.  
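As a rough illustration of the point above, here is a minimal sketch that extracts the same text both ways. It assumes Python 3 (the snippets on this page are Python 2), and the corrected pattern "Two\swords\s(.*)" as well as the function names are my own, not the forum poster's code.

```python
import re

PREFIX = "Two words "  # the marker text from the example above

def extract_with_startswith(lines):
    """Return the text that follows PREFIX on every line starting with it."""
    return [line[len(PREFIX):] for line in lines if line.startswith(PREFIX)]

# Equivalent regular expression (an assumption, not the poster's pattern),
# compiled once; match() anchors it at the start of the line.
PATTERN = re.compile(r"Two\swords\s(.*)")

def extract_with_regex(lines):
    """Same result as above, obtained with re.match()."""
    results = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            results.append(match.group(1))
    return results

sample = ["Two words and the rest", "Something else", "Two words again"]
print(extract_with_startswith(sample))
print(extract_with_regex(sample))
```

Both functions return the same list; the string-method version simply has fewer moving parts to read and debug.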
  746.  
  747. Executing another Python program
  748.  
execfile("anotherprogram.py")
  750.  
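On Python 3, where execfile() no longer exists, one possible equivalent is runpy.run_path() from the standard library. The sketch below assumes Python 3 and creates a throwaway anotherprogram.py in a temporary directory only to stay self-contained.

```python
import os
import runpy
import tempfile

# Create a throwaway script so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "anotherprogram.py")
    with open(script_path, "w") as script:
        script.write("answer = 6 * 7\n")

    # run_path() executes the file and returns its global namespace as a dict.
    script_globals = runpy.run_path(script_path)

print(script_globals["answer"])
```

The returned dictionary gives access to whatever the other program defined at module level.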
  751.  
  752.  
  753. Bayesian filtering
  754.  
Bayesian filtering is the latest buzzword in spam fighting. And it works very well indeed !
  756.  
  757. Reverend is a free Bayesian module for Python. You can download it from http://divmod.org/trac/wiki/DivmodReverend
  758.  
  759. Here's an example: Recognizing the language of a text.
  760.  
  761. First, train it on a few sentences:
  762. from reverend.thomas import Bayes
  763. guesser = Bayes()
  764. guesser.train('french','La souris est rentrée dans son trou.')
  765. guesser.train('english','my tailor is rich.')
  766. guesser.train('french','Je ne sais pas si je viendrai demain.')
  767. guesser.train('english','I do not plan to update my website soon.')
  768.  
  769. And now let it guess the language:
  770. >>> print guesser.guess('Jumping out of cliffs it not a good idea.')
  771. [('english', 0.99990000000000001), ('french', 9.9999999999988987e-005)]
  772.  
The Bayesian filter says: "It's English, with a 99.99% probability."
  774.  
  775. Let's try another one:
  776. >>> print guesser.guess('Demain il fera très probablement chaud.')
  777. [('french', 0.99990000000000001), ('english', 9.9999999999988987e-005)]
  778.  
It says: "It's French, with a 99.99% probability."
Not bad, is it ?
  781.  
  782. You can train it on even more languages at the same time. You can also train it to classify any kind of text.
  783.  
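To give an idea of what such a filter does internally, here is a minimal word-count naive Bayes sketch in Python 3. It is not Reverend's implementation: the TinyBayes class, its add-one smoothing and its uniform prior are assumptions made for illustration, so its probabilities will differ from the ones shown above.

```python
import math
from collections import Counter, defaultdict

class TinyBayes:
    """Hypothetical word-count naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def guess(self, text):
        """Return (label, probability) pairs, most likely label first."""
        words = text.lower().split()
        log_scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values()) + len(self.vocabulary)
            # Sum of log probabilities avoids underflow on long texts.
            log_scores[label] = sum(
                math.log((counts[word] + 1) / total) for word in words
            )
        # Turn log scores into probabilities that sum to 1.
        highest = max(log_scores.values())
        weights = {label: math.exp(score - highest)
                   for label, score in log_scores.items()}
        norm = sum(weights.values())
        return sorted(((label, weight / norm) for label, weight in weights.items()),
                      key=lambda pair: pair[1], reverse=True)

guesser = TinyBayes()
guesser.train('french', 'La souris est rentrée dans son trou.')
guesser.train('english', 'my tailor is rich.')
guesser.train('french', 'Je ne sais pas si je viendrai demain.')
guesser.train('english', 'I do not plan to update my website soon.')

english_ranking = guesser.guess('I do not know my tailor')
french_ranking = guesser.guess('Je ne sais pas')
print(english_ranking)
print(french_ranking)
```

With so little training text the sketch only recognises words it has already seen, which is why a real module trained on more sentences does much better.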
  784.  
  785. Tkinter and cx_Freeze
  786.  
  787. (This trick also works with py2exe).
  788.  
  789. Say you want to package a Tkinter Python program with cx_Freeze in order to distribute it.
  790. You create your program:
  791.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import Tkinter

class myApplication:

    def __init__(self,root):
        self.root = root
        self.initializeGui()

    def initializeGui(self):
        Tkinter.Label(self.root,text="Hello, world").grid(column=0,row=0)

def main():
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()
  813.  
This program works on your computer. Now let's package it with cx_Freeze:
  815.  
  816. FreezePython.exe --install-dir bin --target-name=test.exe test.py
  817.  
  818. If you run your program (test.exe), you will get this error:
  819.  
  820. The dynamic link library tk84.dll could not be found in the specified path [...]
  821.  
In fact, you need to copy the Tkinter DLLs. Your build batch becomes:
  823.  
  824. FreezePython.exe --install-dir bin --target-name=test.exe test.py
  825. copy C:\Python24\DLLs\tcl84.dll .\bin\
  826. copy C:\Python24\DLLs\tk84.dll .\bin\
  827.  
  828. Ok, john, build it again.
  829. Run the EXE: it works !
  830. Run the EXE on another computer (which does not have Python installed): Error !
  831.  
Traceback (most recent call last):
  File "cx_Freeze\initscripts\console.py", line 26, in ?
    exec code in m.__dict__
  File "test.py", line 20, in ?
  File "test.py", line 14, in main
  File "C:\Python24\Lib\lib-tk\Tkinter.py", line 1569, in __init__
_tkinter.TclError: Can't find a usable init.tcl in the following directories:
[...]
  840.  
  841. Nasty, isn't it ?
The reason it fails is that Tkinter needs the runtime tcl scripts, which are located in C:\Python24\tcl\tcl8.4 and C:\Python24\tcl\tk8.4.
So let's copy these scripts into the same directory as your application.

Your build batch becomes:
  846.  
  847. cx_Freeze\FreezePython.exe --install-dir bin --target-name=test.exe test.py
  848. copy C:\Python24\DLLs\tcl84.dll .\bin\
  849. copy C:\Python24\DLLs\tk84.dll .\bin\
  850. xcopy /S /I /Y "C:\Python24\tcl\tcl8.4\*.*" "bin\libtcltk84\tcl8.4"
  851. xcopy /S /I /Y "C:\Python24\tcl\tk8.4\*.*" "bin\libtcltk84\tk8.4"
  852.  
But you also need to tell your program where to get the tcl/tk runtime scripts (the block placed just before import Tkinter below):

#!/usr/bin/python
# -*- coding: iso-8859-1 -*-

import os, os.path
# Take the tcl/tk library from local subdirectory if available.
if os.path.isdir('libtcltk84'):
    os.environ['TCL_LIBRARY'] = 'libtcltk84\\tcl8.4'
    os.environ['TK_LIBRARY'] = 'libtcltk84\\tk8.4'

import Tkinter

class myApplication:

    def __init__(self,root):
        self.root = root
        self.initializeGui()

    def initializeGui(self):
        Tkinter.Label(self.root,text="Hello, world").grid(column=0,row=0)

def main():
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()
  883.  
  884.  
  885. Now you can properly package and distribute Tkinter-enabled applications. (I used this trick in webGobbler.)
  886.  
  887.  
  888. Possible improvement:
  889.  
You could surely get rid of some tcl/tk scripts you don't need. Example: bin\libtcltk84\tk8.4\demos (around 500 KB) contains only tk demonstrations. They are not necessary.
  891. This depends on which features of Tkinter your program will use.
  892. (cx_Freeze and - AFAIK - all other packagers are not capable of resolving tcl/tk dependencies.)
  893.  
  894.  
  895. A few Tkinter tips
  896.  
  897. Tkinter is the basic GUI toolkit provided with Python.
  898.  
  899. Here's a simple example:
  900.  
import Tkinter

class myApplication:                          #1
    def __init__(self,root):
        self.root = root                      #2
        self.initialisation()                 #3

    def initialisation(self):                 #3
        Tkinter.Label(self.root,text="Hello, world !").grid(column=0,row=0)   #4

def main():                                   #5
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()
  919.  
  920. #1 : It's always better to code a GUI in the form of a class. It will be easier to reuse your GUI components.
  921.  
  922. #2 : Always keep a reference to your ancestor. You will need it when adding widgets.
  923.  
  924. #3 : Keep the code which creates all the widgets clearly separated from the rest of the code. It will be easier to maintain.
  925.  
#4 : Do not use .pack(). It's usually messy, and painful when you want to extend your GUI. grid() lets you place and move your widgets easily. Never ever mix .pack() and .grid() in the same container, or your application will hang without warning, with 100% CPU usage.

#5 : It's always a good idea to have a main() defined. This way, you can test the GUI elements directly by running the module.
  929.  
  930.  
I lack time; after my experience with webGobbler, this list of recommendations could be much longer.
  932.  
  933.  
  934.  
  935. Tkinter file dialogs
  936.  
Tkinter comes with several basic dialogs for file or directory handling. They're pretty easy to use, but it's good to have some examples:
  938.  
  939. Select a directory:
  940.  
  941. import Tkinter
  942. import tkFileDialog
  943.  
  944. root = Tkinter.Tk()
directory = tkFileDialog.askdirectory(parent=root,initialdir="/",title='Please select a directory')
if len(directory) > 0:
    print "You chose directory %s" % directory
  948.  
  949. Select a file for open (askopenfile will open the file for you. file will behave like a normal file object):
  950.  
  951. import Tkinter
  952. import tkFileDialog
  953.  
  954. root = Tkinter.Tk()
file = tkFileDialog.askopenfile(parent=root,mode='rb',title='Please select a file')
if file is not None:
    data = file.read()
    file.close()
    print "I got %d bytes from the file." % len(data)
  960.  
  961. Save as... dialog:
  962.  
  963. import Tkinter
  964. import tkFileDialog
  965.  
myFormats = [
    ('Windows Bitmap','*.bmp'),
    ('Portable Network Graphics','*.png'),
    ('JPEG / JFIF','*.jpg'),
    ('CompuServe GIF','*.gif'),
    ]

root = Tkinter.Tk()
filename = tkFileDialog.asksaveasfilename(parent=root,filetypes=myFormats,title="Save image as...")
if len(filename) > 0:
    print "Now saving as %s" % (filename)
  977.  
  978.  
  979.  
  980. Including binaries in your sources
  981.  
Sometimes it's handy to include small files in your sources (icons, test files, etc.)

Let's take a file (myimage.gif) and convert it to base64 (optionally compressing it with zlib):
  985.  
  986. import base64,zlib
  987. data = open('myimage.gif','rb').read()
  988. print base64.encodestring(zlib.compress(data))
  989.  
  990. Get the text created by this program and use it in your source:
  991.  
  992. import base64,zlib
  993. myFile = zlib.decompress(base64.decodestring("""
  994. eJxz93SzsExUZlBn2MzA8P///zNnzvz79+/IgUMTJ05cu2aNaBmDzhIGHj7u58+fO11ksLO3Kyou
  995. ikqIEvLkcYyxV/zJwsgABDogAmQGA8t/gROejlpLMuau+j+1QdQxk20xwzqhslmHH5/xC94Q58ST
  996. 72nRllBw7cUDHZYbL8VtLOYbP/b6LhXB7tAcfPCpHA/fSvcJb1jZWB9c2/3XLmQ+03mZBBP+GOak
  997. /AAZGXPL1BJe39jqjoqEAhFr1fBi1dao9g4Ovjo+lh6GFDVWJqbisLKoCq5p1X5s/Jw9IenrFvUz
  998. +mRXTeviY+4p2sKUflA1cjkX37TKWYwFzRpFYeqTs2fOqEuwXsfgOeGCfmZ57MP4WSpaZ0vSJy97
  999. WPeY5ca8F1sYI5f5r2bjec+67nmaTcarm7+Z0hgY2Z7++fpCzHmBQCrPF94dAi/jj1oZt8R4qxsy
  1000. 6liJX/UVyLjwoHFxFK/VMWbN90rNrLKMGQ7iQSc7mXgTkpwPXVp0mlWz/JVC4NK0s0zcDWkcFxxF
  1001. mrvdlBdOnBySvtNvq8SBFZo8rF2MvAIMoZoPmZrZPj2buEDr2isXi0V8egpelyUvbXNc7yVQkKgS
  1002. sM7g0KOr7kq3WRIkitSuRj1VXbSk8v4zh8fljqtOhyobP91izvh0c2hwqKz3jPaHhvMMXVQspYq8
  1003. aiV9ivkmHri5u2NH8fvPpVWuK65I3OMUX+f4Lee+3Hmfux96Vq5RVqxTN38YeK3wRbVz5v06FSYG
  1004. awWFgMzkktKiVIXkotTEktQUhaRKheDUpMTikszUPIVgx9AwR3dXBZvi1KTixNKyxPRUhcQSBSRe
  1005. Sn6JQl5qiZ2CrkJGSUmBlb4+QlIPKKGgAADBbgMp"""))
  1006.  
  1007. print "I have a file of %d bytes." % len(myFile)
  1008.  
  1009. For example, if you use PIL (Python Imaging Library), you can directly open this image:
  1010.  
  1011. import Image,StringIO
  1012. myimage = Image.open(StringIO.StringIO(myFile))
  1013. myimage.show()
  1014.  
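For reference, here is a sketch of the same round trip under Python 3, where base64.encodestring() and decodestring() were replaced by encodebytes() and decodebytes(). The embed() and extract() helper names and the stand-in binary data are hypothetical, chosen only for illustration.

```python
import base64
import zlib

def embed(data):
    """Compress bytes and return base64 text suitable for pasting in a source file."""
    return base64.encodebytes(zlib.compress(data)).decode("ascii")

def extract(text):
    """Reverse embed(): decode the base64 text and decompress it."""
    return zlib.decompress(base64.decodebytes(text.encode("ascii")))

original = b"GIF89a" + bytes(range(256)) * 4  # stand-in for a small binary file
embedded_text = embed(original)
restored = extract(embedded_text)
print("I have a file of %d bytes." % len(restored))
```

The embedded text is plain ASCII, so it can be pasted between triple quotes exactly like the block above.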
  1015.  
  1016.  
  1017. Good practice: try/except non-standard import statements
  1018.  
If your program uses modules which are not part of the standard Python distribution, it can be a pain for your users to identify which modules are required and where to get them.

Ease their pain with a simple try/except statement which gives the package name (which is not always the same as the name in the import statement) and where to get it.
  1022.  
Example:
try:
    import win32com.client
except ImportError:
    raise ImportError, 'This program requires the win32all extensions for Python. See http://starship.python.net/crew/mhammond/win32/'
  1028.  
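The same idea can be wrapped in a small helper. This is only a sketch assuming Python 3 (where the raise statement takes a call, not a comma); require() and the module name no_such_module_for_demo are hypothetical.

```python
import importlib

def require(module_name, package_hint):
    """Import module_name, or raise ImportError naming the package to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "This program requires %s (%s)." % (module_name, package_hint)
        ) from exc

# A module that exists: the helper simply returns it.
json_module = require("json", "part of the standard library")

# A module that does not exist: the user gets an explicit message.
try:
    require("no_such_module_for_demo", "hypothetical package, for illustration only")
except ImportError as exc:
    message = str(exc)
    print(message)
```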
  1029.  
  1030.  
  1031. Good practice: Readable objects
  1032.  
  1033. Let's define a "client" class. Each client has a name and a number.
  1034.  
class client:
    def __init__(self,number,name):
        self.number = number
        self.name = name
  1039.  
  1040. Now if we create an instance of this class and if we display it:
  1041.  
  1042. my_client = client(5,"Smith")
  1043. print my_client
  1044.  
  1045. You get:
  1046.  
  1047. <__main__.client instance at 0x007D0E40>
  1048.  
  1049. Quite exact, but not very explicit.
  1050.  
  1051. Let's improve that and add a __repr__ method:
  1052.  
class client:
    def __init__(self,number,name):
        self.number = number
        self.name = name
    def __repr__(self):
        return '<client id="%s" name="%s">' % (self.number, self.name)
  1059.  
  1060. Let's do it again:
  1061.  
  1062. my_client = client(5,"Smith")
  1063. print my_client
  1064.  
  1065. We get:
  1066.  
<client id="5" name="Smith">
  1068.  
  1069. Ah !
  1070. Much better. Now this object has a meaning to you.
  1071. It's much better for debugging or logging.
  1072.  
  1073.  
  1074. You can even apply this to compound objects, such as a client directory:
  1075.  
class directory:

    def __init__(self):
        self.clients = []

    def addClient(self, client):
        self.clients.append(client)

    def __repr__(self):
        lines = []
        lines.append("<directory>")
        for client in self.clients:
            lines.append(" "+repr(client))
        lines.append("</directory>")
        return "\n".join(lines)
  1091.  
  1092. Then create a directory, and add clients to this directory:
  1093.  
  1094. my_directory = directory()
  1095. my_directory.addClient( client(5,"Smith") )
  1096. my_directory.addClient( client(12,"Doe") )
  1097.  
  1098. print my_directory
  1099.  
  1100. You'll get:
  1101.  
  1102. <directory>
  1103. <client id="5" name="Smith">
  1104. <client id="12" name="Doe">
  1105. </directory>
  1106.  
  1107. Much better, isn't it ?
  1108.  
  1109. This trick - which is not exclusive to Python - is handy for debugging or logging.
For example, if your program crashes, you can log the objects' states to a file for debugging purposes in the except clause of a try/except block.
  1111.  
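Here is a minimal sketch of that logging idea, assuming Python 3 and the standard logging module. The risky_operation() function is hypothetical and exists only to trigger the except clause.

```python
import logging

class Client:
    def __init__(self, number, name):
        self.number = number
        self.name = name

    def __repr__(self):
        return '<client id="%s" name="%s">' % (self.number, self.name)

def risky_operation(client):
    # Hypothetical failure, only here to trigger the except clause.
    raise ValueError("something went wrong")

my_client = Client(5, "Smith")
try:
    risky_operation(my_client)
except ValueError:
    # %r calls __repr__, so the log line shows the readable form.
    logging.exception("Failed while processing %r", my_client)
```

The log entry then contains both the traceback and the readable description of the object that was being processed.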
  1112.  
  1113.  
  1114. Good practice: No blank-check read()
  1115.  
  1116. When you read a file or a socket, you often use simply .read(), such as:
  1117.  
  1118. # Read from a file:
  1119. file = open("a_file.dat","rb")
  1120. data = file.read()
  1121. file.close()
  1122.  
  1123. # Read from an URL:
  1124. import urllib
  1125. url = urllib.urlopen("http://sebsauvage.net")
  1126. html = url.read()
  1127. url.close()
  1128.  
  1129.  
But what happens if the file is 40 GB, or the website sends data non-stop ?
Your program will eat all the system's memory, slow down to a crawl and probably crash the system too.

You should always bound your read().
For example, I do not expect to process files larger than 10 MB, nor read HTML pages larger than 200 KB, so I would write:
  1135.  
  1136. # Read from a file:
  1137. file = open("a_file.dat","rb")
  1138. data = file.read(10000000)
  1139. file.close()
  1140.  
  1141. # Read from an URL:
  1142. import urllib
  1143. url = urllib.urlopen("http://sebsauvage.net")
  1144. html = url.read(200000)
  1145. url.close()
  1146.  
  1147. This way, I'm safe from buggy or malicious external data sources.
  1148.  
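A bounded read silently truncates oversized data. If you would rather detect it, one possible variation is to ask for one byte more than the limit. This is a sketch assuming Python 3; read_bounded() and TooMuchData are hypothetical names, and io.BytesIO stands in for a real file or URL.

```python
import io

class TooMuchData(Exception):
    """Raised when a source delivers more bytes than we agreed to accept."""

def read_bounded(source, limit):
    """Read at most `limit` bytes from a file-like object.

    Asks for one extra byte so that an oversized source can be detected
    instead of being silently truncated.
    """
    data = source.read(limit + 1)
    if len(data) > limit:
        raise TooMuchData("source exceeds %d bytes" % limit)
    return data

small = read_bounded(io.BytesIO(b"x" * 100), 200)
try:
    read_bounded(io.BytesIO(b"x" * 1000), 200)
    rejected = False
except TooMuchData:
    rejected = True
print(len(small), rejected)
```

Memory use stays bounded either way, since at most limit + 1 bytes are ever read.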
  1149. Always be cautious when manipulating data you have no control over !
  1150.  
  1151. ...er, finally, be also cautious with your own data, too.
  1152. Shit happens.
  1153.  
  1154.  
  1155.  
  1156. 1.7 is different than 1.7 ?
  1157.  
  1158. This is a common pitfall amongst novice programmers:
  1159.  
Never confuse data and its representation on screen.

When you see a floating-point number 1.7, you only see a textual representation of the binary data stored in the computer's memory.
When you use a date, such as:
  1164.  
  1165. >>> import datetime
  1166. >>> print datetime.datetime.now()
  1167. 2006-03-21 15:23:20.904000
  1168. >>>
  1169.  
  1170. "2006-03-21 15:23:20.904000" is NOT the date. It's a textual representation of the date (The real date is binary data in the computer's memory).
  1171.  
  1172. The print statement seems to be trivial, but it's not. It involves complex work in order to create a human-readable representation of various binary data formats. This is not trivial, even for a simple integer.
  1173.  
  1174.  
  1175. This leads to pitfalls, such as:
  1176.  
a = 1.7
b = 0.9 + 0.8 # This should be 1.7

print a
print b

if a == b:
    print "a and b are equal."
else:
    print "a and b are different !"
  1187.  
What do you expect this code to print ? If you answered "a and b are equal.", you're wrong !
  1190.  
  1191. 1.7
  1192. 1.7
  1193. a and b are different !
  1194.  
  1195. How can this be ?
  1196. How can 1.7 be different than 1.7 ?
  1197.  
Remember the two "1.7" are just textual representations of numbers which are almost equal to 1.7.
  1199. The program says they are different because a and b are different at the binary level.
  1200. Only their textual representation is the same.
  1201.  
Thus, to compare floating-point numbers, use one of the following tricks:

if abs(a-b) < 0.00001:
    print "a and b are equal."
else:
    print "a and b are different !"

or even (this relies on str() rounding floats to 12 significant digits, as Python 2 does):

if str(a) == str(b):
    print "a and b are equal."
else:
    print "a and b are different !"
  1215.  
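On Python 3.5 and later, the standard library offers math.isclose() for this kind of comparison. A short sketch, assuming Python 3 (the tolerance 0.00001 is the one used above):

```python
import math

a = 1.7
b = 0.9 + 0.8

# Exact comparison: the two binary values differ.
print(a == b)

# Default relative tolerance (1e-09).
print(math.isclose(a, b))

# Absolute tolerance only, like the abs(a-b) test above.
print(math.isclose(a, b, rel_tol=0.0, abs_tol=0.00001))
```

A relative tolerance scales with the size of the numbers, which is usually what you want unless the values are close to zero.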
  1216.  
Why is 0.9+0.8 different from 1.7 ?
Because the computer can only handle bits, and you cannot precisely represent all numbers in binary.

The computer is good at storing values such as 0.5 (which is 0.1 in binary), or 0.125 (which is 0.001 in binary).
But it's not capable of storing the exact value 0.3 (because there is no exact finite representation of 0.3 in binary).

Thus, as soon as you do a=1.7, a does not contain 1.7, but only a binary approximation of the decimal number 1.7.
  1224.  
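You can look at the value that is really stored with the decimal module. This sketch assumes Python 3; converting a float to Decimal is exact, so it shows the binary approximation digit for digit.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly.
print(Decimal(0.5))
print(Decimal(0.125))
print(Decimal(0.3))
print(Decimal(1.7))

# Compare the stored value with the decimal number that was typed.
exact_half = Decimal(0.5) == Decimal("0.5")
exact_eighth = Decimal(0.125) == Decimal("0.125")
exact_three_tenths = Decimal(0.3) == Decimal("0.3")
print(exact_half, exact_eighth, exact_three_tenths)
```

The values 0.5 and 0.125 are stored exactly, while 0.3 and 1.7 come out as long decimal expansions that are only close to what was typed.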
  1225.  
  1226.  
  1227. Get user's home directory path
  1228.  
It's handy for storing or retrieving configuration files for your programs.
  1230.  
  1231. import os.path
  1232. print os.path.expanduser('~')
  1233.  
Note that this also works under Windows. Nice !
(It points to the user's folder under "Documents and Settings", or even the network folder if the user has one.)
  1236.  
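For instance, a configuration file path can be built like this. It is a small sketch assuming Python 3; the file name .myprogram.ini is hypothetical.

```python
import os.path

APP_CONFIG_NAME = ".myprogram.ini"  # hypothetical file name, for illustration

home = os.path.expanduser("~")
# os.path.join() picks the right separator for the platform.
config_path = os.path.join(home, APP_CONFIG_NAME)
print(config_path)
```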
  1237.  
  1238.  
  1239. Python's virtual machine
  1240.  
  1241. Python - like Java or Microsoft .Net - has a virtual machine.
Python has a specific bytecode. It's a machine language like the Intel 80386 or Pentium machine language, but there is no physical microprocessor capable of executing it.
  1243. The bytecode runs in a program which simulates a microprocessor: a virtual machine.
  1244. This is the same for Java and .Net. Java's virtual machine is named JVM (Java Virtual Machine), and .Net's virtual machine is the CLR (Common Language Runtime)
  1245.  
  1246.  
  1247. Let's have an example: mymodule.py
  1248.  
def myfunction(a):
    print "I have", a
    b = a * 3
    if b<50:
        b = b + 77
    return b
  1255.  
  1256. This no-nonsense program takes a number, displays it, multiplies it by 3, adds 77 if the result is less than 50 and returns it. (Granted, this is weird.)
  1257.  
  1258. Let's try it:
  1259.  
  1260. C:\>python
  1261. Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
  1262. Type "help", "copyright", "credits" or "license" for more information.
  1263. >>> import mymodule
  1264. >>> print mymodule.myfunction(5)
  1265. I have 5
  1266. 92
  1267. >>>
  1268.  
  1269. Ok, easy.
  1270.  
  1271. See the mymodule.pyc file which appeared ? This is the "compiled" version of our module, the Python bytecode. This file contains instructions for the Python virtual machine.
  1272. The .pyc files are automatically generated by Python whenever a module is imported.
  1273. Python can directly run the .pyc files if you want. You could even run the .pyc without the .py.
  1274.  
  1275. If you delete the .pyc file, it will be recreated from the .py.
  1276. If you update the .py source, Python will detect this change and automatically update the corresponding .pyc.
  1277.  
  1278. Want to have a peek in the .pyc bytecode to see what it looks like ?
  1279. It's easy:
  1280.  
>>> import dis
>>> dis.dis(mymodule.myfunction)
  2           0 LOAD_CONST               1 ('I have')
              3 PRINT_ITEM
              4 LOAD_FAST                0 (a)
              7 PRINT_ITEM
              8 PRINT_NEWLINE

  3           9 LOAD_FAST                0 (a)
             12 LOAD_CONST               2 (3)
             15 BINARY_MULTIPLY
             16 STORE_FAST               1 (b)

  4          19 LOAD_FAST                1 (b)
             22 LOAD_CONST               3 (50)
             25 COMPARE_OP               0 (<)
             28 JUMP_IF_FALSE           14 (to 45)
             31 POP_TOP

  5          32 LOAD_FAST                1 (b)
             35 LOAD_CONST               4 (77)
             38 BINARY_ADD
             39 STORE_FAST               1 (b)
             42 JUMP_FORWARD             1 (to 46)
        >>   45 POP_TOP

  6     >>   46 LOAD_FAST                1 (b)
             49 RETURN_VALUE
>>>
  1310.  
  1311. You can see the virtual machine instructions (LOAD_CONST, PRINT_ITEM, COMPARE_OP...) and their operands (0 which is the reference of the variable a, 1 which is the reference of variable b...)
  1312.  
  1313. For example, line 3 of the source code is: b = a * 3
  1314. In Python bytecode, this translates to:
  1315.  
  3           9 LOAD_FAST                0 (a)    # Load variable a on the stack.
             12 LOAD_CONST               2 (3)    # Load the value 3 on the stack
             15 BINARY_MULTIPLY                   # Multiply them
             16 STORE_FAST               1 (b)    # Store result in variable b
  1320.  
  1321.  
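Do not be misled by the operand numbers: they index separate tables.
In LOAD_CONST 1, the 1 is the position of the string "I have" in the function's table of constants, while in STORE_FAST 1, the 1 is the position of b in its table of local variables. The string and the variable do not share an address.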
  1324.  
  1325. The list of instructions supported by the Python virtual machine is at http://www.python.org/doc/current/lib/bytecodes.html
  1326.  
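If you want to explore this on a current interpreter, here is a sketch assuming Python 3.4 or later. The print statement is left out of the function, and no instruction listing is shown because opcode names change from one Python version to the next.

```python
import dis

def myfunction(a):
    b = a * 3
    if b < 50:
        b = b + 77
    return b

# Local variables live in their own table, separate from the constants.
print(myfunction.__code__.co_varnames)

# Instruction names differ between Python versions, so just list them.
opnames = [instruction.opname for instruction in dis.get_instructions(myfunction)]
print(opnames)
```

dis.dis(myfunction) prints the same information as a formatted listing, like the one shown above.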
  1327.  
  1328.  
  1329. SQLite - databases made simple
  1330.  
  1331. SQLite is a tremendous database engine. I mean it.
  1332.  
  1333. It has some drawbacks:
  1334.  
  1335. * Not designed for concurrent access (database-wide lock on writing).
  1336. * Only works locally (no network service, although you can use things like sqlrelay).
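* Does not enforce foreign keys by default (enforcement was added in SQLite 3.6.19 and must be switched on with PRAGMA foreign_keys = ON).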
  1338. * No rights management (grant/revoke).
  1339.  
  1340. Advantages:
  1341.  
* Very fast (faster than MySQL on most operations).
  1343. * Respects almost the whole SQL-92 standard.
  1344. * Does not require installation of a service.
  1345. * No database administration to perform.
  1346. * Does not eat computer memory and CPU when not in use.
  1347. * SQLite databases are compact
  1348. * 1 database = 1 file (easy to move/deploy/backup/transfer/email).
  1349. * SQLite databases are portable across platforms (Windows, MacOS, Linux, PDA...)
  1350. * SQLite is ACID (data consistency is assured even on computer failure or crash)
  1351. * Supports transactions
  1352. * Fields can store Nulls, integers, reals (floats), text or blob (binary data).
* Can handle up to 2 terabytes of data (although going over 12 GB is not recommended).
* Can work as an in-memory database (blazing performance !)

SQLite is very fast, very compact and easy to use. It's a godsend for local data processing (websites, data crunching, etc.).
  1357. Oh... and it's not only free, it's also public domain (no GPL license headaches).
  1358. I love it.
  1359.  
The SQLite engine can be accessed from a wide variety of languages. (Thus SQLite databases are also a great way to exchange complex data sets between programs written in different languages, even with mixed numerical/text/binary data. No need to invent a special file format or a complex XML schema with base64-encoded data.)
  1361.  
SQLite is embedded in Python 2.5 (as the sqlite3 module).
For Python 2.4 and earlier, it must be installed separately: http://initd.org/tracker/pysqlite

Here are the basics:
  1366.  
  1367. #!/usr/bin/python
  1368. # -*- coding: iso-8859-1 -*-
  1369. from sqlite3 import dbapi2 as sqlite
  1370.  
  1371. # Create a database:
  1372. con = sqlite.connect('mydatabase.db3')
  1373. cur = con.cursor()
  1374.  
  1375. # Create a table:
  1376. cur.execute('create table clients (id INT PRIMARY KEY, name CHAR(60))')
  1377.  
  1378. # Insert a single line:
  1379. client = (5,"John Smith")
  1380. cur.execute("insert into clients (id, name) values (?, ?)", client )
  1381. con.commit()
  1382.  
  1383. # Insert several lines at once:
clients = [ (7,"Ella Fitzgerald"),
            (8,"Louis Armstrong"),
            (9,"Miles Davis")
          ]
  1388. cur.executemany("insert into clients (id, name) values (?, ?)", clients )
  1389. con.commit()
  1390.  
  1391. cur.close()
  1392. con.close()
  1393.  
  1394. Now let's use the database:
  1395. #!/usr/bin/python
  1396. # -*- coding: iso-8859-1 -*-
  1397. from sqlite3 import dbapi2 as sqlite
  1398.  
  1399. # Connect to an existing database
  1400. con = sqlite.connect('mydatabase.db3')
  1401. cur = con.cursor()
  1402.  
  1403. # Get row by row
  1404. print "Row by row:"
  1405. cur.execute('select id, name from clients order by name;')
  1406. row = cur.fetchone()
while row:
    print row
    row = cur.fetchone()
  1410.  
  1411. # Get all rows at once:
  1412. print "All rows at once:"
  1413. cur.execute('select id, name from clients order by name;')
  1414. print cur.fetchall()
  1415.  
  1416. cur.close()
  1417. con.close()
  1418.  
  1419. This outputs:
  1420.  
  1421. Row by row:
  1422. (7, u'Ella Fitzgerald')
  1423. (5, u'John Smith')
  1424. (8, u'Louis Armstrong')
  1425. (9, u'Miles Davis')
  1426. All rows at once:
  1427. [(7, u'Ella Fitzgerald'), (5, u'John Smith'), (8, u'Louis Armstrong'), (9, u'Miles Davis')]
  1428.  
  1429.  
  1430. Note that creating a database and connecting to an existing one is the same instruction (sqlite.connect()).
  1431.  
  1432. To manage your SQLite database, there is a nice freeware under Windows: SQLiteSpy (http://www.zeitungsjunge.de/delphi/sqlitespy/)
  1433.  
  1434. Hint 1: If you use sqlite.connect(':memory:'), this creates an in-memory database. As there is no disk access, this is a very very fast database.
  1435. (But make sure you have enough memory to handle your data.)
  1436.  
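Here is a compact sketch of the in-memory variant, assuming Python 3 and its bundled sqlite3 module. It reuses the table and rows from the examples above and iterates over the cursor instead of calling fetchone() in a loop.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # nothing is written to disk
cur = con.cursor()
cur.execute("create table clients (id INT PRIMARY KEY, name CHAR(60))")
clients = [(5, "John Smith"), (7, "Ella Fitzgerald"),
           (8, "Louis Armstrong"), (9, "Miles Davis")]
cur.executemany("insert into clients (id, name) values (?, ?)", clients)
con.commit()

# A cursor is iterable, so rows can be consumed one at a time.
cur.execute("select id, name from clients order by name")
rows = [row for row in cur]
for row in rows:
    print(row)

cur.close()
con.close()
```

The database disappears as soon as the connection is closed, which makes this handy for tests and temporary number crunching.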
  1437.  
Hint 2: To make your program compatible with both Python 2.5 and Python 2.4 + pysqlite, do the following:
sqlite = None
try:
    from sqlite3 import dbapi2 as sqlite # For Python 2.5
except ImportError:
    pass

if not sqlite:
    try:
        from pysqlite2 import dbapi2 as sqlite # For Python 2.4 and pysqlite
    except ImportError:
        pass

if not sqlite: # If module not imported successfully, raise an error.
    raise ImportError, "This module requires either: Python 2.5 or Python 2.4 with the pysqlite module (http://initd.org/tracker/pysqlite)"

# Then use it
con = sqlite.connect("mydatabase.db3")
...
This way, sqlite will be properly imported whether it's running under Python 2.5 or Python 2.4.
  1457.  
  1458.  
  1459. Links:
  1460.  
  1461. * pySQLite homepage: http://initd.org/tracker/pysqlite
* SQLite homepage (useful information on the database engine itself): http://www.sqlite.org/
  1463.  
  1464.  
  1465.  
  1466. Dive into Python
  1467.  
  1468. You're programming in Python ?
Then you should be reading Dive into Python.
  1470.  
  1471.  
  1472. The book is free.
  1473.  
  1474.  
  1475. Go read it.
  1476.  
  1477.  
  1478. No really.
  1479.  
  1480.  
  1481. Read it.
  1482.  
  1483.  
I can't imagine decent Python programming without reading this book.
  1485.  
  1486.  
  1487. At least download it...
  1488.  
  1489.  
  1490. ...now !
  1491.  
  1492.  
  1493. This is a must-read.
  1494. This book is available for free in different formats (HTML, PDF, Word 97...).
  1495. Plenty of information, good practices, ideas, gotchas and snippets about classes, datatypes, introspection, exceptions, HTML/XML processing, unit testing, webservices, refactoring, whatever.
  1496.  
  1497. You'll thank yourself one day for having read this book. Trust me.
  1498.  
  1499.  
  1500.  
  1501. Creating a mutex under Windows
  1502.  
  1503. I use a mutex in webGobbler so that the InnoSetup uninstaller knows webGobbler is still running (and that it shouldn't be uninstalled while the program is still running).
  1504. That's a handy feature of InnoSetup.
  1505.  
import sys

CTYPES_AVAILABLE = True
try:
    import ctypes
except ImportError:
    CTYPES_AVAILABLE = False

WEBGOBBLER_MUTEX = None
if CTYPES_AVAILABLE and sys.platform=="win32":
    try:
        WEBGOBBLER_MUTEX=ctypes.windll.kernel32.CreateMutexA(None,False,"sebsauvage_net_webGobbler_running")
    except:
        pass
  1518.  
  1519. I perform an except:pass, because if the mutex can't be created, it's not a big deal for my program (It's only an uninstaller issue).
  1520. Your mileage may vary.
  1521.  
  1522. This mutex will be automatically destroyed when the Python program exits.
  1523.  
  1524.  
  1525.  
  1526. urllib2 and proxies
  1527.  
  1528. With urllib2, you can use proxies.
  1529.  
import urllib2

# The proxy address and port:
proxy_info = { 'host' : 'proxy.myisp.com',
               'port' : 3128
             }
  1534.  
  1535. # We create a handler for the proxy
  1536. proxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})
  1537.  
  1538. # We create an opener which uses this handler:
  1539. opener = urllib2.build_opener(proxy_support)
  1540.  
  1541. # Then we install this opener as the default opener for urllib2:
  1542. urllib2.install_opener(opener)
  1543.  
  1544. # Now we can send our HTTP request:
  1545. htmlpage = urllib2.urlopen("http://sebsauvage.net/").read(200000)
  1546.  
  1547. What is nice about this trick is that this will set the proxy parameters for your whole program.
  1548.  
  1549.  
  1550. If your proxy requires authentication, you can do it too !
  1551.  
proxy_info = { 'host' : 'proxy.myisp.com',
               'port' : 3128,
               'user' : 'John Doe',
               'pass' : 'mysecret007'
             }
  1557. proxy_support = urllib2.ProxyHandler({"http" : "http://%(user)s:%(pass)s@%(host)s:%(port)d" % proxy_info})
  1558. opener = urllib2.build_opener(proxy_support)
  1559. urllib2.install_opener(opener)
  1560. htmlpage = urllib2.urlopen("http://sebsauvage.net/").read(200000)
(Code in this snippet was heavily inspired by http://groups.google.com/groups?selm=mailman.983901970.11969.python-list%40python.org )
  1562.  
  1563.  
  1564. Note that as of version 2.4.2 of Python, urllib2 only supports the following proxy authentication methods: Basic and Digest.
  1565. If your proxy uses NTLM (Windows/IE-specific), you're out of luck.
  1566.  
  1567.  
Besides this trick, there is a simpler way to set the proxy:
  1569.  
  1570. import os
  1571. os.environ['HTTP_PROXY'] = 'http://proxy.myisp.com:3128'
  1572.  
  1573. You can also do the same with os.environ['FTP_PROXY'].
  1574.  
  1575.  
  1576.  
  1577. A proper User-agent in your HTTP requests
  1578.  
  1579. If you have a Python program which sends HTTP requests, the netiquette says it should properly identify itself.
  1580.  
  1581. By default, Python uses a User-Agent such as: Python-urllib/1.16
  1582. You should change this.
  1583.  
  1584. Here's how to do it with urllib2:
  1585.  
  1586. request_headers = { 'User-Agent': 'PeekABoo/1.3.7' }
  1587. request = urllib2.Request('http://sebsauvage.net', None, request_headers)
  1588. urlfile = urllib2.urlopen(request)
  1589.  
  1590.  
  1591. As a rule of thumb:
  1592.  
  1593. * Make sure the program name you use in User-Agent is really unique (Search on Google !).
  1594. * Adopt the form: applicationName/version, such as webGobbler/1.2.4.
  1595. * If your program spiders websites, you should respect robot rules.
  1596. * Always use bound reads. (eg. .read(200000), not .read() alone).
  1597. * Choose the network timeout wisely. You can use the following code to set the timeout in your whole program:
  1598. socket.setdefaulttimeout(60) # A 60 seconds timeout.
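
The rules above (own User-Agent, bounded reads, global timeout) can be put together in a few lines. This is only a sketch in Python 3 spelling (urllib.request instead of urllib2); make_request, fetch and MAX_BYTES are names made up for the example:

```python
import socket
import urllib.request

USER_AGENT = "PeekABoo/1.3.7"   # applicationName/version
MAX_BYTES = 200000              # upper bound for every read

def make_request(url):
    """Build a request that identifies our program instead of Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def fetch(url):
    """Download at most MAX_BYTES from url (this one needs network access)."""
    with urllib.request.urlopen(make_request(url)) as response:
        return response.read(MAX_BYTES)

socket.setdefaulttimeout(60)    # A 60 seconds timeout for the whole program.
request = make_request("http://sebsauvage.net")
```

Note that Request normalizes header names, so the header is stored as "User-agent".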
  1599.  
  1600.  
  1601.  
  1602. Error handling with urllib2
  1603.  
  1604. You are using urllib/urllib2 and want to check for 404 and other HTTP errors ?
  1605. Here's the trick:
  1606.  
try:
    urlfile = urllib2.urlopen('http://sebsauvage.net/nonexistingpage.html')
except urllib2.HTTPError, exc:
    if exc.code == 404:
        print "Not found !"
    else:
        print "HTTP request failed with error %d (%s)" % (exc.code, exc.msg)
except urllib2.URLError, exc:
    print "Failed because:", exc.reason
  1616.  
  1617. This way, you can check for 404 and other HTTP error codes.
  1618. Note that urllib2 will not raise an exception on 2xx and 3xx codes. The exception urllib2.HTTPError will be raised with 4xx and 5xx codes (which is the expected behaviour).
  1619. (Note also that HTTP 30x redirections will be automatically and transparently handled by urllib2.)
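
The order of the except clauses matters: HTTPError is a subclass of URLError, so it has to be caught first. Here is a sketch of the same logic in Python 3 spelling (urllib.error / urllib.request). explain and try_fetch are hypothetical helper names:

```python
import urllib.error
import urllib.request

def explain(exc):
    """Turn a urllib exception into a short message."""
    try:
        raise exc
    except urllib.error.HTTPError as err:    # must come first (subclass of URLError)
        if err.code == 404:
            return "Not found !"
        return "HTTP request failed with error %d (%s)" % (err.code, err.msg)
    except urllib.error.URLError as err:     # DNS failure, refused connection, timeout...
        return "Failed because: %s" % (err.reason,)

def try_fetch(url):
    """Return up to 200000 bytes of the page, or None on failure (needs network)."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read(200000)
    except urllib.error.URLError as err:
        print(explain(err))
        return None
```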
  1620.  
  1621.  
  1622.  
  1623. urllib2: What am I getting ?
  1624.  
When you send an HTTP request, this may return html, images, videos, whatever.
  1626. In some cases you should check that the type of data you're receiving is what you expected.
  1627.  
  1628. To check the type of document you're receiving, look at the MIME type (Content-type) header:
  1629.  
  1630. urlfile = urllib2.urlopen('http://www.commentcamarche.net/')
  1631. print "Document type is", urlfile.info().getheader("Content-Type","")
  1632.  
  1633. This will output:
  1634.  
  1635. Document type is text/html
  1636.  
  1637. Warning: You may find other info after a semi-colon, such as:
  1638.  
  1639. Document type is text/html; charset=iso-8859-1
  1640.  
  1641. So what you should always do is:
  1642.  
  1643. print "Document type is", urlfile.info().getheader("Content-Type","").split(';')[0].strip()
  1644.  
  1645. to get only the "text/html" part.
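
In Python 3, the headers object of a response is an email.message.Message subclass, which already knows how to split the MIME type from the charset. A small sketch using a hand-built Message, so it runs without any network access (content_type_of is a hypothetical helper name):

```python
from email.message import Message

def content_type_of(headers):
    """Return (mime_type, charset); charset is None when the header has none."""
    return headers.get_content_type(), headers.get_content_charset()

# Stand-in for the headers of a real response.
headers = Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"

mime_type, charset = content_type_of(headers)

# The manual split shown above gives the same MIME type:
manual = headers.get("Content-Type", "").split(";")[0].strip()
```

Be aware that get_content_type() falls back to text/plain when the header is missing, so check for the header itself if that distinction matters to you.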
  1646.  
  1647.  
  1648.  
  1649. Note that .info() will also give you other HTTP response headers:
  1650.  
  1651. print "HTTP Response headers:"
  1652. print urlfile.info()
  1653.  
  1654. This would print things like:
  1655.  
HTTP Response headers:
Date: Thu, 23 Mar 2006 15:13:29 GMT
Content-Type: text/html; charset=iso-8859-1
Server: Apache
X-Powered-By: PHP/5.1.2-1.dotdeb.2
Connection: close
  1661.  
  1662.  
  1663.  
  1664. Reading (and writing) large XLS (Excel) files
  1665.  
  1666. In one of my projects, I had to read large XLS files.
  1667. Of course you can access all cells content through COM calls, but it's painfully slow.
  1668.  
  1669. There's a simple trick: Simply ask Excel to open the XLS file and save it in CSV, then use Python's CSV module to read the file !
  1670. This is the fastest way to read large XLS data files.
  1671.  
  1672. import os
  1673. import win32com.client
  1674.  
  1675. filename = 'myfile.xls'
  1676. filepath = os.path.abspath(filename) # Always make sure you use an absolute path !
  1677.  
  1678. # Start Excel and open the XLS file:
  1679. excel = win32com.client.Dispatch('Excel.Application')
  1680. excel.Visible = True
  1681. workbook = excel.Workbooks.Open(filepath)
  1682.  
  1683. # Save as CSV:
  1684. xlCSVWindows =0x17 # from enum XlFileFormat
  1685. workbook.SaveAs(Filename=filepath+".csv",FileFormat=xlCSVWindows)
  1686.  
  1687. # Close workbook and Excel
  1688. workbook.Close(SaveChanges=False)
  1689. excel.Quit()
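
The second half of the trick (reading the CSV back) is not shown above. A possible sketch with the csv module, in Python 3 spelling. The delimiter and the cp1252 encoding are assumptions: what Excel really writes depends on your Windows settings, so adjust both. The sample file is created here only so the example is self-contained:

```python
import csv
import os
import tempfile

def read_csv_rows(csv_path, delimiter=",", encoding="cp1252"):
    """Yield each row of the CSV file as a list of strings."""
    with open(csv_path, newline="", encoding=encoding) as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            yield row

# Stand-in for the file Excel would have written (filepath + ".csv").
sample_path = os.path.join(tempfile.mkdtemp(), "myfile.xls.csv")
with open(sample_path, "w", newline="", encoding="cp1252") as handle:
    handle.write("COUNTRY,NBCLIENTS\r\nFrance,523\r\nGermany,114\r\n")

rows = list(read_csv_rows(sample_path))
```

All values come back as strings: convert numbers yourself (int(), float()).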
  1690.  
  1691.  
  1692. Hint: You can use this trick the other way round (generate a CSV in Python, open with Excel) to import a large quantity of data into Excel. This is much faster than filling data cell by cell through COM calls.
  1693.  
Hint: When using excel.Workbooks.Open(), always make sure you use an absolute path with os.path.abspath().
  1695.  
  1696. Hint: You can also ask excel to save as HTML, then parse the HTML with htmllib, sgmllib or BeautifulSoup. You will be able to get more information, including formatting, colors, cells span, document author or even formulas !
  1697.  
  1698. Hint: For Excel VBA documentation, search *.chm in C:\Program Files\Microsoft Office\
  1699. Example: For Excel 2000, it's C:\Program Files\Microsoft Office\Office\1036\VBAXL9.CHM
  1700.  
  1701. Hint: If you want to find the corresponding VBA code for an action without hunting through the VBA Help file, just record a macro of the action and open it !
  1702. This will automatically generate the VBA code (which can be easily translated into Python).
  1703. I created an example video of this trick (in French, sorry): http://sebsauvage.net/temp/wink/excel_vbarecord.html
  1704.  
  1705. Hint: Sometimes, you'll need Excel constants. To get the list of constants:
  1706.  
  1707. 1. Run makepy.py (eg. C:\Python24\Lib\site-packages\win32com\client\makepy.py)
  1708. 2. In the list, choose "Microsoft Excel 9.0 Object Library (1.3)" (or similar) and click ok.
  1709. 3. Have a look in C:\Python24\Lib\site-packages\win32com\gen_py\ directory.
  1710. You will find the wrapper (such as 00020813-0000-0000-C000-000000000046x0x1x3.py)
  1711. 4. Open this file: it contains Excel constants and their values (You can copy/paste them in your code.)
  1712. For example:
  1713. xlCSVMSDOS =0x18 # from enum XlFileFormat
  1714. xlCSVWindows =0x17 # from enum XlFileFormat
  1715.  
  1716. Hint: If you want to import data into Excel, you can also generate an HTML document in Python and ask Excel to open it. You'll be able to set cell font colors, spanning, etc.
  1717. Sub-hint 1: Use a lot of \n in your generated HTML code (one after each </td>, preferably). Excel does not like loooooong lines.
  1718. Sub-hint 2: You can also use CSS styles to set formatting/colors in several cells. Simply include a <style> stylesheet in the generated HTML.
Sub-hint 3: Using CSS, you can even force the cell format (text, numeric, etc.). eg. <style><!--.mystyle{mso-number-format:"\@";}--></style> then use <td class=mystyle>25</td> to force the cell to text (useful, for example, to prevent Excel from trying to compute international phone numbers - you stupid app !)
Or mso-number-format:"0\.000"; to force a numeric format with 3 decimal places.
  1721.  
  1722.  
  1723.  
  1724. Saving the stack trace
  1725.  
  1726. Sometimes when you create an application, it's handy to have the stack trace dumped in a log file for debugging purposes.
  1727.  
  1728. Here's how to do it:
  1729.  
import traceback

def fifths(a):
    return 5/a

def myfunction(value):
    b = fifths(value) * 100

try:
    print myfunction(0)
except Exception, ex:
    logfile = open('mylog.log','a')
    traceback.print_exc(file=logfile)
    logfile.close()
    print "Oops ! Something went wrong. Please look in the log file."
  1745.  
  1746.  
  1747. After running this program, mylog.log contains:
  1748.  
Traceback (most recent call last):
  File "a.py", line 10, in ?
    print myfunction(0)
  File "a.py", line 7, in myfunction
    b = fifths(value) * 100
  File "a.py", line 4, in fifths
    return 5/a
ZeroDivisionError: integer division or modulo by zero
  1757.  
  1758.  
  1759. Hint: You can also simply use traceback.print_exc(file=sys.stdout) to print the stacktrace on screen.
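
For reference, here is a sketch of the same logging trick in Python 3 spelling, wrapped in a function so it can be reused. run_and_log is a hypothetical name, and the log goes to a temporary directory only to keep the example self-contained:

```python
import os
import tempfile
import traceback

def fifths(a):
    return 5 / a

def run_and_log(value, log_path):
    """Return fifths(value) * 100, or None after appending the stack trace to log_path."""
    try:
        return fifths(value) * 100
    except Exception:
        with open(log_path, "a") as logfile:
            traceback.print_exc(file=logfile)
        return None

log_path = os.path.join(tempfile.mkdtemp(), "mylog.log")
result = run_and_log(0, log_path)

with open(log_path) as logfile:
    logged = logfile.read()
```

If you would rather get the trace as a string (to send it by mail, for example), traceback.format_exc() returns the same text instead of writing it.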
  1760.  
  1761. Hint: Mixing this trick with this one can save your day. Detailed error messages = bugs more easily spotted.
  1762.  
  1763.  
  1764.  
  1765. Filtering out warnings
  1766.  
Sometimes, Python displays warnings.
While they are useful and should be taken care of, you sometimes want to disable them.
  1769.  
  1770. Here's how to filter them:
  1771.  
import warnings
warnings.filterwarnings(action = 'ignore',message=r'.*?no locals\(\) in functions bound by Psyco')
(I use this to filter one specific Psyco warning. Note the r prefix: the pattern contains backslashes.)
  1775.  
  1776. message is a regular expression.
  1777.  
  1778. Make sure you do not filter too much, so that important information is not thrown away.
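
To check that a filter only removes what you expect, you can record warnings instead of displaying them. A small sketch (Python 3 spelling). The second warning text is made up for the example:

```python
import warnings

def noisy():
    warnings.warn("no locals() in functions bound by Psyco")   # the one we want to hide
    warnings.warn("disk almost full")                           # hypothetical, must stay visible

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # record everything...
    # ...except what matches this regular expression (matched at the start of the message):
    warnings.filterwarnings(action="ignore",
                            message=r".*?no locals\(\) in functions bound by Psyco")
    noisy()

messages = [str(w.message) for w in caught]
```

Only the second warning ends up in messages. The filters are restored when the with block exits.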
  1779.  
  1780.  
  1781.  
  1782. Saving an image as progressive JPEG with PIL
  1783.  
PIL (Python Imaging Library) is a very good graphics library for image manipulation (This is the library I used in webGobbler).
  1785.  
  1786. Here's how to save an Image object in progressive JPEG.
  1787. This may seem obvious, but hey...
  1788.  
myimage.save('myimage.jpg', progressive=True, quality=60, optimize=True)
  1790.  
  1791. (Assuming that myimage is an Image PIL object.)
  1792.  
  1793.  
  1794.  
  1795. Charsets and encoding
  1796. ( There is a french translation of this article: http://sebsauvage.net/python/charsets_et_encoding.html )
  1797.  
  1798. If you think text = ASCII = 8 bits = 1 byte per character, you're wrong.
  1799. That's short-sighted.
  1800.  
There is something every developer should know about, or it will bite you one day:
  1802.  
  1803. Charsets and encoding
  1804.  
  1805.  
  1806. Ok. Let me put this:
  1807.  
  1808. You know the computer is a big stupid machine. It knows nothing about alphabets or even decimal numbers. A computer is a bit cruncher.
So when we have symbols such as the letter 'a' or the question mark '?', we have to create a binary representation of these symbols for the computer.
  1810. That's the only way to store them in the computer's memory.
  1811.  
  1812.  
  1813.  
  1814. The character set
  1815.  
  1816. First, we have to choose which number to use for each symbol. That's a simple table.
  1817.  
  1818. Symbol → number
  1819.  
  1820. The usual suspect is ASCII.
In ASCII, the letter 'a' is the number 97. The question mark '?' is the number 63.
  1822.  
  1823. But ASCII is far from a universal standard.
  1824.  
There are plenty of other character sets, such as EBCDIC, KOI8-R for Russian characters, ISO-8859-1 for Latin characters (accented characters, for example), Big5 for Traditional Chinese, Shift_JIS for Japanese, etc. Every country, culture, language has developed its own character set. This is a big mess, really.
  1826.  
  1827. An international effort tries to standardise all this: UNICODE.
  1828. Unicode is a huge table which tells which number to use for each symbol.
  1829. Some examples:
  1830.  
* Unicode table 0000 to 007F (0 to 127): Latin characters
* Unicode table 0080 to 00FF (128 to 255): Latin characters, including accented characters
* Unicode table 0900 to 097F (2304 to 2431): Devanagari
* Unicode table 1100 to 117F (4352 to 4479): Hangul Jamo
  1842.  
  1843.  
  1844. So the word "bébé" (baby in French) will translate to these numbers: 98 233 98 233 (or 0062 00E9 0062 00E9 in 16 bits hexadecimal).
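
You can check these numbers yourself: ord() gives the Unicode number of a character. A tiny sketch in Python 3 spelling (where every string is a Unicode string):

```python
word = "b\u00e9b\u00e9"                        # "bébé", written with escapes to stay ASCII-safe
code_points = [ord(ch) for ch in word]         # decimal Unicode numbers
as_hex = ["%04X" % n for n in code_points]     # the same numbers in 16 bits hexadecimal
```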
  1845.  
  1846.  
  1847.  
  1848. The encoding
  1849. Now we have all those numbers, we have to find a binary representation for them.
  1850.  
  1851. Number → Bits
  1852.  
ASCII uses the simple mapping: 1 ASCII code (0...127) = 1 byte. That works because ASCII only uses numbers from 0 to 127, which need 7 bits and therefore fit in an 8-bit byte.
  1854.  
  1855. But for Unicode and other charsets, that's a problem: 8 bits are not enough. These charsets require other encodings.
  1856. Most of them use a multi-byte encoding (a character is represented by several bytes).
  1857.  
  1858.  
For Unicode, there are several encodings. The first one is the raw 16 bits Unicode (UCS-2): 16 bits (2 bytes) per character.
But as most texts only use the lower part of the Unicode table (codes 0 to 127), that's a huge waste of space.
  1861.  
  1862. That's why UTF-8 was invented.
  1863.  
  1864. That's brilliant: For codes 0 to 127, simply use 1 byte per character. Just like ASCII.
  1865. If you need special, less common characters (128 to 2047), use two bytes.
  1866. If you need more specific characters (2048 to 65535), use three bytes.
  1867. etc.
Unicode value (in hexadecimal)    Bits to output
00000000 to 0000007F              0xxxxxxx
00000080 to 000007FF              110xxxxx 10xxxxxx
00000800 to 0000FFFF              1110xxxx 10xxxxxx 10xxxxxx
00010000 to 001FFFFF              11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
00200000 to 03FFFFFF              111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
04000000 to 7FFFFFFF              1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  1876. (table from http://www.cl.cam.ac.uk/~mgk25/unicode.html)
  1877.  
Thus for most Latin texts, this will be as space-efficient as ASCII, but you have the ability to use any special Unicode character if you want.
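
To make the table concrete, here is a hand-written encoder that follows its first four rows, compared against Python's own UTF-8 codec. This is a teaching sketch in Python 3 spelling (utf8_encode is a made-up name), not something to use instead of .encode('utf-8'). It stops at 4 bytes because today's UTF-8 is limited to code points up to 10FFFF, so the 5- and 6-byte rows are no longer used:

```python
def utf8_encode(code_point):
    """Encode one Unicode number into UTF-8 bytes, following the table row by row."""
    if code_point < 0x80:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x110000:                   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("not a Unicode code point: %r" % (code_point,))

e_acute = utf8_encode(0xE9)     # the letter é (233)
```

The x bits are simply the bits of the Unicode number, filled in from right to left, 6 at a time.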
  1879.  
  1880. How's that ?
  1881.  
  1882.  
  1883.  
  1884.  
  1885.  
  1886. Let's sum up all this
  1887.  
Symbol → (charset) → Number → (encoding) → Bits
  1892.  
  1893. The charset will tell you which number to use for each symbol,
  1894. the encoding will tell you how to encode these numbers into bits.
  1895.  
  1896.  
  1897. One simple example is:
  1898.  
é → 233 (in Unicode) → C3 A9 (in UTF-8)
  1904.  
  1905. For example the word "bébé" (baby in French):
  1906.  
bébé → 98 233 98 233 (in Unicode) → 62 C3 A9 62 C3 A9 (in UTF-8)
  1912.  
  1913.  
  1914. If I receive the bits 62 C3 A9 62 C3 A9 without the knowledge of the encoding and the charset, this will be useless to me.
  1915.  
Clueless programmers will display these bytes as is: bÃ©bÃ©
  1917. then will ask "Why am I getting those strange characters ?".
  1918.  
  1919. You're not clueless, because you've just read this article.
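
Here is the whole accident reproduced in a few lines (Python 3 spelling): the same bytes, read once with the wrong encoding and once with the right one.

```python
raw = "b\u00e9b\u00e9".encode("utf-8")            # "bébé" as UTF-8 bytes
hex_dump = " ".join("%02X" % b for b in raw)      # the bytes as hexadecimal pairs

wrong = raw.decode("iso-8859-1")   # guessing the wrong encoding: no error, just garbage
right = raw.decode("utf-8")        # using the encoding the sender really used
```

The nasty part is that the wrong decoding does not fail: every byte is a valid ISO-8859-1 character, so you silently get "bÃ©bÃ©".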
  1920.  
  1921.  
  1922. Transmitting a text alone is useless.
  1923. If you transmit a text, you must always also tell which charset/encoding was used.
  1924.  
  1925.  
  1926. That's also why many webpages are broken: They do not tell their charset/encoding.
  1927. Do you know that in this case all browsers try to guess the charset ?
  1928. That's bad.
  1929. Every webpage should have its encoding specified in HTTP headers or in the HTML header itself, such as:
  1930. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  1931.  
  1932. This is the same for emails: Any good email client will indicate which charset/encoding the text is encoded in.
  1933.  
  1934. Hint: Some encodings are specific to some charsets. For example, UTF-8 is only used for Unicode. So if I receive UTF-8 encoded data, I know its charset is Unicode.
  1935.  
  1936.  
  1937. Python and Unicode
Python directly supports Unicode and UTF-8.
  1939. Use them as much as possible.
  1940. Your programs will smoothly support international characters.
  1941.  
  1942.  
  1943. First, you should always indicate which charset/encoding your Python source uses, such as:
  1944.  
  1945. #!/usr/bin/python
  1946. # -*- coding: iso-8859-1 -*-
  1947.  
  1948. Next, use Unicode strings in your programs (use the 'u' prefix):
  1949.  
  1950. badString = "Bad string !"
  1951. bestString = u"Good unicode string."
  1952. anotherGoodString = u"Ma vie, mon \u0153uvre."
  1953.  
( \u0153 is the Unicode character "œ" (0153 is its code in hexadecimal). The "œ" character is in the Latin Extended-A section of the charts: http://www.unicode.org/charts/ )
  1955.  
  1956. To convert a standard string to Unicode, do:
  1957.  
myUnicodeString = unicode(mystring)   # only works if mystring is pure ASCII
or
myUnicodeString = mystring.decode('iso-8859-1')   # says explicitly which charset mystring uses
  1961.  
  1962. To convert a Unicode string to a specific charset:
  1963.  
  1964. myString = myUnicodeString.encode('iso-8859-1')
  1965.  
The list of charsets/encodings supported by Python is available at http://docs.python.org/lib/standard-encodings.html
  1967.  
  1968.  
  1969.  
Don't forget that when you print, you use the charset of the console (stdout). So sometimes printing a Unicode string can fail, because the string may contain Unicode characters which are not available in the charset of your operating system console.
  1971.  
  1972. Let me put it again: A simple print instruction can fail.
  1973.  
  1974.  
  1975. Example, with the french word "œuvre":
  1976.  
  1977. >>> a = u'\u0153uvre'
  1978. >>> print a
  1979. Traceback (most recent call last):
  1980. File "<stdin>", line 1, in ?
  1981. File "c:\python24\lib\encodings\cp437.py", line 18, in encode
  1982. return codecs.charmap_encode(input,errors,encoding_map)
  1983. UnicodeEncodeError: 'charmap' codec can't encode character u'\u0153' in position 0: character maps to <undefined>
  1984.  
Python is telling you that the Unicode character U+0153 (œ) has no equivalent in the charset your operating system console uses.
  1986.  
  1987. To see which charset your console supports, you can do:
  1988.  
  1989. >>> import sys
  1990. >>> print sys.stdout.encoding
  1991. cp437
  1992.  
  1993. So to make sure you print without error, you can do:
  1994.  
  1995. >>> import sys
  1996. >>> a = u'\u0153uvre'
  1997. >>> print a.encode(sys.stdout.encoding,'replace')
  1998. ?uvre
  1999. >>>
  2000.  
  2001. Unicode characters which cannot be displayed by the console will be converted to '?'.
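
If you print a lot, it may be worth wrapping this in a small helper. A sketch in Python 3 spelling. console_safe is a hypothetical name, and falling back to ASCII when the console encoding is unknown is my own assumption:

```python
def console_safe(text, encoding):
    """Return text with every character the console cannot display replaced by '?'."""
    encoding = encoding or "ascii"      # the encoding can be None when output is redirected
    return text.encode(encoding, "replace").decode(encoding)

safe = console_safe("\u0153uvre", "cp437")    # œ does not exist in cp437
kept = console_safe("b\u00e9b\u00e9", "cp437")  # é does exist in cp437

# 'backslashreplace' keeps the information instead of dropping it:
escaped = "\u0153uvre".encode("cp437", "backslashreplace").decode("cp437")
```

In a real program you would pass sys.stdout.encoding as the second argument.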
  2002.  
  2003.  
  2004.  
Special note: When dealing with external sources (files, databases, stdin/stdout/stderr, API such as Windows COM or registry, etc.) be careful: Some of these will not communicate in Unicode, but in some special charset. You should properly convert to and from Unicode accordingly.
  2006.  
  2007. For example, to write Unicode strings to an UTF-8 encoded file, you can do:
  2008.  
  2009. >>> a = u'\u0153uvre'
  2010. >>> file = open('myfile.txt','w')
  2011. >>> file.write( a.encode('utf-8') )
  2012. >>> file.close()
  2013.  
  2014. Reading the same file:
  2015.  
>>> file = open('myfile.txt','r')
>>> print file.read()
┼ôuvre
>>>

Oops... you see there's a problem here. We opened the file but we didn't specify the encoding when reading. That's why we get this "┼ô" garbage (the two UTF-8 bytes C5 93 of "œ", displayed as cp437 characters).
  2022. Let's decode the UTF-8:
  2023.  
  2024. >>> file=open('myfile.txt','r')
  2025. >>> print repr( file.read().decode('utf-8') )
  2026. u'\u0153uvre'
  2027. >>>
  2028.  
  2029. There, we got it right. That's our "œuvre" word.
  2030. Remember our console does not support the \u0153 character ? (That's why we used repr().)
  2031.  
  2032. So let's encode the string in a charset supported by our console:
  2033.  
  2034. >>> import sys
  2035. >>> file=open('myfile.txt','r')
  2036. >>> print file.read().decode('utf-8').encode(sys.stdout.encoding,'replace')
  2037. ?uvre
  2038. >>>
  2039.  
  2040. Yes, this looks cumbersome.
  2041. But don't forget we are translating between 3 modes: UTF-8 (the input file), Unicode (the Python object) and cp437 (the output console charset).
  2042.  
  2043.  
UTF-8 (the input file) → .decode('utf-8') → Unicode (the Python unicode string) → .encode('cp437') → cp437 (the console)
  2047.  
  2048.  
That's why we have to explicitly convert between encodings.
  2050. Explicit is better than implicit.
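
You can also let the file object do the decode/encode step for you by giving the encoding when opening the file. A sketch using io.open, which exists from Python 2.6 onward and is the plain open() of Python 3. The temporary directory is only there to keep the example self-contained:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# Writing: we hand over Unicode text, the file object encodes it to UTF-8.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"\u0153uvre")

# Reading: the file object decodes UTF-8 back to Unicode text.
with io.open(path, "r", encoding="utf-8") as handle:
    text = handle.read()

# Binary mode shows what is really stored on disk.
with io.open(path, "rb") as handle:
    raw = handle.read()
```

It is still explicit: the encoding is named once, where the file is opened.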
  2051.  
  2052.  
  2053.  
  2054. Links:
  2055.  
  2056. * http://www.joelonsoftware.com/articles/Unicode.html
  2057. * http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
  2058. * http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
  2059. * http://www.cl.cam.ac.uk/~mgk25/unicode.html
  2060. * http://www.unicode.org/
  2061.  
  2062.  
  2063.  
  2064. Iterating
  2065. A shorter syntax
  2066. When you come from other languages, you are tempted to use these other languages' constructs.
  2067. For example, when iterating over the elements of a table, you would probably iterate using an index:
  2068.  
countries = ['France','Germany','Belgium','Spain']
for i in range(0,len(countries)):
    print countries[i]
or
countries = ['France','Germany','Belgium','Spain']
i = 0
while i<len(countries):
    print countries[i]
    i = i+1
  2078.  
  2079. It's better to use iterators:
  2080.  
countries = ['France','Germany','Belgium','Spain']
for country in countries:
    print country
  2084.  
  2085. It does the same thing, but:
  2086.  
  2087. * You've spared a variable (i).
  2088. * The code is more compact.
  2089. * It's more readable.
  2090.  
  2091.  
  2092. "for country in countries" is almost plain English.
  2093.  
The same is true for other things, like reading lines from a text file. So instead of doing:

file = open('file.txt','r')
for line in file.readlines():
    print line
file.close()
  2100.  
  2101. Simply do:
  2102.  
file = open('file.txt','r')
for line in file:
    print line
file.close()

These kinds of constructs help to keep code shorter and more readable.
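
And when you really do need the index (to number the lines of a report, for example), enumerate() gives it to you without going back to countries[i]. A small sketch (the start argument needs Python 2.6 or later):

```python
countries = ['France','Germany','Belgium','Spain']

numbered = []
for position, country in enumerate(countries, 1):   # start counting at 1
    numbered.append("%d. %s" % (position, country))
```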
  2109.  
  2110.  
  2111. Iterating with multiple items
  2112. It's also easy to iterate over multiple items at once.
  2113.  
data = [ ('France',523,'Jean Dupont'),
         ('Germany',114,'Wolf Spietzer'),
         ('Belgium',227,'Serge Ressant')
       ]

for (country,nbclients,manager) in data:
    print manager,'manages',nbclients,'clients in',country
  2121.  
  2122.  
This also applies to dictionaries (hashtables). For example, you could iterate over a dictionary like this:

data = { 'France':523, 'Germany':114, 'Belgium':227 }
for country in data: # This is the same as for country in data.keys()
    print 'We have',data[country],'clients in',country
  2128.  
  2129. But it's better to do it this way:
  2130.  
data = { 'France':523, 'Germany':114, 'Belgium':227 }
for (country,nbclients) in data.items():
    print 'We have',nbclients,'clients in',country

because you spare a hash lookup (data[country]) for each entry.
  2136.  
  2137.  
  2138. Creating iterators
  2139. It's easy to create your own iterators.
  2140.  
  2141. For example, let's say we have a clients file:
  2142.  
COUNTRY      NBCLIENTS
France       523
Germany      114
Spain        127
Belgium      227
  2148.  
  2149. and we want a class capable of reading this file format. It must return the country and the number of clients.
  2150. We create a clientFileReader class:
  2151.  
class clientFileReader:

    def __init__(self,filename):
        self.file=open(filename,'r')
        self.file.readline() # We discard the first line.

    def close(self):
        self.file.close()

    def __iter__(self):
        return self

    def next(self):
        line = self.file.readline()
        if not line:
            raise StopIteration()
        # The first 13 columns hold the country (padded with spaces), the rest is the number of clients.
        return ( line[:13].strip(), int(line[13:]) )
  2169.  
  2170. To create an iterator:
  2171.  
* Create a __iter__() method which returns the iterator (which happens to be ourselves !)
  2173. * The iterator must have a next() method which returns the next item.
  2174. * The next() method must raise the StopIteration() exception when no more data is available.
  2175.  
  2176. It's as simple as this !
  2177.  
  2178. Then we can simply use our file reader as:
  2179.  
clientFile = clientFileReader('file.txt')

for (country,nbclients) in clientFile:
    print 'We have',nbclients,'clients in',country

clientFile.close()
  2186.  
  2187. See ?
  2188.  
  2189. "for (country,nbclients) in clientFile:" is a higher level construct which makes the code much more readable and hides the complexity of the underlying file format.
  2190. This is much better than chopping file lines in the main loop.
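
For simple cases like this one, a generator function is a shorter way to get the same thing: Python builds the iterator (__iter__, next and StopIteration) for you. A sketch in Python 3 spelling (where the next() method is named __next__()). read_clients is a made-up name, and it takes any iterable of lines (an open file works) rather than a filename:

```python
def read_clients(lines):
    """Yield (country, nbclients) for each data line of a fixed-width clients file."""
    lines = iter(lines)
    next(lines, None)                     # We discard the first line (column titles).
    for line in lines:
        if line.strip():                  # ignore empty lines
            yield line[:13].strip(), int(line[13:])

# Three lines in the same format as the file above (country padded to 13 columns).
sample = ["%-13s%s\n" % row for row in [("COUNTRY", "NBCLIENTS"),
                                        ("France", 523),
                                        ("Germany", 114)]]
clients = list(read_clients(sample))
```

With a real file, you would write: for (country,nbclients) in read_clients(open('file.txt')).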
  2191.  
  2192.  
  2193.  
  2194. Parsing the command-line
  2195.  
  2196. It's not recommended to try to parse the command-line (sys.argv) yourself. Parsing the command-line is not as trivial as it seems to be.
Python has two good modules dedicated to command-line parsing: getopt and optparse.
  2198. They do their job very well (They take care of mundane tasks such as parameters quoting, for example).
  2199.  
  2200. optparse is the new, more Pythonic and OO module. Yet I often prefer getopt. We'll see both.
  2201.  
  2202.  
Ok, let's create a program which is supposed to reverse all lines in a text file.
  2204. Our program has:
  2205.  
  2206. * a mandatory argument: file, the file to process.
* an optional parameter with a value: -o to specify an output file (such as -o myoutputfile.txt)
* an optional parameter without value: -c to capitalize all letters.
* an optional parameter: -h to display program help.
  2210.  
  2211.  
  2212. getopt
  2213. Let's do it with getopt first:
  2214.  
import sys
import getopt

if __name__ == "__main__":

    opts, args = None, None
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hco:",["help", "capitalize","output="])
    except getopt.GetoptError, e:
        # Raising a plain string is deprecated: report the problem and exit instead.
        print 'Unknown argument "%s" in command-line.' % e.opt
        sys.exit(2)

    for option, value in opts:
        if option in ('-h','--help'):
            print 'You asked for the program help.'
            sys.exit(0)
        if option in ('-c','--capitalize'):
            print "You used the --capitalize option !"
        elif option in ('-o','--output'):
            print "You used the --output option with value",value

    # Make sure we have our mandatory argument (file)
    if len(args) != 1:
        print 'You must specify one file to process. Use -h for help.'
        sys.exit(1)

    print "The file to process is",args[0]

    # The rest of the code goes here...
  2243.  
  2244. Details:
  2245.  
* The getopt.getopt() will parse the command-line:
    o sys.argv[1:] skips the program name itself (which is sys.argv[0])
    o "hco:" gives the list of possible options (-h, -c and -o). The colon (:) tells that -o requires a value.
    o ["help", "capitalize","output="] allows the user to use the long options version (--help/--capitalize/--output).
      Users can even mix short and long options in the command-line, such as: reverse --capitalize -o output.txt myfile.txt
* The for loop will check all options.
    o It's typically in this loop that you will modify your program options according to command-line options.
    o The --help will display the help page and exit (sys.exit(0)).
* The if len(args)!=1 is used to make sure our mandatory argument (file) is provided. You can choose to allow (or not) several arguments.
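
getopt.getopt() accepts any list, not only sys.argv[1:], which makes it easy to try out. A sketch in Python 3 spelling. parse and unknown_option are hypothetical helper names, and storing the result in a dictionary is just one way to do it:

```python
import getopt

def parse(argv):
    """Return (settings, args) for a command line given as a list of strings."""
    opts, args = getopt.getopt(argv, "hco:", ["help", "capitalize", "output="])
    settings = {"help": False, "capitalize": False, "output": None}
    for option, value in opts:
        if option in ("-h", "--help"):
            settings["help"] = True
        elif option in ("-c", "--capitalize"):
            settings["capitalize"] = True
        elif option in ("-o", "--output"):
            settings["output"] = value
    return settings, args

def unknown_option(argv):
    """Return the offending option name, or None if the command line is fine."""
    try:
        parse(argv)
    except getopt.GetoptError as e:
        return e.opt
    return None

settings, args = parse(["--capitalize", "-o", "output.txt", "myfile.txt"])
```

Note that getopt.getopt() stops looking for options at the first non-option argument, so options must come before the file name (getopt.gnu_getopt() allows mixing them).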
  2257.  
  2258.  
Let's use our program from the command line:
  2260.  
  2261. C:\>python reverse.py -c -o output.txt myfile.txt
  2262. You used the --capitalize option !
  2263. You used the --output option with value output.txt
  2264. The file to process is myfile.txt
  2265.  
  2266. You can also call for help:
  2267.  
  2268. C:\>python reverse.py -h
  2269. You asked for the program help.
(Of course, you would have to display really useful program information here.)
  2271.  
  2272.  
  2273. optparse
  2274. Let's do the same with optparse:
  2275.  
import sys
import optparse

if __name__ == "__main__":

    parser = optparse.OptionParser()
    parser.add_option("-c","--capitalize",action="store_true",dest="capitalize")
    parser.add_option("-o","--output",action="store",type="string",dest="outputFilename")

    (options, args) = parser.parse_args()

    if options.capitalize:
        print "You used the --capitalize option !"

    if options.outputFilename:
        print "You used the --output option with value",options.outputFilename

    # Make sure we have our mandatory argument (file)
    if len(args) != 1:
        print 'You must specify one file to process. Use -h for help.'
        sys.exit(1)

    print "The file to process is",args[0]

    # The rest of the code goes here...
  2301.  
  2302. Not much different, but:
  2303.  
* You first create a parser (optparse.OptionParser()), add options to this parser (parser.add_option(...)) then ask it to parse the command-line (parser.parse_args()).
    o Option -c does not take a value. We merely record the presence of -c with action="store_true".
      dest="capitalize" will store this option in the capitalize attribute of the options object returned by parse_args().
    o For -o, we specify a string to store in the outputFilename attribute of the same options object.
* We later simply access our options through options.capitalize and options.outputFilename. No loop.
* args still gives us our file argument.
  2311.  
  2312.  
  2313. Let's try it:
  2314.  
  2315. C:\>python reverse2.py -c -o output.txt myfile.txt
  2316. You used the --capitalize option !
  2317. You used the --output option with value output.txt
  2318. The file to process is myfile.txt
  2319.  
  2320. It works. Let's ask for help:
  2321.  
  2322. C:\>python reverse2.py -h
usage: reverse2.py [options]

options:
  -h, --help            show this help message and exit
  -c, --capitalize
  -o OUTPUTFILENAME, --output=OUTPUTFILENAME
  2329.  
  2330. But did you notice ?
  2331. We didn't code the --help option !
  2332. Yet it works !
  2333.  
  2334. It's because optparse generates help for you.
  2335. You can even add help information in options with the help parameter, such as:
  2336.  
  2337. parser.add_option("-c","--capitalize",action="store_true",dest="capitalize",help="Capitalize all letters")
  2338. parser.add_option("-o","--output",action="store",type="string",dest="outputFilename",help="Write output to a file")
  2339.  
  2340. Which will give:
  2341.  
  2342. C:\>python reverse2.py -h
usage: reverse2.py [options]

options:
  -h, --help            show this help message and exit
  -c, --capitalize      Capitalize all letters
  -o OUTPUTFILENAME, --output=OUTPUTFILENAME
                        Write output to a file
  2350.  
  2351. Help is automatically generated.
  2352.  
  2353. You see that optparse is quite flexible. You can even extend it with custom actions, customize help pages, etc.
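
The option handling described above can also be tried without a command line, by giving parse_args() an explicit list of arguments. This is only a minimal sketch, written for Python 3 (where optparse is still available): build_parser is a hypothetical helper name, and default=False is an addition so that options.capitalize is False rather than None when -c is absent.

```python
import optparse


def build_parser():
    """Build a parser with the same two options as reverse2.py."""
    parser = optparse.OptionParser()
    parser.add_option("-c", "--capitalize", action="store_true",
                      dest="capitalize", default=False,
                      help="Capitalize all letters")
    parser.add_option("-o", "--output", action="store", type="string",
                      dest="outputFilename", help="Write output to a file")
    return parser


# Parse an explicit argument list instead of sys.argv[1:].
options, args = build_parser().parse_args(
    ["-c", "-o", "output.txt", "myfile.txt"])
print(options.capitalize)      # True
print(options.outputFilename)  # output.txt
print(args)                    # ['myfile.txt']
```

Passing the list explicitly is handy for testing the parser from another module or from a unit test.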
  2354.  
  2355.  
  2356. Using AutoIt from Python
  2357.  
AutoIt is a fabulous free scripting language for automating Windows: you can click buttons, send keystrokes, wait for windows, etc.
Although you could do the same in Python using the raw Win32 API, it's a pain. It's much easier to use the AutoIt COM interface.
  2360.  
  2361. Example: Launch Notepad and send some text.
  2362.  
  2363. import win32com.client
  2364.  
  2365. autoit = win32com.client.Dispatch("AutoItX3.Control")
  2366. autoit.Run("notepad.exe")
  2367. autoit.AutoItSetOption("WinTitleMatchMode", 4)
  2368. autoit.WinWait("classname=Notepad")
  2369. autoit.send("Hello, world.")
(Note that I matched the window by its class ("classname=Notepad") and not by its title, because the title is not the same in the different language versions of Windows (English, French, German, etc.))
  2371.  
  2372. Of course, this is just COM calls. Nothing special. But AutoIt is handy.
The AutoIt COM documentation is in C:\Program Files\AutoIt3\AutoItX\AutoItX.chm

The COM control is C:\Program Files\AutoIt3\AutoItX\AutoItX3.dll
Don't forget that this COM control must be registered prior to use (with the command line: regsvr32 AutoItX3.dll).
  2377.  
  2378. I use the following code to automatically register the COM control if it's not available:
  2379.  
import os, sys

# Import the Win32 COM client
try:
    import win32com.client
except ImportError:
    raise ImportError, 'This program requires the pywin32 extensions for Python. See http://starship.python.net/crew/mhammond/win32/'

import pywintypes # to handle COM errors.

# Import AutoIT (first try)
autoit = None
try:
    autoit = win32com.client.Dispatch("AutoItX3.Control")
except pywintypes.com_error:
    # If we can't instantiate it, try to register the COM control again:
    os.system("regsvr32 /s AutoItX3.dll")

# Import AutoIT (second try if necessary)
if not autoit:
    try:
        autoit = win32com.client.Dispatch("AutoItX3.Control")
    except pywintypes.com_error, e:
        raise ImportError, "Could not instantiate AutoIT COM module because %s" % (e,)

if not autoit:
    print "Could not instantiate AutoIT COM module."
    sys.exit(1)
  2408.  
  2409. # Now we have AutoIT, let's start Notepad and write some text:
  2410. autoit.Run("notepad.exe")
  2411. autoit.AutoItSetOption("WinTitleMatchMode", 4)
  2412. autoit.WinWait("classname=Notepad")
  2413. autoit.send("Hello, world.")
  2414.  
  2415.  
  2416. What's in a main
  2417.  
  2418. If you've spent some time with Python, you must have encountered this strange Python idiom:
  2419.  
  2420. if __name__ == "__main__":
  2421.  
  2422. What's that ?
  2423.  
  2424.  
  2425. A Python program can be used in (at least) two ways:
  2426.  
  2427. * executed directly: python mymodule.py
  2428. * imported: import mymodule
  2429.  
  2430.  
  2431. What is under the if __name__=="__main__" will only be run if the module is run directly.
  2432. If you import the module, the code will not be run.
  2433.  
  2434. This has many uses. For example:
  2435.  
  2436. * Parse the command-line in the main and call the methods/functions, so that the module can be used from the command line.
  2437. * Run the unit tests (unittest) in the main, so that the module performs a self-test when run.
  2438. * Run example code in the main (for example, for a tkinter widget).
  2439.  
  2440.  
  2441. Example: Parsing the command-line
Let's write a module which extracts all links from an HTML page, and add a main to this module:
  2443.  
import re

class linkextractor:
    def __init__(self,htmlPage):
        self.htmlcode = htmlPage
    def getLinks(self):
        linksList = re.findall('<a href=(.*?)>.*?</a>',self.htmlcode)
        links = []
        for link in linksList:
            if link.startswith('"'): link=link[1:] # Remove quotes
            if link.endswith('"'): link=link[:-1]
            links.append(link)
        return links

if __name__ == "__main__":
    import sys,getopt
    opts, args = getopt.getopt(sys.argv[1:],"")
    if len(args) != 1:
        print "You must specify a file to process."
        sys.exit(1)
    print "Linkextractor is processing %s..." % args[0]
    file = open(args[0],"rb")
    htmlpage = file.read(500000)
    file.close()
    le = linkextractor(htmlpage)
    print le.getLinks()
  2470.  
  2471. * The class linkextractor contains our program logic.
  2472. * The main only parses the command-line, reads the specified file and uses our linkextractor class to process it.
  2473.  
  2474.  
  2475. We can use our class by running it from the command line:
  2476.  
  2477. C:\>python linkextractor.py myPage.html
  2478. Linkextractor is processing myPage.html...
  2479. [...]
  2480.  
  2481. or from another Python program by importing it:
  2482.  
  2483. import linkextractor, urllib
  2484.  
  2485. htmlSource = urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
  2486. le = linkextractor.linkextractor(htmlSource)
  2487. print le.getLinks()
  2488.  
  2489. In this case, the main will not run.
  2490.  
  2491.  
  2492. Being able to use our class directly from the command-line is very handy.
  2493.  
  2494.  
  2495.  
  2496. Example: Running self-tests
  2497. You can also write a self-test for this unit:
  2498.  
import re, unittest

class linkextractor:
    def __init__(self,htmlPage):
        self.htmlcode = htmlPage
    def getLinks(self):
        linksList = re.findall('<a href=(.*?)>.*?</a>',self.htmlcode)
        links = []
        for link in linksList:
            if link.startswith('"'): link=link[1:] # Remove quotes
            if link.endswith('"'): link=link[:-1]
            links.append(link)
        return links

class _TestExtraction(unittest.TestCase):
    def testLinksWithQuotes(self):
        htmlcode = """<html><body>
Welcome to <a href="http://sebsauvage.net/">sebsauvage.net/</a><br>
How about some <a href="http://python.org">Python</a> ?</body></html>"""
        le = linkextractor(htmlcode)
        links = le.getLinks()
        self.assertEqual(links[0], 'http://sebsauvage.net/',
                         'First link is %s. It should be http://sebsauvage.net/ without quotes.' % links[0])
        self.assertEqual(links[1], 'http://python.org',
                         'Second link is %s. It should be http://python.org without quotes.' % links[1])

if __name__ == "__main__":
    print "Performing self-tests..."
    unittest.main()
  2528.  
  2529. You can simply self-test our module by running it:
  2530.  
  2531. C:\>python linkextractor.py
  2532. Performing self-tests...
  2533. .
  2534. ----------------------------------------------------------------------
  2535. Ran 1 test in 0.000s
  2536.  
  2537. OK
  2538.  
  2539. C:\>
  2540.  
This is very useful to auto-test (or at least sanity-check) all your programs/modules/classes/libraries automatically.

(Note that our unit test above is quite lame: It should do a lot more things. To learn more about unittest, I highly recommend reading Dive into Python.)
  2544.  
  2545.  
  2546. Mixing both
  2547. You can even mix self-tests and command-line parsing in the main:
  2548.  
* If nothing is provided on the command line (or a special --selftest option is given), perform the self-test.
* Otherwise, perform what the user asked for on the command line.
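
The two rules above could be sketched as follows. This is only an illustration written for Python 3: reverse_text, _TestReverse and the --selftest flag handling are hypothetical stand-ins for your own program logic, not part of the linkextractor module.

```python
import sys
import unittest


def reverse_text(text):
    """Hypothetical program logic: return the text reversed."""
    return text[::-1]


class _TestReverse(unittest.TestCase):
    def test_reverse(self):
        self.assertEqual(reverse_text("abc"), "cba")


def main(argv):
    """Self-test if no argument (or --selftest) is given, else process argv[0]."""
    if not argv or "--selftest" in argv:
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(_TestReverse)
        result = unittest.TextTestRunner().run(suite)
        return 0 if result.wasSuccessful() else 1
    print(reverse_text(argv[0]))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the decision inside a main(argv) function means the same logic can be called from another program with an explicit argument list.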
  2551.  
  2552.  
  2553.  
Disable all JavaScript in an HTML page

If you have a Python program which grabs HTML pages from the web, JavaScript is a real nuisance when you browse these pages offline.
Here's a simple trick to disable all JavaScript:
  2558.  
  2559. Short version:
  2560. html = html.replace('<script','<noscript')
  2561.  
  2562. Better version:
  2563. import re
  2564. re_noscript = re.compile('<(/?)script',re.IGNORECASE)
  2565. html = re_noscript.sub(r'<\1noscript',html)
  2566.  
This will disable all JavaScript (browsers with scripting enabled simply ignore the content of a <noscript> tag), and you will still be able to have a look at the code if you want.
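
To see why the regular expression version is the better one, here is a small sketch (Python 3) run on a made-up HTML fragment. The sample string and the disable_javascript name are assumptions for illustration only.

```python
import re

RE_SCRIPT = re.compile('<(/?)script', re.IGNORECASE)


def disable_javascript(html):
    """Rename opening and closing script tags, whatever their case."""
    return RE_SCRIPT.sub(r'<\1noscript', html)


sample = '<p>Hi</p><SCRIPT type="text/javascript">alert(1)</SCRIPT>'

# Short version: misses the upper-case tag entirely (and never touches closing tags).
short_result = sample.replace('<script', '<noscript')
# Better version: handles both the opening and the closing tag.
better_result = disable_javascript(sample)

print(short_result)
print(better_result)
```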
  2568.  
  2569.  
  2570.  
  2571. Multiplying
  2572.  
  2573. Python can multiply. It can even multiply strings, tuples or lists.
  2574.  
  2575. >>> 3*'a'
  2576. 'aaa'
  2577.  
  2578. >>> 3*'hello'
  2579. 'hellohellohello'
  2580.  
  2581. >>> 3*('hello')
  2582. 'hellohellohello'
  2583.  
  2584. >>> 3*('hello',)
  2585. ('hello', 'hello', 'hello')
  2586.  
  2587. >>> 3*['hello']
  2588. ['hello', 'hello', 'hello']
  2589.  
  2590. >>> 3*('hello','world')
  2591. ('hello', 'world', 'hello', 'world', 'hello', 'world')
  2592.  
  2593. Notice the difference between ('hello') which is a single string and ('hello',) which is a tuple.
  2594. That's why they do not multiply the same.
  2595.  
  2596. You can also add:
  2597.  
  2598. >>> print 3*'a' + 2*'b'
  2599. aaabb
  2600.  
  2601. >>> print 3*('a',) + 2*('b',)
  2602. ('a', 'a', 'a', 'b', 'b')
  2603.  
  2604. >>> print 3*['a'] + 2*['b']
  2605. ['a', 'a', 'a', 'b', 'b']
  2606.  
  2607.  
  2608.  
  2609. Creating and reading .tar.bz2 archives
  2610.  
.tar.bz2 archives are usually smaller than .zip or .tar.gz archives.
  2612. Python can natively create and read those archives.
  2613.  
  2614. Compressing a directory into a .tar.bz2 archive:
  2615.  
  2616. import tarfile
  2617. import bz2
  2618. archive = tarfile.open('myarchive.tar.bz2','w:bz2')
archive.debug = 1 # Display the files being compressed.
  2620. archive.add(r'd:\myfiles') # d:\myfiles contains the files to compress
  2621. archive.close()
  2622.  
  2623.  
  2624. Decompressing a .tar.bz2 archive:
  2625.  
import tarfile
import bz2
archive = tarfile.open('myarchive.tar.bz2','r:bz2')
archive.debug = 1 # Display the files being decompressed.
for tarinfo in archive:
    archive.extract(tarinfo, r'd:\mydirectory') # d:\mydirectory is where I want to uncompress the files.
archive.close()
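
If you want to try this without touching d:\myfiles, here is a self-contained sketch (Python 3) which builds a small directory in a temporary folder, compresses it and lists the archive content. The helper names make_archive and list_archive, and the file names, are made up for the example.

```python
import os
import tarfile
import tempfile


def make_archive(source_dir, archive_path):
    """Compress source_dir into a .tar.bz2 archive."""
    with tarfile.open(archive_path, 'w:bz2') as archive:
        archive.add(source_dir, arcname=os.path.basename(source_dir))


def list_archive(archive_path):
    """Return the sorted names of the members of a .tar.bz2 archive."""
    with tarfile.open(archive_path, 'r:bz2') as archive:
        return sorted(archive.getnames())


with tempfile.TemporaryDirectory() as workdir:
    source_dir = os.path.join(workdir, 'myfiles')
    os.mkdir(source_dir)
    with open(os.path.join(source_dir, 'hello.txt'), 'w') as f:
        f.write('Hello, world.')
    archive_path = os.path.join(workdir, 'myarchive.tar.bz2')
    make_archive(source_dir, archive_path)
    names = list_archive(archive_path)

print(names)
```

The with statements close the archive even if an error occurs, which replaces the explicit archive.close() calls.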
  2633.  
  2634.  
  2635.  
  2636. Enumerating
  2637.  
A simple function to get a numbered enumeration: enumerate() works on sequences (strings, lists...) and yields (index,item) tuples:

>>> for i in enumerate( ['abc','def','ghi','jkl'] ):
...     print i
  2642. ...
  2643. (0, 'abc')
  2644. (1, 'def')
  2645. (2, 'ghi')
  2646. (3, 'jkl')
  2647. >>>
>>> for i in enumerate('hello world'):
...     print i
  2650. ...
  2651. (0, 'h')
  2652. (1, 'e')
  2653. (2, 'l')
  2654. (3, 'l')
  2655. (4, 'o')
  2656. (5, ' ')
  2657. (6, 'w')
  2658. (7, 'o')
  2659. (8, 'r')
  2660. (9, 'l')
  2661. (10, 'd')
  2662. >>>
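
In practice the tuple is usually unpacked directly in the for statement. A short sketch (Python 3 syntax; the optional start argument of enumerate() exists since Python 2.6):

```python
words = ['abc', 'def', 'ghi', 'jkl']

# Unpack each (index, item) tuple directly in the loop.
for index, word in enumerate(words):
    print(index, word)

# Start numbering at 1 instead of 0.
numbered = list(enumerate(words, 1))
print(numbered)
```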
  2663.  
  2664.  
  2665.  
  2666. Zip that thing
  2667.  
zip, map and filter are powerful sequence operators which can replace list comprehensions in some cases.
  2669.  
  2670.  
  2671. List comprehension
  2672. List comprehension is a syntax to create a list of transformed elements of a sequence.
  2673. For example:
  2674.  
  2675. >>> mylist = (1,3,5,7,9)
  2676. >>> print [value*2 for value in mylist]
  2677. [2, 6, 10, 14, 18]
  2678.  
This reads almost as plain English: compute value*2 for each value in my list.
  2680.  
  2681. You can also use conditions to filter the list:
  2682.  
  2683. >>> mylist = (1,3,5,7,9)
  2684. >>> print [i*2 for i in mylist if i>4]
  2685. [10, 14, 18]
  2686.  
  2687.  
There are other ways to compute and transform lists: zip, map and filter.
  2689.  
  2690. zip
  2691. zip returns a list of tuples. Each tuple contains the i-th element of each sequence (lists, tuples, etc.). Example:
  2692.  
  2693. >>> print zip( ['a','b','c'], [1,2,3] )
  2694. [('a', 1), ('b', 2), ('c', 3)]
  2695.  
  2696. You can even zip multiple sequences together:
  2697.  
  2698. >>> print zip( ['a','b','c'], [1,2,3], ['U','V','W'] )
  2699. [('a', 1, 'U'), ('b', 2, 'V'), ('c', 3, 'W')]
  2700.  
  2701. Strings are sequences too. You can zip them:
  2702.  
  2703. >>> print zip('abcd','1234')
  2704. [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]
  2705.  
  2706. The output list will be as long as the shortest input sequence:
  2707.  
  2708. >>> print zip( [1,2,3,4,5], ['a','b'] )
  2709. [(1, 'a'), (2, 'b')]
  2710.  
  2711.  
  2712.  
  2713. map
  2714. map applies a function to each element of a sequence, and returns a list.
  2715.  
  2716. Example: Apply the abs() function to each element of a list:
  2717.  
  2718. >>> print map(abs, [-5,7,-12] )
  2719. [5, 7, 12]
  2720.  
  2721. which is the equivalent of:
  2722.  
  2723. >>> print [abs(i) for i in [-5,7,-12]]
  2724. [5, 7, 12]
  2725.  
Except that map is often a little faster when the function is a built-in.
  2727.  
  2728. Note that you can use your own functions:
  2729.  
>>> def myfunction(value):
...     return value*10+1
  2732. ...
  2733. >>> print map(myfunction, [1,2,3,4] )
  2734. [11, 21, 31, 41]
  2735. >>>
  2736.  
You can also use a function which takes several arguments. In this case, you must provide as many sequences as the function has arguments.

Example: We use the max() function, which returns the maximum of two values, so we have to provide 2 sequences.
  2740.  
  2741. >>> print map(max, [4,5,6], [1,2,9] )
  2742. [4, 5, 9]
  2743.  
  2744. This is the equivalent of:
  2745.  
  2746. >>> [ max(4,1), max(5,2), max(6,9) ]
  2747. [4, 5, 9]
  2748.  
  2749.  
  2750.  
  2751. filter
filter applies a function to each element of a sequence like map does, but it returns the original elements, keeping only those for which the function returns a true value.
(In Python, None, zero and empty containers such as an empty list all count as false.)
  2754.  
  2755. >>> print filter(abs, [-5,7,0,-12] )
  2756. [-5, 7, -12]
  2757.  
  2758. This is the equivalent of:
  2759.  
  2760. >>> print [i for i in [-5,7,0,-12] if abs(i)]
  2761. [-5, 7, -12]
  2762.  
Except that filter is often a little faster when the function is a built-in.
  2764.  
  2765.  
  2766.  
  2767. So... map/filter or list comprehension ?
  2768.  
It's often worth using map/filter with built-in functions, because they can be faster. But not always.
  2770.  
  2771. Take the following example:
  2772.  
  2773. >>> print [abs(i+5) for i in [-5,7,0,-12] if i<5]
  2774. [0, 5, 7]
  2775.  
You could express the same thing with filter, map and lambda:
  2777.  
  2778. >>> map( lambda x:abs(x+5), filter(lambda x:x<5 ,[-5,7,0,-12]) )
  2779. [0, 5, 7]
  2780.  
  2781. The list comprehension is not only easier to read: It's also surprisingly faster.
  2782.  
  2783. Always profile your code to see which method is faster.
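
One simple way to do this kind of measurement is the timeit module. The sketch below is written for Python 3, where map and filter return iterators (hence the list() call); the data size and the number of runs are arbitrary choices, and the timings will differ from one machine and Python version to another.

```python
import timeit

data = [-5, 7, 0, -12] * 1000


def with_comprehension():
    return [abs(i + 5) for i in data if i < 5]


def with_map_filter():
    return list(map(lambda x: abs(x + 5), filter(lambda x: x < 5, data)))


# Both methods must give the same result before comparing their speed.
same_result = with_comprehension() == with_map_filter()
print("Same result:", same_result)

for function in (with_comprehension, with_map_filter):
    seconds = timeit.timeit(function, number=200)
    print("%s: %.4f s for 200 runs" % (function.__name__, seconds))
```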
  2784.  
  2785.  
  2786.  
  2787.  
There are other sequence operators:
  2789.  
  2790.  
  2791. reduce
  2792. Reduce is handy to perform cumulative computations (eg. compute 1+2+3+4+5 or 1*2*3*4*5).
  2793.  
>>> def myfunction(a,b):
...     return a*b
  2796. ...
  2797. >>> mylist = [1,2,3,4,5]
  2798. >>> print reduce(myfunction, mylist)
  2799. 120
  2800.  
  2801. which is the equivalent of:
  2802.  
>>> print ((((1*2)*3)*4)*5)
  2804. 120
  2805.  
  2806. In fact, you can import the operator from the operator module:
  2807.  
  2808. >>> import operator
  2809. >>> mylist = [1,2,3,4,5]
  2810. >>> print reduce(operator.mul, mylist)
  2811. 120
  2812. >>> print reduce(operator.add, mylist)
  2813. 15
  2814. (Reduce hint is taken from http://jaynes.colorado.edu/PythonIdioms.html#operator )
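
Note that in Python 3, reduce is no longer a built-in: it has to be imported from the functools module (this import also works in Python 2.6 and later). A minimal sketch, which also shows the optional initial value used when the sequence may be empty:

```python
from functools import reduce
import operator

mylist = [1, 2, 3, 4, 5]

product = reduce(operator.mul, mylist)        # ((((1*2)*3)*4)*5)
total = reduce(operator.add, mylist)          # ((((1+2)+3)+4)+5)
empty_product = reduce(operator.mul, [], 1)   # initial value returned as-is

print(product, total, empty_product)
```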
  2815.  
  2816.  
  2817.  
  2818. Conversions
You can convert between lists, tuples, dictionaries and strings. Some examples:
  2820.  
  2821. >>> mytuple = (1,2,3)
  2822. >>> print list(mytuple) # Tuple to list
  2823. [1, 2, 3]
  2824. >>>
>>> mylist = [1,2,3]
>>> print tuple(mylist) # List to tuple
(1, 2, 3)
>>>
>>> mylist2 = [ ('blue',5), ('red',3), ('yellow',7) ]
>>> print dict(mylist2) # List to dictionary
  2831. {'blue': 5, 'yellow': 7, 'red': 3}
  2832. >>>
  2833. >>> mystring = 'hello'
  2834. >>> print list(mystring) # String to list
  2835. ['h', 'e', 'l', 'l', 'o']
  2836. >>>
  2837. >>> mylist3 = ['w','or','ld']
  2838. >>> print ''.join(mylist3) # List to string
  2839. world
  2840. >>>
  2841.  
  2842. You get the picture.
  2843.  
These conversions are just examples, and often you do not need them, because all of these types are sequences. For instance, you do not need to convert a string to a list to iterate over each character !
  2845.  
  2846. >>> mystring = 'hello'
>>> for character in list(mystring): # This is BAD. Don't do this.
...     print character
  2849. ...
  2850. h
  2851. e
  2852. l
  2853. l
  2854. o
>>> for character in mystring: # Simply do that !
...     print character
  2857. ...
  2858. h
  2859. e
  2860. l
  2861. l
  2862. o
  2863. >>>
  2864.  
Keep in mind that sequence functions accept any sequence, not only lists.
  2866. Thus it's ok to do:
  2867.  
  2868. >>> print [i+'*' for i in 'Hello']
  2869. ['H*', 'e*', 'l*', 'l*', 'o*']
  2870.  
  2871. or even:
  2872.  
  2873. >>> print max('Hello, world !')
  2874. w
  2875. (The max() function also accepts sequences.)
  2876.  
because strings are already sequences. You do not have to convert the string into a list.
  2878.  
  2879.  
  2880.  
A Tkinter widget which expands in a grid
  2882.  
  2883. When you lay out widgets in a tkinter application, you use either the pack() or the grid() geometry manager.
  2884. Grid is - in my opinion - a far more powerful and flexible geometry manager than Pack.
  2885. (By the way, never ever mix .pack() and .grid(), or you'll have nasty surprises.)
  2886.  
  2887. The (expand=1,fill=BOTH) option of pack() manager is nice to have the widgets automatically expand when the window is resized, but you can do the same with the Grid manager.
  2888.  
  2889. Instructions:
  2890.  
  2891. * When using grid(), specify sticky (usually 'NSEW')
  2892. * Then use grid_columnconfigure() and grid_rowconfigure() to set the weights (usually 1).
  2893.  
  2894.  
Example: A simple window with a red and a blue canvas. The two canvases automatically resize to use all the available space in the window.
  2896.  
import Tkinter

class myApplication:
    def __init__(self,root):
        self.root = root
        self.initialisation()

    def initialisation(self):
        canvas1 = Tkinter.Canvas(self.root)
        canvas1.config(background="red")
        canvas1.grid(row=0,column=0,sticky='NSEW')

        canvas2 = Tkinter.Canvas(self.root)
        canvas2.config(background="blue")
        canvas2.grid(row=1,column=0,sticky='NSEW')

        self.root.grid_columnconfigure(0,weight=1)
        self.root.grid_rowconfigure(0,weight=1)
        self.root.grid_rowconfigure(1,weight=1)

def main():
    root = Tkinter.Tk()
    root.title('My application')
    app = myApplication(root)
    root.mainloop()

if __name__ == "__main__":
    main()
  2925.  
If you comment out the lines containing grid_columnconfigure and grid_rowconfigure, you will see that the canvases do not expand.
  2927.  
  2928.  
  2929. You can even play with the weights to share the available space between widgets, eg:
  2930.  
  2931. self.root.grid_rowconfigure(0,weight=1)
  2932. self.root.grid_rowconfigure(1,weight=2)
  2933.  
  2934.  
  2935.  
  2936. Convert a string date to a datetime object
  2937.  
  2938. Let's say we want to convert a string date (eg."2006-05-18 19:35:00") into a datetime object.
  2939.  
  2940. >>> import datetime,time
  2941. >>> stringDate = "2006-05-18 19:35:00"
  2942. >>> dt = datetime.datetime.fromtimestamp(time.mktime(time.strptime(stringDate,"%Y-%m-%d %H:%M:%S")))
  2943. >>> print dt
  2944. 2006-05-18 19:35:00
  2945. >>> print type(dt)
  2946. <type 'datetime.datetime'>
  2947. >>>
  2948.  
  2949. * time.strptime() converts the string to a struct_time tuple.
* time.mktime() converts this tuple into seconds (elapsed since epoch, C-style).
  2951. * datetime.fromtimestamp() converts the seconds to a Python datetime object.
  2952.  
  2953. Yes, this is convoluted.
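
Since Python 2.5, the datetime class has its own strptime() method, which does the same conversion in one step. A minimal sketch (Python 3 syntax):

```python
import datetime

string_date = "2006-05-18 19:35:00"
dt = datetime.datetime.strptime(string_date, "%Y-%m-%d %H:%M:%S")

print(dt)        # 2006-05-18 19:35:00
print(type(dt))
```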
  2954.  
  2955.  
  2956.  
  2957. Compute the difference between two dates, in seconds
  2958.  
  2959. >>> import datetime,time
>>> def dateDiffInSeconds(date1, date2):
...     timedelta = date2 - date1
...     return timedelta.days*24*3600 + timedelta.seconds
  2963. ...
  2964. >>> date1 = datetime.datetime(2006,02,17,15,30,00)
  2965. >>> date2 = datetime.datetime(2006,05,18,11,01,00)
  2966. >>> print dateDiffInSeconds(date1,date2)
  2967. 7759860
  2968. >>>
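
In Python 2.7 and 3.2 or later, timedelta objects have a total_seconds() method which does this computation for you (it returns a float and also takes microseconds into account). A sketch in Python 3 syntax, where date_diff_in_seconds is just a renamed variant of the function above:

```python
import datetime


def date_diff_in_seconds(date1, date2):
    """Return the number of seconds between two datetime objects."""
    return (date2 - date1).total_seconds()


date1 = datetime.datetime(2006, 2, 17, 15, 30, 0)
date2 = datetime.datetime(2006, 5, 18, 11, 1, 0)
difference = date_diff_in_seconds(date1, date2)
print(difference)  # 7759860.0
```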
  2969.  
  2970.  
  2971.  
  2972. Managed attributes, read-only attributes
  2973.  
Sometimes, you want to have greater control over attribute access in your objects.
  2975. You can do this:
  2976.  
  2977. * Create a private attribute (self.__x)
  2978. * Create accessor functions to this attribute (getx,setx,delx)
  2979. * Create a property() and assign it these accessors.
  2980.  
  2981. Example:
  2982.  
class myclass(object):
    def __init__(self):
        self.__x = None

    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")

a = myclass()
a.x = 5 # Set
print a.x # Get
del a.x # Del
  2996.  
  2997. This way, you can control access in the getx/setx/delx methods.
  2998.  
  2999.  
  3000. For example, you can prevent a property from being written or deleted:
  3001.  
class myclass(object):
    def __init__(self):
        self.__x = None

    def getx(self): return self.__x
    def setx(self, value): raise AttributeError,'Property x is read-only.'
    def delx(self): raise AttributeError,'Property x cannot be deleted.'
    x = property(getx, setx, delx, "I'm the 'x' property.")

a = myclass()
a.x = 5 # This line will fail
print a.x
del a.x
  3015.  
  3016. If you run this program, you will get:
  3017.  
  3018. Traceback (most recent call last):
  3019. File "example.py", line 11, in ?
  3020. a.x = 5 # This line will fail
  3021. File "example.py", line 6, in setx
  3022. def setx(self, value): raise AttributeError,'Property x is read-only.'
  3023. AttributeError: Property x is read-only.
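
A shorter way to get a read-only attribute is to use property as a decorator and define only the getter: Python then raises AttributeError by itself when you try to set the attribute. A sketch in Python 3 syntax (the class name and the value 42 are made up for the example; the exact error message depends on the Python version):

```python
class MyClass(object):
    def __init__(self):
        self._x = 42

    @property
    def x(self):
        """I'm the read-only 'x' property."""
        return self._x


a = MyClass()
print(a.x)          # Reading works.

try:
    a.x = 5         # Writing fails: no setter was defined.
    write_failed = False
except AttributeError as error:
    write_failed = True
    print("Cannot set x:", error)
```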
  3024.  
  3025.  
  3026. First day of the month
  3027.  
  3028. >>> import datetime
>>> def firstDayOfMonth(dt):
...     return (dt+datetime.timedelta(days=-dt.day+1)).replace(hour=0,minute=0,second=0,microsecond=0)
  3031. ...
  3032. >>> print firstDayOfMonth( datetime.datetime(2006,05,13) )
  3033. 2006-05-01 00:00:00
  3034. >>>
  3035.  
  3036. This function takes a datetime object as input (dt) and returns the first day of the month at midnight (12:00:00 AM).
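
The same result can be obtained without any timedelta arithmetic, by replacing the day along with the time fields. A sketch in Python 3 syntax (first_day_of_month is a renamed variant of the function above):

```python
import datetime


def first_day_of_month(dt):
    """Return the first day of dt's month, at midnight."""
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


first = first_day_of_month(datetime.datetime(2006, 5, 13, 14, 30))
print(first)  # 2006-05-01 00:00:00
```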
  3037.  
  3038.  
  3039.  
Fetch, read and parse an RSS 2.0 feed in 6 lines
  3041.  
  3042. Dumbed-down version. Easy.
  3043. This program gets the RSS 2.0 feed from sebsauvage.net, parses it and displays all titles.
  3044.  
import urllib, xml.dom.minidom
address = 'http://www.sebsauvage.net/rss/updates.xml'
document = xml.dom.minidom.parse(urllib.urlopen(address))
for item in document.getElementsByTagName('item'):
    title = item.getElementsByTagName('title')[0].firstChild.data
    print "Title:", title.encode('latin-1','replace')
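
Since the feed address may no longer exist, here is a sketch (Python 3) of the same parsing applied to an RSS document held in a string. SAMPLE_FEED and item_titles are made up for the example; the check on firstChild is an addition which guards against empty <title> elements.

```python
import xml.dom.minidom

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Sample feed</title>
<item><title>First entry</title></item>
<item><title>Second entry</title></item>
</channel></rss>"""


def item_titles(rss_text):
    """Return the title of each <item> of an RSS 2.0 document."""
    document = xml.dom.minidom.parseString(rss_text)
    titles = []
    for item in document.getElementsByTagName('item'):
        title_node = item.getElementsByTagName('title')[0].firstChild
        titles.append(title_node.data if title_node is not None else '')
    return titles


titles = item_titles(SAMPLE_FEED)
for title in titles:
    print("Title:", title)
```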
  3051.  
  3052.  
  3053.  
  3054. Get a login from BugMeNot
  3055.  
  3056. BugMeNot.com provides logins/passwords for sites which have a compulsory registration.
  3057. Here's a simple function which returns a login/password for a given domain or URL.
  3058.  
import re,urllib2,urlparse

def getLoginPassword(url):
    ''' Returns a login/password for a given domain using BugMeNot.

        Input: url (string) -- the URL or domain to get a login for.

        Output: a tuple (login,password)
                Will return (None,None) if no login is available.

        Examples:
            print getLoginPassword("http://www.nytimes.com/auth/login")
            ('goaway147', 'goaway')

            print getLoginPassword("imdb.com")
            ('bobshit@mailinator.com', 'diedie')
    '''
    if not url.lower().startswith('http://'): url = "http://"+url
    domain = urlparse.urlsplit(url)[1].split(':')[0]
    address = 'http://www.bugmenot.com/view/%s?utm_source=extension&utm_medium=firefox' % domain
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/5.0'})
    page = urllib2.urlopen(request).read(50000)
    re_loginpwd = re.compile('<th>Username.*?<td>(.+?)</td>.*?<th>Password.*?<td>(.+?)</td>',re.IGNORECASE|re.DOTALL)
    match = re_loginpwd.search(page)
    if match:
        return match.groups()
    else:
        return (None,None)
  3087.  
  3088. Example:
  3089.  
  3090. >>> print getLoginPassword("http://www.nytimes.com/auth/login")
  3091. ('goaway147', 'goaway')
  3092. >>> print getLoginPassword("imdb.com")
  3093. ('bobshit@mailinator.com', 'diedie')
  3094.  
Note: It looks like BugMeNot sometimes serves an error page, or tells you that no logins are available although there are some. You have been warned.
  3096.  
  3097.  
  3098.  
  3099. Logging into a site and handling session cookies
  3100.  
  3101. Here's an example of logging into a website and using the session cookie for further requests (We log into imdb.com).
  3102.  
  3103. import cookielib, urllib, urllib2
  3104.  
  3105. login = 'ismellbacon123@yahoo.com'
  3106. password = 'login'
  3107.  
  3108. # Enable cookie support for urllib2
  3109. cookiejar = cookielib.CookieJar()
  3110. urlOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
  3111.  
  3112. # Send login/password to the site and get the session cookie
  3113. values = {'login':login, 'password':password }
  3114. data = urllib.urlencode(values)
  3115. request = urllib2.Request("http://www.imdb.com/register/login", data)
  3116. url = urlOpener.open(request) # Our cookiejar automatically receives the cookies
  3117. page = url.read(500000)
  3118.  
  3119. # Make sure we are logged in by checking the presence of the cookie "id".
  3120. # (which is the cookie containing the session identifier.)
if 'id' not in [cookie.name for cookie in cookiejar]:
    raise ValueError, "Login failed with login=%s, password=%s" % (login,password)
  3123.  
  3124. print "We are logged in !"
  3125.  
  3126. # Make another request with our session cookie
  3127. # (Our urlOpener automatically uses cookies from our cookiejar)
  3128. url = urlOpener.open('http://imdb.com/find?s=all&q=grave')
  3129. page = url.read(200000)
  3130.  
  3131. This requires Python 2.4 or later (because of the cookielib module).
  3132. Note that you can have cookie support for older versions of Python with third-party modules (ClientCookie for example).
  3133.  
  3134. Login form parameters, URL and session cookie name vary from site to site. Use Firefox to see them all:
  3135.  
  3136. * For forms: Menu "Tools" > "Page info" > "Forms" tab.
  3137. * For cookies: Menu "Tools" > "Options" > "Privacy" tab > "Cookies" tab > "View cookies" button.
  3138.  
  3139.  
Most of the time, you do not need to log out.
  3141.  
  3142.  
  3143. Searching on Google
  3144.  
This class searches Google and returns a list of links (URLs). It does not use the Google API.
  3146. It automatically browses the different result pages, and gathers only the URLs.
  3147.  
  3148. import re,urllib,urllib2
  3149.  
class GoogleHarvester:
    re_links = re.compile(r'<a class=l href="(.+?)"',re.IGNORECASE|re.DOTALL)
    def __init__(self):
        pass
    def harvest(self,terms):
        '''Searches Google for these terms. Returns only the links (URL).

           Input: terms (string) -- one or several words to search.

           Output: A list of urls (strings).
                   Duplicate links are removed, links are sorted.

           Example: print GoogleHarvester().harvest('monthy pythons')
        '''
        print "Google: Searching for '%s'" % terms
        links = {}
        currentPage = 0
        while True:
            print "Google: Querying page %d (%d links found so far)" % (currentPage/100+1, len(links))
            address = "http://www.google.com/search?q=%s&num=100&hl=en&start=%d" % (urllib.quote_plus(terms),currentPage)
            request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
            urlfile = urllib2.urlopen(request)
            page = urlfile.read(200000)
            urlfile.close()
            for url in GoogleHarvester.re_links.findall(page):
                links[url] = 0
            if "</div>Next</a></table></div><center>" in page: # Is there a "Next" link for next page of results ?
                currentPage += 100 # Yes, go to next page of results.
            else:
                break # No, break out of the while True loop.
        print "Google: Found %d links." % len(links)
        return sorted(links.keys())
  3182.  
  3183. # Example: Search for "monthy pythons"
  3184. links = GoogleHarvester().harvest('monthy pythons')
  3185. open("links.txt","w+b").write("\n".join(links))
  3186.  
  3187. Links found will be written in the file links.txt.
  3188.  
  3189. Please note that the internet evolves all the time, and by the time you are reading this program, Google may have changed. Therefore you may have to adapt this class.
  3190.  
  3191.  
  3192.  
  3193.  
  3194. Building a basic GUI application step-by-step in Python with Tkinter and wxPython
  3195.  
Here is a full tutorial on how to create a GUI. You will learn to build a GUI step-by-step.
  3197. Tkinter and wxPython are compared. Each and every object, method and parameter are explained.
  3198.  
  3199. http://sebsauvage.net/python/gui/index.html
  3200.  
  3201.  
  3202.  
  3203.  
  3204. Flatten nested lists and tuples
  3205.  
  3206. Here's a function which flattens nested lists and tuples.
(This function is heavily inspired by http://www.reportlab.co.uk/cgi-bin/viewcvs.cgi/public/reportlab/trunk/reportlab/lib/utils.py)
  3208.  
import types

def flatten(L):
    ''' Flattens nested lists and tuples in L. '''
    def _flatten(L,a):
        for x in L:
            if type(x) in (types.ListType,types.TupleType): _flatten(x,a)
            else: a(x)
    R = []
    _flatten(L,R.append)
    return R
  3220.  
  3221.  
  3222. Example:
  3223.  
  3224. >>> a = [ 5, 'foo', (-52.5, 'bar'), ('foo',['bar','bar']), [1,2,[3,4,(5,6)]],('foo',['bar']) ]
  3225. >>> print flatten(a)
  3226. [5, 'foo', -52.5, 'bar', 'foo', 'bar', 'bar', 1, 2, 3, 4, 5, 6, 'foo', 'bar']
  3227. >>>
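
types.ListType and types.TupleType no longer exist in Python 3. A possible equivalent, given here as a sketch, uses isinstance() instead; note that, unlike the type() test above, isinstance() also flattens subclasses of list and tuple.

```python
def flatten(nested):
    """Flatten nested lists and tuples into a single list."""
    flat = []
    for x in nested:
        if isinstance(x, (list, tuple)):
            flat.extend(flatten(x))   # Recurse into the nested sequence.
        else:
            flat.append(x)
    return flat


a = [5, 'foo', (-52.5, 'bar'), ('foo', ['bar', 'bar']),
     [1, 2, [3, 4, (5, 6)]], ('foo', ['bar'])]
flat = flatten(a)
print(flat)
```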
  3228.  
  3229.  
  3230.  
  3231.  
  3232.  
  3233. Efficiently iterating over large tables in databases
  3234.  
  3235. When reading rows from a SQL database, you have several choices with the DB-Api:
  3236.  
* fetchone() : Read one row at a time.
* fetchmany() : Read several rows at a time.
* fetchall() : Read all rows at once.
  3240.  
  3241. Which one do you think is better ?
  3242.  
  3243. At first sight, fetchall() seems to be a good idea.
Let's see: I have a 140 MB database in SQLite3 format with a big table. Maybe reading all rows at once is faster ?
  3245.  
import sqlite3 as sqlite  # Assumption: the import is not shown in the original

con = sqlite.connect('mydatabase.db3'); cur = con.cursor()
cur.execute('select discid,body from discussion_body;')
for row in cur.fetchall():
    pass
  3250.  
As soon as we run the program, it eats up 140 MB of memory. Oops !
Why ? Because fetchall() loads all the rows in memory at once.
We don't want our programs to be memory hogs, so using fetchall() is hardly recommended.
There are better ways of doing this. So let's read row by row with fetchone():
  3255.  
con = sqlite.connect('mydatabase.db3'); cur = con.cursor()
cur.execute('select discid,body from discussion_body;')
for row in iter(cur.fetchone, None):
    pass
  3260.  
fetchone() returns one row at a time, and returns None when no more rows are available: In order to use fetchone() in a for loop, we have to create an iterator which will call fetchone() repeatedly for each row, until the None value is returned.

It works very well and does not eat memory. But it's sub-optimal: Most databases use 4 KB data packets or so. Most of the time, it would be more efficient to read several rows at once. That's why we use fetchmany():
  3264.  
con = sqlite.connect('mydatabase.db3'); cur = con.cursor()
cur.execute('select discid,body from discussion_body;')
for rows in iter(cur.fetchmany, []):  # Each item is a list of rows, not a single row
    for row in rows:
        pass
  3269.  
fetchmany() returns a list of rows at a time (of variable size), and returns an empty list when no more rows are available: In order to use fetchmany() in a for loop, we have to create an iterator which will call fetchmany() repeatedly, until an empty list [] is returned.
(Note that we did not specify how many rows we wanted at once: the batch size then comes from the cursor's arraysize attribute. The DB-API sets its default to 1, so check what your driver uses and raise it if needed.)
  3272.  
  3273.  
  3274. fetchmany() is the optimal way of fetching rows: It does not use a lot of memory like fetchall() and it's usually faster than fetchone().
  3275.  
  3276.  
Note that in our example we used SQLite3, which is not network-based. The difference between fetchone/fetchmany is even greater with network-based databases (MySQL, Oracle, Microsoft SQL Server...), because those databases also have a certain granularity for network packets.
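
To keep the calling code as simple as a plain "for row in ..." loop, the batching can be hidden in a small generator. This is a sketch only, assuming Python 3 and the standard sqlite3 module; the helper name iter_rows, the batch size of 500 and the in-memory table are made up for the example.

```python
import sqlite3

def iter_rows(cursor, batch_size=500):
    """Yield rows one by one while fetching them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:          # An empty list means there are no more rows.
            break
        for row in batch:
            yield row

# Hypothetical in-memory table standing in for the discussion_body table.
con = sqlite3.connect(':memory:')
con.execute('create table discussion_body (discid integer, body text)')
con.executemany('insert into discussion_body values (?, ?)',
                [(i, 'body %d' % i) for i in range(1200)])
cur = con.cursor()
cur.execute('select discid, body from discussion_body')
row_count = sum(1 for _ in iter_rows(cur))
print(row_count)
con.close()
```

Only one batch of rows is held in memory at any time, whatever the size of the table.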
  3278.  
  3279.  
  3280.  
  3281.  
  3282. A range of floats
  3283.  
  3284. Python has a range() function which produces a range of integers.
  3285.  
  3286. >>> print range(2,15,3)
  3287. [2, 5, 8, 11, 14]
  3288.  
  3289. But it does not support floats.
  3290.  
  3291. Here's one which does:
  3292.  
def floatrange(start,stop,steps):
    ''' Computes a range of floating value.

        Input:
            start (float)  : Start value.
            stop (float)   : End value
            steps (integer): Number of values (2 or more)

        Output:
            A list of floats

        Example:
            >>> print floatrange(0.25, 1.3, 5)
            [0.25, 0.51249999999999996, 0.77500000000000002, 1.0375000000000001, 1.3]
    '''
    return [start+float(i)*(stop-start)/(float(steps)-1) for i in range(steps)]
  3309.  
  3310. Example:
  3311.  
  3312. >>> print floatrange(0.25, 1.3, 5)
  3313. [0.25, 0.51249999999999996, 0.77500000000000002, 1.0375000000000001, 1.3]
  3314.  
  3315.  
  3316.  
  3317.  
  3318. Converting RGB to HSL and back
  3319.  
HSL (Hue/Saturation/Lightness) is a more human-accessible representation of colors, but most computers work in RGB mode.

* Hue: The tint (red, blue, pink, green...)
* Saturation: Does the color fall toward grey or toward the pure color itself ? (It's like the "color" setting of your TV). 0=grey 1=the pure color itself.
* Lightness: 0=black, 0.5=the pure color itself, 1=white
  3325.  
  3326. Here are two functions which convert between the two colorspaces. Examples are provided in docstrings.
  3327.  
def HSL_to_RGB(h,s,l):
    ''' Converts HSL colorspace (Hue/Saturation/Lightness) to RGB colorspace.
        Formula from http://www.easyrgb.com/math.php?MATH=M19#text19

        Input:
            h (float) : Hue (0...1, but can be above or below
                        (This is a rotation around the chromatic circle))
            s (float) : Saturation (0...1)    (0=toward grey, 1=pure color)
            l (float) : Lightness (0...1)     (0=black 0.5=pure color 1=white)

        Output:
            (r,g,b) (integers 0...255) : Corresponding RGB values

        Examples:
            >>> print HSL_to_RGB(0.7,0.7,0.6)
            (110, 82, 224)
            >>> r,g,b = HSL_to_RGB(0.7,0.7,0.6)
            >>> print g
            82
    '''
    def Hue_2_RGB( v1, v2, vH ):
        while vH<0.0: vH += 1.0
        while vH>1.0: vH -= 1.0
        if 6*vH < 1.0 : return v1 + (v2-v1)*6.0*vH
        if 2*vH < 1.0 : return v2
        if 3*vH < 2.0 : return v1 + (v2-v1)*((2.0/3.0)-vH)*6.0
        return v1

    if not (0 <= s <=1): raise ValueError,"s (saturation) parameter must be between 0 and 1."
    if not (0 <= l <=1): raise ValueError,"l (lightness) parameter must be between 0 and 1."

    r,g,b = (l*255,)*3
    if s!=0.0:
        if l<0.5 : var_2 = l * ( 1.0 + s )
        else     : var_2 = ( l + s ) - ( s * l )
        var_1 = 2.0 * l - var_2
        r = 255 * Hue_2_RGB( var_1, var_2, h + ( 1.0 / 3.0 ) )
        g = 255 * Hue_2_RGB( var_1, var_2, h )
        b = 255 * Hue_2_RGB( var_1, var_2, h - ( 1.0 / 3.0 ) )

    return (int(round(r)),int(round(g)),int(round(b)))
  3369.  
  3370.  
def RGB_to_HSL(r,g,b):
    ''' Converts RGB colorspace to HSL (Hue/Saturation/Lightness) colorspace.
        Formula from http://www.easyrgb.com/math.php?MATH=M18#text18

        Input:
            (r,g,b) (integers 0...255) : RGB values

        Output:
            (h,s,l) (floats 0...1): corresponding HSL values

        Example:
            >>> print RGB_to_HSL(110,82,224)
            (0.69953051643192476, 0.69607843137254899, 0.59999999999999998)
            >>> h,s,l = RGB_to_HSL(110,82,224)
            >>> print s
            0.696078431373
    '''
    if not (0 <= r <=255): raise ValueError,"r (red) parameter must be between 0 and 255."
    if not (0 <= g <=255): raise ValueError,"g (green) parameter must be between 0 and 255."
    if not (0 <= b <=255): raise ValueError,"b (blue) parameter must be between 0 and 255."

    var_R = r/255.0
    var_G = g/255.0
    var_B = b/255.0

    var_Min = min( var_R, var_G, var_B )    # Min. value of RGB
    var_Max = max( var_R, var_G, var_B )    # Max. value of RGB
    del_Max = var_Max - var_Min             # Delta RGB value

    l = ( var_Max + var_Min ) / 2.0
    h = 0.0
    s = 0.0
    if del_Max!=0.0:
        if l<0.5: s = del_Max / ( var_Max + var_Min )
        else:     s = del_Max / ( 2.0 - var_Max - var_Min )
        del_R = ( ( ( var_Max - var_R ) / 6.0 ) + ( del_Max / 2.0 ) ) / del_Max
        del_G = ( ( ( var_Max - var_G ) / 6.0 ) + ( del_Max / 2.0 ) ) / del_Max
        del_B = ( ( ( var_Max - var_B ) / 6.0 ) + ( del_Max / 2.0 ) ) / del_Max
        if   var_R == var_Max : h = del_B - del_G
        elif var_G == var_Max : h = ( 1.0 / 3.0 ) + del_R - del_B
        elif var_B == var_Max : h = ( 2.0 / 3.0 ) + del_G - del_R
        while h < 0.0: h += 1.0
        while h > 1.0: h -= 1.0

    return (h,s,l)
  3416.  
  3417. Note that h (hue) is not constrained to 0...1 because it's an angle around the chromatic circle: You can walk several times around the circle :-)
  3418.  
  3419.  
  3420. Edit: Doh ! Of course, I forgot that Python comes with batteries included: The colorsys module already does that. Repeat after me: RTFM RTFM RTFM.
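
For reference, here is roughly what the same conversions look like with colorsys. This is a sketch, assuming Python 3; the wrapper names hsl_to_rgb255 and rgb255_to_hsl are made up for the example. The one trap is that colorsys works in HLS order (hue, lightness, saturation) and with floats in 0...1 for RGB.

```python
import colorsys

def hsl_to_rgb255(h, s, l):
    """Convert HSL floats (0...1) to an (r, g, b) tuple of integers 0...255."""
    # Note the argument order: colorsys uses HLS (hue, lightness, saturation).
    r, g, b = colorsys.hls_to_rgb(h % 1.0, l, s)
    return tuple(int(round(c * 255)) for c in (r, g, b))

def rgb255_to_hsl(r, g, b):
    """Convert integer RGB values (0...255) to an (h, s, l) tuple of floats."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return (h, s, l)

rgb = hsl_to_rgb255(0.7, 0.7, 0.6)
hsl = rgb255_to_hsl(110, 82, 224)
print(rgb)
print(hsl)
```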
  3421.  
  3422.  
  3423.  
  3424. Generate a palette of rainbow-like pastel colors
  3425.  
  3426. This function generates a palette of rainbow-like pastel colors.
  3427. Note that it uses the HSL_to_RGB() and the floatrange() functions.
  3428.  
def generatePastelColors(n):
    """ Return different pastel colours.

        Input:
            n (integer) : The number of colors to return

        Output:
            A list of colors in HTML notation (eg.['#cce0ff', '#ffcccc', '#ccffe0', '#f5ccff', '#f5ffcc'])

        Example:
            >>> print generatePastelColors(5)
            ['#cce0ff', '#f5ccff', '#ffcccc', '#f5ffcc', '#ccffe0']
    """
    if n==0:
        return []

    # To generate colors, we use the HSL colorspace (see http://en.wikipedia.org/wiki/HSL_color_space)
    start_hue = 0.6  # 0=red   1/3=0.333=green   2/3=0.666=blue
    saturation = 1.0
    lightness = 0.9
    # We take points around the chromatic circle (hue):
    # (Note: we generate n+1 colors, then drop the last one ([:-1]) because it equals the first one (hue 0 = hue 1))
    return ['#%02x%02x%02x' % HSL_to_RGB(hue,saturation,lightness) for hue in floatrange(start_hue,start_hue+1,n+1)][:-1]
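
If you would rather not depend on HSL_to_RGB() and floatrange(), the same idea can be sketched with the standard colorsys module alone. This assumes Python 3; the name pastel_palette and its keyword parameters are made up for the example, and the hues are spaced by 1/n instead of going through n+1 points and dropping the last one.

```python
import colorsys

def pastel_palette(n, start_hue=0.6, saturation=1.0, lightness=0.9):
    """Return n pastel colors in HTML notation, evenly spaced around the hue circle."""
    colors = []
    for i in range(n):
        hue = (start_hue + float(i) / n) % 1.0
        # colorsys expects (hue, lightness, saturation) and returns floats in 0...1.
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append('#%02x%02x%02x' % tuple(int(round(c * 255)) for c in (r, g, b)))
    return colors

palette = pastel_palette(5)
print(palette)
```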
  3452.  
  3453.  
  3454.  
  3455. Columns to rows (and vice-versa)
  3456.  
  3457. You have a table. You want the columns to become rows, and rows to become columns.
  3458. That's easy:
  3459.  
table = [ ('Person', 'Disks', 'Books'),
          ('Zoe'   , 12,      24     ),
          ('John'  , 17,      5      ),
          ('Julien', 3,       11     )
        ]
  3465.  
  3466. print zip(*table)
  3467.  
  3468. You get:
  3469.  
[ ('Person', 'Zoe', 'John', 'Julien'),
  ('Disks' , 12,    17,     3       ),
  ('Books' , 24,    5,      11      )
]
  3474.  
  3475. I told you it was easy :-)
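
One thing to watch if you run this under Python 3: zip() returns an iterator there instead of a list, so wrap it in list() before printing or indexing. A small sketch, assuming Python 3:

```python
table = [('Person', 'Disks', 'Books'),
         ('Zoe', 12, 24),
         ('John', 17, 5),
         ('Julien', 3, 11)]

# zip(*table) pairs up the 1st items of each row, then the 2nd items, and so on.
transposed = list(zip(*table))
# Applying the same trick again restores the original layout.
restored = list(zip(*transposed))
print(transposed)
```

Note that zip() stops at the shortest row, so rows of unequal length get silently truncated.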
  3476.  
  3477.  
  3478.  
  3479.  
  3480.  
  3481. How do I create an abstract class in Python ?
  3482.  
  3483. mmm... Python does not know this "abstract class" concept. We do not really need it.
  3484.  
  3485. Python uses "duck typing": If it quacks like a duck, then it's a duck.
  3486. I don't care what abstract "duck" class it is derived from as long as it quacks when I call the .quack() method.
  3487. If it has a .quack() method, then that's good enough for me.
  3488.  
After all, an abstract class is only a contract. Java or C++ compilers enforce this contract at compile time. Python does not. It lets grown-up Python programmers respect the contract (Well... we're supposed to know what we're doing, aren't we ?).
  3490.  
  3491. One simple example is to redirect standard error to a file:
  3492.  
import sys

class myLogger:
    def __init__(self):
        pass
    def write(self,data):
        file = open("mylog.txt","a")
        file.write(data)
        file.close()

sys.stderr = myLogger()  # Use my class to output errors instead of the console.

print 5/0  # This will trigger an exception
  3506.  
  3507. This will create the file mylog.txt which contains the error instead of displaying the error on the console.
  3508.  
  3509. See ?
  3510. I don't need the class myLogger to derive from an abstract "IOstream" or "Console" class thing: It just needs to have the .write() method. That's all I need.
  3511. And it works !
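
The same duck-typing idea can be tried without touching any file. This is a sketch, assuming Python 3 (contextlib.redirect_stderr needs 3.5 or later); the class name ListLogger is made up for the example. The object only has a write() method, and that is enough for print() to accept it as a stream.

```python
import sys
from contextlib import redirect_stderr

class ListLogger:
    """Hypothetical file-like object: it only needs a write() method."""
    def __init__(self):
        self.chunks = []
    def write(self, data):
        self.chunks.append(data)

logger = ListLogger()
# Inside the with block, sys.stderr is our object; it is restored afterwards.
with redirect_stderr(logger):
    print("something went wrong", file=sys.stderr)
captured = "".join(logger.chunks)
print(repr(captured))
```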
  3512.  
  3513.  
But you can still enforce some checks this way:
  3515.  
class myAbstractClass:
    def __init__(self):
        if self.__class__ is myAbstractClass:
            raise NotImplementedError,"Class %s does not implement __init__(self)" % self.__class__

    def method1(self):
        raise NotImplementedError,"Class %s does not implement method1(self)" % self.__class__
  3523.  
  3524. If you try to call a method which is not implemented in a derived class, you will get an explicit "NotImplementedError" exception.
  3525.  
class myClass(myAbstractClass):
    def __init__(self):
        pass

m = myClass()
m.method1()

Traceback (most recent call last):
  File "myprogram.py", line 19, in <module>
    m.method1()
  File "myprogram.py", line 10, in method1
    raise NotImplementedError,"Class %s does not implement method1(self)" % self.__class__
NotImplementedError: Class __main__.myClass does not implement method1(self)
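
For completeness: the standard library also ships an abc module, which refuses to instantiate a class as long as some abstract method is missing, so the error shows up at creation time instead of at call time. A minimal sketch, assuming Python 3; the class names AbstractDuck, Mallard and SilentDuck are made up for the example.

```python
from abc import ABC, abstractmethod

class AbstractDuck(ABC):
    """Hypothetical base class: subclasses must provide quack()."""
    @abstractmethod
    def quack(self):
        """Return the sound made by this duck."""

class Mallard(AbstractDuck):
    def quack(self):
        return "Quack !"

class SilentDuck(AbstractDuck):
    pass  # Forgets to implement quack()

print(Mallard().quack())
try:
    SilentDuck()            # The instantiation itself fails
    instantiated = True
except TypeError as error:
    instantiated = False
    print(error)
```

Duck typing still works as before: nothing forces an object with a quack() method to derive from AbstractDuck.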
  3539.  
  3540.  
  3541.  
  3542.  
matplotlib, PIL, transparent PNG/GIF and conversion from ARGB to RGBA
  3544.  
  3545. Yes, that's a lot of things in a single snippet, but if you work with matplotlib or PIL, you will probably need it some day:
  3546.  
  3547. * Generate a matplotlib figure without using pylab
  3548. * Get a transparent bitmap from a matplotlib figure
  3549. * Get a PIL Image object from a matplotlib Figure
  3550. * Convert ARGB to RGBA
  3551. * Save a transparent GIF and PNG
  3552.  
  3553.  
  3554. # Import matplotlib and PIL
  3555. import matplotlib, matplotlib.backends.backend_agg
  3556. import Image
  3557.  
  3558. # Generate a figure with matplotlib
  3559. figure = matplotlib.figure.Figure(frameon=False)
  3560. plot = figure.add_subplot(111)
  3561. plot.plot([1,3,2,5,6])
  3562.  
  3563. # If you want, you can use figure.set_dpi() to change the bitmap resolution
  3564. # or use figure.set_size_inches() to resize it.
  3565. # Example:
  3566. #figure.set_dpi(150)
  3567. # See also the SciPy matplotlib cookbook: http://www.scipy.org/Cookbook/Matplotlib/
  3568. # and especially this example:
  3569. # http://www.scipy.org/Cookbook/Matplotlib/AdjustingImageSize?action=AttachFile&do=get&target=MPL_size_test.py
  3570.  
  3571. # Ask matplotlib to render the figure to a bitmap using the Agg backend
  3572. canvas = matplotlib.backends.backend_agg.FigureCanvasAgg(figure)
  3573. canvas.draw()
  3574.  
  3575. # Get the buffer from the bitmap
  3576. stringImage = canvas.tostring_argb()
  3577.  
  3578. # Convert the buffer from ARGB to RGBA:
  3579. tempBuffer = [None]*len(stringImage) # Create an empty array of the same size as stringImage
  3580. tempBuffer[0::4] = stringImage[1::4]
  3581. tempBuffer[1::4] = stringImage[2::4]
  3582. tempBuffer[2::4] = stringImage[3::4]
  3583. tempBuffer[3::4] = stringImage[0::4]
  3584. stringImage = ''.join(tempBuffer)
  3585.  
  3586. # Convert the RGBA buffer to a PIL Image
  3587. l,b,w,h = canvas.figure.bbox.get_bounds()
  3588. im = Image.fromstring("RGBA", (int(w),int(h)), stringImage)
  3589.  
  3590. # Display the image with PIL
  3591. im.show()
  3592.  
  3593. # Save it as a transparent PNG file
  3594. im.save('mychart.png')
  3595.  
  3596. # Want a transparent GIF ? You can do it too
  3597. im = im.convert('RGB').convert("P", dither=Image.NONE, palette=Image.ADAPTIVE)
  3598. # PIL ADAPTIVE palette uses the first color index (0) for the white (RGB=255,255,255),
  3599. # so we use color index 0 as the transparent color.
  3600. im.info["transparency"] = 0
  3601. im.save('mychart.gif',transparency=im.info["transparency"])
  3602.  
  3603. You can test both images with a non-white background:
  3604.  
  3605. <html><body bgcolor="#31F2F2"><img src="mychart.png"><img src="mychart.gif"></body></html>
  3606.  
The PNG always looks better, especially on darker backgrounds.

Caveat: All browsers (IE7, Firefox, Opera, K-Meleon, Safari, Camino, Konqueror...) render transparent PNGs correctly... Except Internet Explorer 5.5 and 6 ! <grin>
IE 5.5 and 6 do not support transparent PNG. Period. So you may have to favor the .GIF format. Your mileage may vary.
  3611.  
  3612.  
  3613. Note 1: The ARGB to RGBA conversion could probably be made faster using numpy, but I haven't investigated.
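
Without going as far as numpy, the slice trick used in the snippet can be isolated and tried on a tiny buffer. This is a sketch, assuming Python 3, where such buffers are bytes objects; the function name argb_to_rgba and the two sample pixels are made up for the example.

```python
def argb_to_rgba(argb):
    """Reorder a flat ARGB byte buffer into RGBA order."""
    if len(argb) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    rgba = bytearray(len(argb))
    rgba[0::4] = argb[1::4]   # R
    rgba[1::4] = argb[2::4]   # G
    rgba[2::4] = argb[3::4]   # B
    rgba[3::4] = argb[0::4]   # A moves from the first to the last position
    return bytes(rgba)

pixels = bytes([255, 10, 20, 30,   0, 40, 50, 60])  # Two ARGB pixels
converted = argb_to_rgba(pixels)
print(list(converted))
```

Each extended slice copies one channel for all pixels at once, so there is no per-pixel Python loop.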
  3614.  
Note 2: There is a trick to have transparent PNGs in IE 5.5/6. Yes, you read that correctly. It works and is perfectly valid HTML markup.
  3616.  
  3617.  
  3618.  
  3619.  
  3620.  
  3621. Automatically crop an image
  3622.  
  3623. Here's a function which removes the useless white space around an image. It's especially handy with matplotlib to remove the extraneous whitespace around charts.
  3624.  
  3625. This function can handle both transparent and non-transparent images.
  3626.  
  3627. * In case of transparent images, the image transparency is used to determine what to crop.
  3628. * Otherwise, this function will try to find the most popular color on the edges of the image and consider this color "whitespace". (You can override this color with the backgroundColor parameter)
  3629.  
  3630.  
  3631. It requires the PIL library.
  3632.  
  3633. import Image, ImageChops
  3634.  
def autoCrop(image,backgroundColor=None):
    '''Intelligent automatic image cropping.
       This function removes the useless "white" space around an image.

       If the image has an alpha (transparency) channel, it will be used
       to choose what to crop.

       Otherwise, this function will try to find the most popular color
       on the edges of the image and consider this color "whitespace".
       (You can override this color with the backgroundColor parameter)

       Input:
            image (a PIL Image object): The image to crop.
            backgroundColor (3 integers tuple): eg. (0,0,255)
                 The color to consider "background to crop".
                 If the image is transparent, this parameter will be ignored.
                 If the image is not transparent and this parameter is not
                 provided, it will be automatically calculated.

       Output:
            a PIL Image object : The cropped image.
    '''

    def mostPopularEdgeColor(image):
        ''' Compute who's the most popular color on the edges of an image.
            (left,right,top,bottom)

            Input:
                image: a PIL Image object

            Output:
                The most popular color (A tuple of integers (R,G,B))
        '''
        im = image
        if im.mode != 'RGB':
            im = image.convert("RGB")

        # Get pixels from the edges of the image:
        width,height = im.size
        left   = im.crop((0,1,1,height-1))
        right  = im.crop((width-1,1,width,height-1))
        top    = im.crop((0,0,width,1))
        bottom = im.crop((0,height-1,width,height))
        pixels = left.tostring() + right.tostring() + top.tostring() + bottom.tostring()

        # Compute who's the most popular RGB triplet
        counts = {}
        for i in range(0,len(pixels),3):
            RGB = pixels[i]+pixels[i+1]+pixels[i+2]
            if RGB in counts:
                counts[RGB] += 1
            else:
                counts[RGB] = 1

        # Get the colour which is the most popular:
        mostPopularColor = sorted([(count,rgba) for (rgba,count) in counts.items()],reverse=True)[0][1]
        return ord(mostPopularColor[0]),ord(mostPopularColor[1]),ord(mostPopularColor[2])

    bbox = None

    # If the image has an alpha (transparency) layer, we use it to crop the image.
    # Otherwise, we look at the pixels around the image (top, left, bottom and right)
    # and use the most used color as the color to crop.

    # --- For transparent images -----------------------------------------------
    if 'A' in image.getbands(): # If the image has a transparency layer, use it.
        # This works for all modes which have transparency layer
        bbox = image.split()[list(image.getbands()).index('A')].getbbox()
    # --- For non-transparent images -------------------------------------------
    elif image.mode=='RGB':
        if not backgroundColor:
            backgroundColor = mostPopularEdgeColor(image)
        # Crop a non-transparent image.
        # .getbbox() always crops the black color.
        # So we need to subtract the "background" color from our image.
        bg = Image.new("RGB", image.size, backgroundColor)
        diff = ImageChops.difference(image, bg)  # Subtract background color from image
        bbox = diff.getbbox()  # Try to find the real bounding box of the image.
    else:
        raise NotImplementedError, "Sorry, this function is not implemented yet for images in mode '%s'." % image.mode

    if bbox:
        image = image.crop(bbox)

    return image
  3720.  
Examples:

Cropping a transparent image:

im = Image.open('myTransparentImage.png')
cropped = autoCrop(im)
cropped.show()

Cropping a non-transparent image:

im = Image.open('myImage.png')
cropped = autoCrop(im)
cropped.show()
  3733.  
  3734.  
  3735. To do:
  3736.  
  3737. * Crop non-transparent image in other modes (palette, black & white).
  3738.  
  3739.  
  3740.  
  3741.  
  3742.  
  3743. Counting the different words
  3744.  
  3745. A quick way to enumerate the different species in a population (in our case: the different words used and their count):
  3746. This is the kind of thing you could use - for example - to see how many files have the same size, same name or same checksum.
  3747.  
  3748. text = "ga bu zo meuh ga zo bu meuh meuh ga zo zo meuh zo bu zo"
  3749. items = text.split(' ')
  3750.  
counters = {}
for item in items:
    if item in counters:
        counters[item] += 1
    else:
        counters[item] = 1
  3757.  
  3758. print "Count of different word:"
  3759. print counters
  3760.  
  3761. print "Most popular word:"
  3762. print sorted([(counter,word) for word,counter in counters.items()],reverse=True)[0][1]
  3763.  
  3764. This displays:
  3765.  
  3766. Count of different word:
  3767. {'bu': 3, 'zo': 6, 'meuh': 4, 'ga': 3}
  3768. Most popular word:
  3769. zo
  3770.  
  3771.  
  3772.  
  3773. You may change the for loop this way:
  3774.  
for item in items:
    try:
        counters[item] += 1
    except KeyError:
        counters[item] = 1
  3780.  
This works too, but that's slightly slower than "if item in counters" because generating an exception involves some overhead (creating a KeyError exception object).
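
On recent Python versions (2.7 and later), the standard collections.Counter class does this counting for you. A short sketch, written here for Python 3:

```python
from collections import Counter

text = "ga bu zo meuh ga zo bu meuh meuh ga zo zo meuh zo bu zo"
counters = Counter(text.split())

# most_common(1) returns a list holding the single (word, count) pair with the highest count.
most_common_word, occurrences = counters.most_common(1)[0]
print(counters)
print(most_common_word, occurrences)
```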
  3782.  
  3783.  
  3784. Quick code coverage
  3785.  
  3786. How can you be sure you have tested all parts of your program ? This is an important question, especially if you write unit tests.
Python has a little-known standard module capable of performing code coverage: trace.
  3788.  
  3789. Instead of running your program with:
  3790.  
  3791. main()
  3792.  
  3793. Do:
  3794.  
  3795. import trace,sys
  3796. tracer = trace.Trace(ignoredirs=[sys.prefix, sys.exec_prefix],trace=0,count=1,outfile=r'./coverage_dir/counts')
  3797. tracer.run('main()')
  3798. r = tracer.results()
  3799. r.write_results(show_missing=True, coverdir=r'./coverage_dir')
  3800.  
  3801. This will create a coverage_dir subdirectory containing .cover files: These files will tell you how many times each line has been executed, and which lines were not executed.
  3802.  
  3803. To convert the .cover files to nice HTML pages, you can use the following program:
  3804.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import os,glob,cgi

def cover2html(directory=''):
    ''' Converts .cover files generated by the Python Trace module to .html files.
        You can generate cover files this way:
            import trace,sys
            tracer = trace.Trace(ignoredirs=[sys.prefix, sys.exec_prefix],trace=0,count=1,outfile=r'./coverage_dir/counts')
            tracer.run('main()')
            r = tracer.results()
            r.write_results(show_missing=True, coverdir=r'./coverage_dir')

        Input:
            directory (string): The directory where the *.cover files are located.

        Output:
            None
            The html files are written in the input directory.

        Example:
            cover2html('coverage_dir')
    '''
    # Note: This function is a quick & dirty hack.

    # Write the CSS file:
    file = open("style.css","w+")
    file.write('''
body {
    font-family:"Trebuchet MS",Verdana,"DejaVuSans","VeraSans",Arial,Helvetica,sans-serif;
    font-size: 10pt;
    background-color: white;
}
.noncovered { background-color:#ffcaca; }
.covered { }
td,th { padding-left:5px;
        padding-right:5px;
        border: 1px solid #ccc;
        font-family:"DejaVu Sans Mono","Bitstream Vera Sans Mono",monospace;
        font-size: 8pt;
}
th { font-weight:bold; background-color:#eee;}
table { border-collapse: collapse; }
''')
    file.close()


    indexHtml = ""  # Index html table.

    # Convert each .cover file to html.
    for filename in glob.glob(os.path.join(directory,'*.cover')):
        print "Processing %s" % filename
        filein = open(filename,'r')
        htmlTable = '<table><thead><th>Run count</th><th>Line n°</th><th>Code</th></thead><tbody>'
        linecounter = 0
        noncoveredLineCounter = 0
        for line in filein:
            linecounter += 1
            runcount = ''
            # line[5:6] (instead of line[5]) avoids an IndexError on short lines.
            if line[5:6] == ':': runcount = cgi.escape(line[:5].strip())
            cssClass = 'covered'
            if line.startswith('>>>>>>'):
                noncoveredLineCounter += 1
                cssClass="noncovered"
                runcount = '&#x25ba;'
            htmlTable += '<tr class="%s"><td align="right">%s</td><td align="right">%d</td><td nowrap>%s</td></tr>\n' % (cssClass,runcount,linecounter,cgi.escape(line[7:].rstrip()).replace(' ','&nbsp;'))
        filein.close()
        htmlTable += '</tbody></table>'
        sourceFilename = filename[:-6]+'.py'
        coveragePercent = int(100*float(linecounter-noncoveredLineCounter)/float(linecounter))
        html = '''<html><!-- Generated by cover2html.py - http://sebsauvage.net --><head><link rel="stylesheet" href="style.css" type="text/css"></head><body>
<b>File:</b> %s<br>
<b>Coverage:</b> %d%% &nbsp; ( <span class="noncovered">&nbsp;&#x25ba;&nbsp;</span> = Code not executed. )<br>
<br>
''' % (cgi.escape(sourceFilename),coveragePercent) + htmlTable + '</body></html>'
        fileout = open(filename+'.html','w+')
        fileout.write(html)
        fileout.close()
        indexHtml += '<tr><td><a href="%s">%s</a></td><td>%d%%</td></tr>\n' % (filename+'.html',cgi.escape(sourceFilename),coveragePercent)

    # Then write the index:
    print "Writing index.html"
    file = open('index.html','w+')
    file.write('''<html><head><link rel="stylesheet" href="style.css" type="text/css"></head>
<body><table><thead><th>File</th><th>Coverage</th></thead><tbody>%s</tbody></table></body></html>''' % indexHtml)
    file.close()

    print "Done."


cover2html()
  3896.  
  3897. Run this program in the directory containing your .cover files, then simply open index.html.
  3898.  
  3899. Here's a test file and its output.
  3900.  
  3901.  
  3902.  
Note that Python's Trace module is not perfect: For example, it will flag imports, function definitions and some other lines as "not executed", although they were executed.
  3904. There are other code coverage modules:
  3905.  
  3906. * Coverage: http://www.nedbatchelder.com/code/modules/coverage.html
  3907. * pyCover: http://www.geocities.com/drew_csillag/pycover.html
  3908. * FigLeaf: http://darcs.idyll.org/~t/projects/figleaf/README.html
  3909. * trace2html: http://cheeseshop.python.org/pypi/trace2html
  3910. * pycoco: http://www.livinglogic.de/Python/pycoco/index.html
  3911.  
  3912.  
  3913.  
  3914.  
  3915. Trapping exceptions to the console under wxPython
  3916.  
  3917. When an exception occurs in your wxPython program, it is displayed in a wxPython window. Sometimes, you just want everything to be logged to the console (stderr), like any other Python program. Here's how to do it:
  3918.  
import sys
STDERR = sys.stderr  # Keep stderr because wxPython will redirect it.

import wx

[...your wxPython program goes here...]

if __name__ == "__main__":
    import traceback,sys
    try:
        app = MyWxApplication()  # Start your wxPython application here.
        app.MainLoop()
    except:
        traceback.print_exc(file=STDERR)
  3933.  
  3934. Of course, you can use this trick to log everything to a file if you prefer.
  3935.  
  3936.  
  3937.  
  3938. Get a random "interesting" image from Flickr
  3939.  
  3940. Note: Flickr website has changed, and the following code will not work. It is kept as an example.
  3941.  
  3942. Here's a simple function which returns a random image flagged "interesting" in Flickr.com:
  3943.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import datetime,random,urllib2,re

def getInterestingFlickrImage(filename=None):
    ''' Returns a random "interesting" image from Flickr.com.
        The image is saved in current directory.

        In case the image is not valid (eg.photo not available, etc.)
        the image is not saved and None is returned.

        Input:
            filename (string): An optional filename.
                               If filename is not provided, a name will be automatically provided.

        Output:
            (string) Name of the file.
                     None if the image is not available.
    '''
    # Get a random "interesting" page from Flickr:
    print 'Getting a random "interesting" Flickr page...'
    # Choose a random date between the beginning of flickr and yesterday.
    yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
    flickrStart = datetime.datetime(2004,7,1)
    nbOfDays = (yesterday-flickrStart).days
    randomDay = flickrStart + datetime.timedelta(days=random.randint(0,nbOfDays))
    # Get a random page for this date.
    url = 'http://flickr.com/explore/interesting/%s/page%d/' % (randomDay.strftime('%Y/%m/%d'),random.randint(1,20))
    urlfile = urllib2.urlopen(url)
    html = urlfile.read(500000)
    urlfile.close()

    # Extract images URLs from this page (dots are escaped so they only match literal dots)
    re_imageurl = re.compile(r'src="(http://farm\d+\.static\.flickr\.com/\d+/\d+_\w+_m\.jpg)"',re.IGNORECASE|re.DOTALL)
    urls = re_imageurl.findall(html)
    if len(urls)==0:
        raise ValueError,"Oops... could not find images URL in this page. Either Flickr has problem, or the website has changed."
    urls = [url.replace('_m.jpg','_o.jpg') for url in urls]

    # Choose a random image
    url = random.choice(urls)

    # Download the image:
    print 'Downloading %s' % url
    filein = urllib2.urlopen(url)
    try:
        image = filein.read(5000000)
    except MemoryError: # I sometimes get this exception. Why ?
        return None

    filein.close()

    # Check it.
    if len(image)==0:
        return None  # Sometimes flickr returns nothing.
    if len(image)==5000000:
        return None  # Image too big. Discard it.
    if image.startswith('GIF89a'):
        return None  # "This image is not available" image.

    # Save to disk.
    if not filename:
        filename = url[url.rindex('/')+1:]
    fileout = open(filename,'w+b')
    fileout.write(image)
    fileout.close()

    return filename

print getInterestingFlickrImage()
  4015.  
  4016. WARNING: These images may be NSFW.
  4017.  
  4018.  
  4019.  
  4020.  
  4021. Why is Python a good beginner language ?
  4022.  
Python is a good language for learning to program, because you can start in scripting mode (variables, assignment...), then learn new concepts step by step (procedural programming, conditional branching, loops, object orientation...).

Let me give an example. Start with this simple program:

print "Hello, world !"

Then learn about variables and inputs/outputs:

a = input()
b = a + 2
print b

Then learn about procedural programming (loops, conditional branching...):

a = input()
b = a + 2
if b > 10:
    print "More than 10 !"

Then learn structured programming (functions, return values, recursion...):

def square(value):
    return value*value

print square(5)

Then learn object orientation:

class myClass:
    def __init__(self,value):
        self.value = value
    def bark(self):
        print "Woof woof !"

myObject = myClass(5)
print myObject.value
myObject.bark()

etc.

This is a great way to learn programming one concept at a time.
More importantly, you can experiment in the Python console, as Python does not require explicit compilation.

To illustrate how well Python fits programming courses, I will quote a Slashdot reader:
  4067.  
  4068. Java:
  4069.  
class myfirstjavaprog
{
    public static void main ( String args[] )
    {
        System.out.println ( "Hello World!" ) ;
    }
}
  4077.  
  4078. Student asks:
  4079.  
  4080. What is a class?, What is that funny looking bracket?, What is public?, What is static?, What is void for?, What is main?, What are the parenthesis for?, What is a String?, What is args?, How come there are funny square brackets?, What is system?, What does the dot do?, What is out?, What is println?, Why are there quotes there?, What does the semicolon do?, How come it's all indented like that?.
  4081.  
  4082. C:
  4083.  
#include <stdio.h>

main()
{
    printf ( "Hello, World!\n" ) ;
}
  4090.  
  4091. Student asks:
  4092.  
  4093. What is #include?, What are the greater than and less than signs doing there?, What is stdio.h?, What is main? What are the parenthesis for?, What is the funny bracket for?, What is printf?, Why is hello world in quotes?, What is the backslash-N doing at the end?, What is the semicolon for?
  4094.  
  4095. Python:
  4096.  
  4097. print "Hello World"
  4098.  
  4099. Student asks:
  4100.  
  4101. What is print?, Why is hello world in quotes?
  4102.  
  4103. Get the picture?
  4104.  
  4105.  
  4106.  
  4107. Why Python is not a good beginner language.
Yes, there are some drawbacks. Those who start with Python may not be aware of these concepts:

* Memory allocation problems (malloc/free, or releasing resources in try/except/finally blocks). More generally, inexperienced Python programmers may not be aware of resource allocation issues, as the Python garbage collector takes care of most of them (file handles, network connections, etc.).

* Pointers and low-level operations. Python only manipulates references and objects, which is higher-level programming. Python programmers may have a hard time with pointers and arrays in C or C++. (Do you like sizeof() ?)

* Specific APIs. Python comes with batteries included: it has the same API on all platforms (Windows, Linux, etc.). Other languages have their own API (Java) or a platform-specific API (C/C++). Programmers coming from Python will probably have to learn platform specifics, which are mostly hidden in Python (e.g. os.path.join()).

* Static typing. Python programmers will have to cope with mandatory variable and type declarations, casting and eventually templates in statically-typed languages (C++, Java, C#...) in order to achieve the same things they did naturally in Python.

* Compilation. Compilation is not an issue in itself, but it adds a burden.

* Well, after learning Python, other languages will look like a pain in the ass to the Python developer. This can lead to demotivation.
  4121.  
  4122.  
  4123.  
  4124.  
  4125. Reading LDIF files
  4126.  
  4127. LDIF files contain information exported from LDAP servers.
  4128.  
Although they seem easy to read, I strongly advise you not to implement your own reader. You are better off using a proven LDIF class.
For example, you can use the LDIF class provided by http://python-ldap.sourceforge.net. This module provides a nifty LDAP client, but if you just need to read LDIF files, take only ldif.py.
  4131.  
  4132. Here's a usage example (we display ID, firstname and lastname of the persons declared in the LDIF file):
  4133.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-

import ldif # ldif module from http://python-ldap.sourceforge.net

class testParser(ldif.LDIFParser):
    def __init__(self,input_file,ignored_attr_types=None,max_entries=0,process_url_schemes=None,line_sep='\n' ):
        ldif.LDIFParser.__init__(self,input_file,ignored_attr_types,max_entries,process_url_schemes,line_sep)

    def handle(self,dn,entry):
        # Called by parse() for each record found in the file.
        if 'person' in entry['objectclass']:
            print "Identifier = ",entry['uid'][0]
            print "FirstName = ",entry.get('givenname',[''])[0]
            print "LastName = ",entry.get('sn',[''])[0]
            print

f = open('myfile.ldif','r')
ldif_parser = testParser(f)
ldif_parser.parse()
f.close()
  4153.  
  4154.  
  4155.  
  4156. Capture the output of a program
  4157.  
  4158. It's easy to capture the output of a command-line program.
  4159.  
For example, under Windows, we can get the number of bytes received by the workstation by picking out the "Bytes received" line displayed by this command: net statistics workstation

#!/usr/bin/python
import subprocess
myprocess = subprocess.Popen(['net','statistics','workstation'],stdout=subprocess.PIPE)
(sout,serr) = myprocess.communicate()
for line in sout.split('\n'):
    if line.strip().startswith('Bytes received'):
        print "This workstation received %s bytes." % line.strip().split(' ')[-1]
  4168.  
Note that the subprocess module also allows you to send data to the program's input.
Thus you can communicate with the command-line program as if you were a user typing (read the program output, then react by sending characters, etc.)
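
As a minimal sketch of sending input to a program, the example below feeds two lines to a child process and reads back what it prints. It is written for Python 3 (unlike the Python 2 snippets on this page), and the child program (CHILD_CODE) and the helper name run_with_input are made up for the illustration:

```python
import subprocess
import sys

# Hypothetical child program: reads lines on stdin and prints them upper-cased.
CHILD_CODE = "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())"

def run_with_input(text):
    """Run the child, feed `text` to its stdin, return (stdout, return code)."""
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange str instead of bytes
    )
    out, _ = process.communicate(text)  # sends the input, then waits for the end
    return out, process.returncode

output, code = run_with_input("hello\nworld\n")
print(output)
print("return code:", code)
```

communicate() is the simplest option when all the input is known in advance. A true back-and-forth dialogue with the program needs more care, because reads and writes on the pipes can block.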
  4171.  
Sometimes, you'll want to get the return code of the program. You have to wait for the program to end to get its return code:

#!/usr/bin/python
import subprocess
myprocess = subprocess.Popen(['net','statistics','workstation'],stdout=subprocess.PIPE)
(sout,serr) = myprocess.communicate()
for line in sout.split('\n'):
    if line.strip().startswith('Bytes received'):
        print "This workstation received %s bytes." % line.strip().split(' ')[-1]
myprocess.wait() # We wait for the process to finish (communicate() has already waited, so this returns at once)
print myprocess.returncode # then we get its return code.
  4183.  
  4184.  
  4185.  
  4186. Writing your own webserver
  4187.  
  4188. A webserver is relatively easy to understand:
  4189.  
* The client (browser) connects to the webserver and sends it an HTTP GET or POST request (including path, cookies, etc.)
* The server parses the incoming request (path (e.g. /some/file), cookies, etc.), responds with an HTTP status code (404 for "not found", 200 for "ok", etc.) and sends the content itself (html page, image...)
  4192.  
Browser (HTTP client)                               Server (HTTP server)

    GET /path/hello.html HTTP/1.1
    Host: www.myserver.com
                             --------->

                                        HTTP/1.1 200 OK
                                        Content-Type: text/html

                                        <html><body>Hello, world !</body></html>
                             <---------
  4204.  
You can take full control of this process and write your own webserver in Python.
Here is a simple webserver which says "Hello, world !" on http://localhost:8088/
  4207.  
#!/usr/bin/python
import BaseHTTPServer

class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type','text/html')
        self.end_headers()
        self.wfile.write('<html><body>Hello, world !</body></html>')
        return

print "Listening on port 8088..."
server = BaseHTTPServer.HTTPServer(('', 8088), MyHandler)
server.serve_forever()
  4222.  
* We create a class (MyHandler) which will handle the HTTP requests arriving on the port.
* We only handle GET requests (do_GET).
* We respond with HTTP code 200, which means "everything is ok" (self.send_response(200)).
* We tell the browser that we're about to send HTML data (self.send_header('Content-type','text/html')).
* Then we send the HTML itself (self.wfile.write(...)).
  4228.  
  4229.  
  4230. That's easy.
  4231.  
From there, you can extend the server:

* by responding with specific HTTP error codes if something goes wrong (404 for "Not found", 400 for "Bad request", 401 for "Unauthorized", 500 for "Internal server error", etc.)
* by serving different html depending on the requested path (self.path).
* by serving files from disk or pages (or images !) generated on the fly.
* by sending html data (text/html), plain text (text/plain), JPEG images (image/jpeg), PNG files (image/png), etc.
* by handling cookies (from self.headers)
* by handling POST requests (for forms and file uploads)
* etc.

Possibilities are endless.
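
Two of the extensions listed above (different content depending on self.path, and a 404 for unknown paths) could be sketched as follows. This is only an illustration written for Python 3, where BaseHTTPServer became http.server. The PAGES table, the RoutingHandler name and the fetch() helper are assumptions made for the demo, and the server runs in a background thread on a port chosen by the OS so that the script can query itself and exit:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routing table: requested path -> (content type, body).
PAGES = {
    "/": ("text/html", b"<html><body>Hello, world !</body></html>"),
    "/about.txt": ("text/plain", b"A tiny demo server."),
}

class RoutingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        page = PAGES.get(self.path)
        if page is None:
            self.send_error(404, "Not found")  # unknown path
            return
        content_type, body = page
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch(port, path):
    """Send a GET request to the local server, return (status code, body)."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    connection.request("GET", path)
    response = connection.getresponse()
    result = (response.status, response.read())
    connection.close()
    return result

# Port 0 lets the OS pick a free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), RoutingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

ok_status, ok_body = fetch(port, "/about.txt")
missing_status, _ = fetch(port, "/nope")
server.shutdown()
server.server_close()
print(ok_status, ok_body, missing_status)
```

A dictionary lookup keeps the routing explicit. Serving arbitrary files from disk would require careful validation of self.path, which is exactly the kind of security issue discussed in this section.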
  4243.  
  4244.  
But there are some reasons why you should not try to write your own webserver:

* Your webserver can only serve one request at a time. For high-traffic websites, you will need to either fork, use threads or use asynchronous sockets. There are plenty of webservers which are already highly optimized for speed and will be much faster than what you are writing.
* Webservers provide great flexibility through configuration files. You don't have to code everything (virtual paths, virtual hosts, MIME handling, password protection, etc.). That's a great timesaver.
* SECURITY ! Writing your own webserver can be tricky (path parsing, etc.). There are plenty of existing webservers developed with security in mind and which take care of these issues.
* There are already plenty of ways to incorporate Python code in an existing webserver (Apache module, CGI, Fast-CGI, etc.).


While writing your own webserver can be fun, think twice before putting this into production.



SOAP clients
  4255.  
  4256. I have to use a SOAP webservice.
  4257. (Yeah... I know SOAP is a mess, and I'd better not touch that, but I have no choice.)
  4258.  
So, it's September 4th, 2007. Let's see the state of SOAP clients in Python:

* First try: SOAPy. Huu... last updated April 26, 2001 ? Try to run it. Oops... it is based on xmllib, which is deprecated in Python. No luck !
Next one:

* SOAP.py: Last updated in 2005 ? I fetch SOAPpy-0.12.0.zip, unzip, run "python setup.py install":
SyntaxError: from __future__ imports must occur at the beginning of the file. WTF ?
By the way, SOAP.py depends on pyXML... which has not been maintained since late 2004 and is not available for Python 2.5 !
What am I supposed to do with this ?
Ok, let's try another one:

* ZSI seems to be the current reference. Download the egg, install...
Visual Studio 2003 was not found on this system. WTF ??!
Am I supposed to buy an expensive and outdated IDE to use this Python SOAP library ?
Out of the question !

* Maybe 4Suite ? Seems pretty good.
Mmmm... no. The developers seem to have ditched SOAP support altogether.


So what am I left with ?

I'm disappointed by the sorry state of SOAP clients in Python.
Java and .Net have decent implementations, Python has none (at least not without buying Visual Studio on Windows).

After some googling, I finally found a SOAP client implementation on the excellent effbot page: elementsoap.
It does not understand WSDL, but that's not a big deal, and it's good enough for me.

Although the documentation is sparse, it's very easy to use and works well. Example:
  4288.  
# $Id: testquote.py 2924 2006-11-19 22:24:22Z fredrik $
# delayed stock quote demo (www.xmethods.com)

from elementsoap.ElementSOAP import *

class QuoteService(SoapService):
    url = "http://66.28.98.121:9090/soap" # Put webservice URL here.
    def getQuote(self, symbol):
        action = "urn:xmethods-delayed-quotes#getQuote"
        request = SoapRequest("{urn:xmethods-delayed-quotes}getQuote") # Create the SOAP request
        SoapElement(request, "symbol", "string", symbol) # Add parameters
        response = self.call(action, request) # Call webservice
        return float(response.findtext("Result")) # Parse the answer and return it

q = QuoteService()
print "MSFT", q.getQuote("MSFT")
print "LNUX", q.getQuote("LNUX")
  4306.  
  4307.  
  4308.  
elementSoap is a good example of good low-tech: a simple library, in pure Python, which only uses the standard Python modules (no dependency on a fancy XML processing suite).
No bells and whistles, but it does the job.

elementSoap properly handles SOAP exceptions by raising elementsoap.ElementSOAP.SoapFault.
  4313.  
  4314.  
  4315.  
  4316.  
  4317. Archive your whole GMail box
  4318.  
Gmail is neat, but what happens if your account disappears ? (Shit happens... and Google gives no warranty.)
Better safe than sorry: this baby can archive your whole GMail box in a single standard mbox file, which can easily be stored and imported into any email client.

It's easy: run it, enter your login and password, wait, and you have a yourusername.mbox file.
Note: You must have activated IMAP in your GMail account settings.
  4324.  
  4325.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
""" GMail archiver 1.1

This program will download and archive all your emails from GMail.
Simply enter your login and password, and all your emails will
be downloaded from GMail and stored in a standard mbox file.
This includes inbox, archived and sent mails, whatever label you applied.
Spam is not downloaded.

This mbox file can later on be opened with almost any email client (eg. Evolution).

Author:
Sébastien SAUVAGE - sebsauvage at sebsauvage dot net
Webmaster for http://sebsauvage.net/

License:

This program is distributed under the OSI-certified zlib/libpng license.
http://www.opensource.org/licenses/zlib-license.php

This software is provided 'as-is', without any express or implied warranty.
In no event will the authors be held liable for any damages arising from
the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it freely,
subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must not
be misrepresented as being the original software.

3. This notice may not be removed or altered from any source distribution.

Requirements:

- a GMail account with IMAP enabled in settings.
- GMail settings in english
- Python 2.5
"""
import imaplib,getpass,os

print "GMail archiver 1.1"
user = raw_input("Enter your GMail username:")
pwd = getpass.getpass("Enter your password: ")
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.select("[Gmail]/All Mail")
resp, items = m.search(None, "ALL")
items = items[0].split()
print "Found %d emails." % len(items)
count = len(items)
for emailid in items:
    print "Downloading email %s (%d remaining)" % (emailid,count)
    resp, data = m.fetch(emailid, "(RFC822)")
    email_body = data[0][1]
    # We duplicate the From: line to the beginning of the email because mbox format requires it.
    from_line = "from:unknown@unknown"
    try:
        from_line = [line for line in email_body[:16384].split('\n') if line.lower().startswith('from:')][0].strip()
    except IndexError:
        print " 'from:' unreadable."
    email_body = "From %s\n%s" % (from_line[5:].strip(),email_body)
    # Append the message to the mbox file.
    file = open("%s.mbox"%user,"a")
    file.write(email_body)
    file.write("\n")
    file.close()
    count -= 1
print "All done."
  4400.  
Note that the folder name depends on your language. For example, if you use the French version, change "[Gmail]/All Mail" to "[Gmail]/Tous les messages".
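
To give an idea of what can be done with an mbox file afterwards, here is a small sketch using the standard mailbox module. It is written for Python 3 and, to stay self-contained, it builds a tiny demo archive in a temporary directory instead of opening a real yourusername.mbox. The addresses and subjects are invented, and it has not been checked against a file produced by the archiver above:

```python
import mailbox
import os
import tempfile

# Hypothetical archive path; the archiver would produce "yourusername.mbox".
path = os.path.join(tempfile.mkdtemp(), "demo.mbox")

# Build a tiny archive so the example is self-contained.
archive = mailbox.mbox(path)
archive.add("From: homer@example.com\nSubject: Donuts\n\nMmm... donuts.\n")
archive.add("From: marge@example.com\nSubject: Groceries\n\nBuy milk.\n")
archive.flush()
archive.close()

# Read it back: each message exposes its headers like a dictionary.
archive = mailbox.mbox(path)
subjects = [message["subject"] for message in archive]
archive.close()
print(subjects)
```

Keep in mind that mbox readers split messages on lines starting with "From ", so an email body containing such a line unescaped may show up as two messages.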
  4402.  
  4403.  
  4404. Performing HTTP POST requests
  4405.  
When you use urllib or urllib2 to send HTTP requests, they send HTTP GET requests by default. Sometimes you need to POST instead, either because the remote form does not support GET, or because you want to send a file, or because you do not want the request parameters to appear in proxy logs or browser history.

Here's how to do it:

#!/usr/bin/python
import urllib,urllib2

url = 'http://www.commentcamarche.net/search/search.php3'
parameters = {'Mot' : 'Gimp'}

data = urllib.urlencode(parameters) # Use urllib to encode the parameters
request = urllib2.Request(url, data)
response = urllib2.urlopen(request) # This request is sent in HTTP POST (because data is provided)
page = response.read(200000)

This sends the same parameter as the GET request http://www.commentcamarche.net/search/search.php3?Mot=Gimp, but in the request body instead of the URL.

Note that some forms accept both GET and POST, but not all. For example, you cannot search on Google with HTTP POST requests (Google will reject your request).
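
For readers on Python 3, where urllib and urllib2 were merged into urllib.request and urllib.parse, the same idea could look like the sketch below. The URL is a placeholder and the extra "page" parameter is invented for the example. The request is only built and inspected, not sent:

```python
import urllib.parse
import urllib.request

url = "http://www.example.com/search"   # placeholder URL, not a real form
parameters = {"Mot": "Gimp", "page": 2}

# In Python 3 the POST body must be bytes, hence the encode().
data = urllib.parse.urlencode(parameters).encode("ascii")
request = urllib.request.Request(url, data=data)

print(request.get_method())   # the method chosen for a request carrying a body
print(request.data)           # the encoded body

# Without a body, the same URL would be requested with GET.
print(urllib.request.Request(url).get_method())

# response = urllib.request.urlopen(request)   # uncomment to actually send it
```

The presence of a body is what switches the request from GET to POST, exactly as in the urllib2 version above.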
  4423.  
  4424. Read a file with line numbers
  4425.  
Sometimes when you read a file, you also want the number of the line you are working on. That's easy to do:

file = open('file.txt','r')
for (num,value) in enumerate(file):
    print "line number",num,"is:",value
file.close()

Which outputs (numbering starts at 0, and each line keeps its trailing newline, hence the blank lines):

line number 0 is: Hello, world.

line number 1 is: I'm a simple text file.

line number 2 is: Read me !

This is very handy, for example, when importing a file and reporting which line is erroneous.
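
If you prefer line numbers starting at 1, as in most text editors, enumerate() accepts a start argument. The sketch below is written for Python 3 and uses io.StringIO as a stand-in for a real file, so the sample content is an assumption of the example:

```python
import io

# io.StringIO stands in for a real file opened with open('file.txt').
sample = io.StringIO("Hello, world.\nI'm a simple text file.\nRead me !\n")

numbered = []
for num, line in enumerate(sample, start=1):   # start=1: human-friendly numbering
    text = line.rstrip("\n")                   # drop the newline to avoid blank lines
    numbered.append((num, text))
    print("line number", num, "is:", text)
```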
  4442.  
  4443.  
  4444. Filter all but authorized characters in a string
  4445.  
  4446. mmm... maybe there's a better way to do this:
  4447.  
>>> mystring = "hello @{} world.||ç^§ <badscript> &£µ**~~~"
>>> filtered = ''.join([c for c in mystring if c in 'abcdefghijklmnopqrstuvwxyz0123456789_-. '])
>>> print filtered
hello  world. badscript
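
Two possible variations on the same filter, sketched for Python 3: a set makes the membership test faster for long strings, and a regular expression expresses the whitelist in one pattern. The function names keep_allowed and keep_allowed_re are made up for this example:

```python
import re

ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789_-. ")

def keep_allowed(text):
    """Keep only whitelisted characters (set lookup instead of scanning a string)."""
    return "".join(c for c in text if c in ALLOWED)

def keep_allowed_re(text):
    """Same filter, written as a regular expression that deletes everything else."""
    return re.sub(r"[^a-z0-9_\-. ]", "", text)

mystring = "hello @{} world.||ç^§ <badscript> &£µ**~~~"
print(repr(keep_allowed(mystring)))
print(keep_allowed(mystring) == keep_allowed_re(mystring))
```

Both versions keep the spaces that surrounded the removed characters, so you may want to collapse repeated spaces afterwards.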
  4452.  
  4453.  
  4454. Writing your own webserver (using web.py)
  4455.  
  4456. Writing your own webserver can be fun, but it's tedious. web.py is a very nice minimalist web framework which simplifies the whole thing.
  4457. Here is a no-brainer example:
  4458.  
#!/usr/bin/python
import web
URLS = ( '/sayHelloTo/(\w+)','myfunction' )
class myfunction:
    def GET(self,name):
        print "Hello, %s !" % name
web.run(URLS, globals())
  4466.  
  4467.  
  4468. /sayHelloTo/(\w+) is a regular expression. All URLs arriving on the server matching this pattern will call myfunction. Then myfunction will handle the GET request and return a response.
  4469.  
  4470. Let's test it: http://localhost:8080/sayHelloTo/Homer
  4471.  
  4472. Page generated by our server.
  4473.  
  4474. We got it ! We wrote a page capable of handling requests with parameters in 7 lines of code. Nice.
  4475.  
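You can define as many URL mappings as you want. It's also easy to move the URLs in your server without touching whole subdirectories. And your webserver uses nice human-readable URLs :-)
web.py also has features to handle html templates, database access and so on.
Note that this example targets the web.py API of the time. Later releases replaced web.run(URLS, globals()) with web.application(URLS, globals()).run() and expect GET() to return the page instead of printing it.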
  4478.  
  4479.  
  4480. XML-RPC: Simple remote method call
  4481.  
  4482. Let's call a method:
  4483.  
  4484. >>> print myobject.sayHello("Homer")
  4485. Hello, Homer !
  4486.  
We know the method sayHello() is executed on the same computer. How about calling the sayHello() method of another computer ?

It's possible: it's client/server technology. There are several ways to do that:

* Pure sockets (which is a pain in the ass because you have to deal with message encoding/formatting and low-level transmission problems such as detecting the end of a message)
* Webservices/SOAP (which is a pain in the ass because of its horrendous complexity)
  4493.  
  4494. XML-RPC is simple and does the job. Let's see:
  4495.  
  4496. >>> import xmlrpclib
  4497. >>> server = xmlrpclib.ServerProxy("http://localhost:8888")
  4498. >>> print server.sayHello("Homer")
  4499. Hello, Homer !
  4500.  
  4501. You see ? Sheer simplicity. You just declare the server, then call the method as usual.
  4502. The sayHello() method is executed on the server localhost:8888 (which can be another computer). The xmlrpclib library takes care of the low-level details.
  4503.  
  4504. Let's see the corresponding server:
  4505.  
import SimpleXMLRPCServer

class MyClass: # (1)
    def sayHello(self, name):
        return u"Hello, %s !" % name

server_object = MyClass()
server = SimpleXMLRPCServer.SimpleXMLRPCServer(("localhost", 8888)) # (2)
server.register_instance(server_object) # (3)
print "Listening on port 8888"
server.serve_forever()
  4517.  
1. You define a class with all its methods (MyClass)
2. You create an XML-RPC server on a given IP/port (SimpleXMLRPCServer)
3. You register your object on this server (register_instance)
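
The three steps above, together with the client call, can be sketched in a single Python 3 script (in Python 3, SimpleXMLRPCServer lives in xmlrpc.server and xmlrpclib became xmlrpc.client). Running the server in a background thread on an OS-chosen port is an assumption made so that the demo can call itself and exit:

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

class MyClass:                                      # (1) class with its methods
    def sayHello(self, name):
        return "Hello, %s !" % name

# (2) Create the server. Port 0 asks the OS for any free port (demo assumption).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(MyClass())                 # (3) register the object
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client declares the server, then calls the method as if it were local.
client = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port)
greeting = client.sayHello("Homer")
print(greeting)

server.shutdown()
server.server_close()
```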
  4521.  
On the low-level side, XML-RPC basically converts your method calls to XML and sends them in an HTTP request.
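
To see what this XML looks like, you can call the marshalling functions of the client library directly. This is a Python 3 sketch (xmlrpc.client is the Python 3 name of xmlrpclib). It only builds and parses the request body and does not send anything over the network:

```python
import xmlrpc.client

# Marshal the call sayHello("Homer") into the XML body of an XML-RPC request.
payload = xmlrpc.client.dumps(("Homer",), methodname="sayHello")
print(payload)

# The receiving side reverses the operation to find the method and its parameters.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

The printed document is a methodCall element containing the method name and one param per argument, which is what travels in the HTTP POST body.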
  4523.  
  4524. There are a few gotchas:
  4525.  
* Performance: In our example, the server can only serve one request at a time. For better performance, you should use multi-threading, asynchronous sockets or forking.
* Performance (2): Keep in mind that objects that you pass back and forth between client and server are transmitted over the network. Don't send large datasets, or think about compressing them with zlib and base64-encoding the result.
* Security: In our example, anyone can call your webservice. You should implement access control (for example, using HMAC and a shared secret).
* Security (2): Objects are transmitted in clear text. Anyone can sniff the network and grab your data. You should use HTTPS or use an encryption scheme.
* Text encoding: Although Python handles UTF-8 nicely, most XML-RPC services can only handle ASCII. You will sometimes have to use UTF-7 (luckily, Python knows how to "talk" UTF-7).
  4531.  
  4532.  
  4533.  
  4534. Source: IBM DeveloperWorks: XML-RPC for Python.
  4535.  
  4536.  
  4537. Signing data
  4538.  
In our previous webservice, anyone can call the server. That's bad.
We can sign the data to make sure:

* that only authorized programs will be able to call the webservice.
* that the data was not tampered with during transport.

HMAC is a standardized method for signing data. It takes data and a key, and produces a signature.
Example:

>>> import hmac
>>> print hmac.new("mykey","Hello world !").hexdigest()
d157e0d7f137c9ffc8d65473e038ee86

d157e0d7f137c9ffc8d65473e038ee86 is the signature of the data "Hello world !" with the key "mykey".
A different message or a different key will produce a different signature.

* It is practically impossible to produce the correct signature for the data without the correct key.
* The slightest modification in the message will produce a different signature too.
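
Here is a minimal Python 3 sketch of the same idea, with two changes that are choices of this example and not part of the original snippet: the algorithm is named explicitly (SHA-256), and the check uses hmac.compare_digest(), which compares signatures in constant time. The helper names sign() and verify() are made up:

```python
import hashlib
import hmac

KEY = b"mysecret"   # demo key; a real key should be long and random

def sign(message):
    """Return the hex HMAC-SHA256 signature of a bytes message."""
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def verify(message, signature):
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(message), signature)

signature = sign(b"Homer")
print(verify(b"Homer", signature))      # genuine message
print(verify(b"Superman", signature))   # tampered message, same signature
```

In Python 3, keys and messages must be bytes, so text has to be encoded (for example with .encode("utf-8")) before signing.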
  4557.  
  4558.  
  4559.  
  4560. Let's do it
  4561. Let's try it in our client/server example.
  4562.  
  4563. Our client has a secret shared with the server: It's the key ("mysecret")
  4564. The client signs the data and sends the signature and the data to the server.
  4565.  
  4566. # The client (signs the data)
  4567. import xmlrpclib,hmac,hashlib
  4568. key = "mysecret"
  4569.  
  4570. server = xmlrpclib.ServerProxy("http://localhost:8888")
  4571. name = "Homer"
  4572. signature = hmac.new(key,name).hexdigest()
  4573. print server.sayHello(signature,name)
  4574.  
  4575. Our server receives the signature and the data (name), and checks if the signature is correct.
  4576.  
# The server (verifies the signature)
import SimpleXMLRPCServer,hmac,hashlib
key = "mysecret"

class MyClass:
    def sayHello(self, signature, name):
        if hmac.new(key,name).hexdigest() != signature:
            return "Wrong signature ! You're a hacker !"
        else:
            return u"Hello, %s !" % name

server_object = MyClass()
server = SimpleXMLRPCServer.SimpleXMLRPCServer(("localhost", 8888))
server.register_instance(server_object)
print "Listening on port 8888"
server.serve_forever()
  4593.  
  4594. Let's use our client:
  4595.  
  4596. c:>python client.py
  4597. Hello, Homer !
  4598.  
  4599. The server has accepted our signature.
  4600.  
On the server side, a wrong signature means that the message was tampered with or that the key used was invalid. Let's try both:


Hacker with a wrong key
Now, I'm a hacker, but I don't have the key. I try to sign with a wrong key:
  4606.  
  4607. # The client
  4608. import xmlrpclib,hmac,hashlib
  4609. key = "idontknowthekey" # I don't know the correct key. I try anyway !
  4610. server = xmlrpclib.ServerProxy("http://localhost:8888")
  4611. name = "Homer"
  4612. signature = hmac.new(key,name).hexdigest()
  4613. print server.sayHello(signature,name)
  4614.  
  4615. We call the server:
  4616.  
  4617. c:>python client.py
  4618. Wrong signature ! You're a hacker !
  4619.  
  4620. The server rejected us because we used the wrong key, which cannot generate a correct signature.
  4621.  
  4622.  
Hacker tampered with the message
  4624. I'm a hacker and I don't like Homer. I prefer Superman.
  4625.  
  4626. I don't know the key. I only have the original message and the signature sent by the client.
  4627. I try to alter the message and send it to the server with the same signature:
  4628.  
  4629. # The client
  4630. import xmlrpclib,hmac,hashlib
  4631. server = xmlrpclib.ServerProxy("http://localhost:8888")
  4632. signature = "f927a5f8638f9dc3eaf0804f857e6b34" # I sniffed the signature of "Homer" from the network.
  4633. name = "Superman" # I changed "Homer" to "Superman".
  4634. print server.sayHello(signature,name)
  4635.  
  4636. We call the server:
  4637.  
  4638. c:>python client.py
  4639. Wrong signature ! You're a hacker !
  4640.  
  4641. The server detected that the message was modified and rejected us.
  4642.  
  4643.  
  4644.  
  4645. Conclusion, facts and hints
  4646.  
* Trying to call the server when you don't have the correct key is pointless. You will not be able to generate the correct signature.

* It is practically impossible to deduce the key from a message and its signature, provided the key is long and random. This would require hundreds of thousands of years of computer time.

* Note that when the server rejects a client, it cannot distinguish between a wrong key and tampered data.

* HMAC can use different algorithms (MD5, SHA1, SHA256...). By default, Python HMAC uses MD5 (this is true of the Python 2 versions these snippets target; recent Python 3 releases require you to name the algorithm explicitly). You can use other ones, example: hmac.new(key,name,hashlib.sha256)
Just don't forget to import hashlib.

* You can sign several fields at once by concatenating them. Example:
>>> import hmac,hashlib
>>> key = "mysecret"
>>> data = ["Homer","Simpson", 42]
>>> data_to_sign = "###".join(str(i) for i in data)
>>> signature = hmac.new(key,data_to_sign).hexdigest()
>>> print data_to_sign, signature
Homer###Simpson###42 23a5346b8993d01c99fa263fc836743b
You will just have to perform the same concatenation on the server side. Choose a separator which cannot appear in the fields themselves, otherwise two different lists of fields could produce the same string to sign.

* HMAC will not protect you against:

o Replay: A hacker can pick up the message sent by the client, and replay it as-is on the server: the server will accept the message. Protection: You can protect against this by inserting a counter in the message, or a date/time, etc.

o Eavesdropping: A hacker can see the request and the response. Protection: You can encrypt the message (SSL, AES, etc.)

o DOS (Denial of service): A hacker can send a large number of requests to the server, which may become unresponsive for legitimate clients. Protection: Use a firewall to filter IPs, or limit the number of requests per second (netfilter/iptables can do that).


HMAC is a great and simple way to ensure data authenticity and integrity. It's fast to compute and super-resistant.
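
The replay protection mentioned above (putting a date/time in the signed message) could be sketched like this in Python 3. Everything here is an assumption of the example: the JSON serialization of the fields, the 30-second validity window (MAX_AGE) and the function names make_message() and check_message():

```python
import hashlib
import hmac
import json
import time

KEY = b"mysecret"      # demo key
MAX_AGE = 30           # seconds a message stays valid (arbitrary choice)

def make_message(fields, now=None):
    """Serialize the fields plus a timestamp, and sign the result."""
    stamp = time.time() if now is None else now
    body = json.dumps({"fields": fields, "time": stamp}, sort_keys=True).encode("utf-8")
    return body, hmac.new(KEY, body, hashlib.sha256).hexdigest()

def check_message(body, signature, now=None):
    """Return the fields if the signature is valid and the message is recent, else None."""
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return None                       # wrong key or tampered data
    message = json.loads(body.decode("utf-8"))
    now = time.time() if now is None else now
    if abs(now - message["time"]) > MAX_AGE:
        return None                       # too old: possibly a replay
    return message["fields"]

body, signature = make_message(["Homer", "Simpson", 42], now=1000.0)
print(check_message(body, signature, now=1010.0))   # recent message
print(check_message(body, signature, now=5000.0))   # stale message
```

Because the timestamp is part of the signed data, an attacker cannot refresh it without the key. A message can still be replayed inside the validity window, so a stricter server would also remember a counter or a unique identifier per message.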
  4676.  
  4677.  
  4678. Week of the year
  4679.  
  4680. Get the week of the year (week starts on Monday):
  4681.  
  4682. >>> import datetime
  4683. >>> print datetime.datetime(2006,9,4).isocalendar()[1]
  4684. 36
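
isocalendar() actually returns three values: the ISO year, the week number and the weekday (1 for Monday). The short Python 3 sketch below unpacks them and shows a case, picked for the illustration, where the ISO year differs from the calendar year:

```python
import datetime

year, week, weekday = datetime.date(2006, 9, 4).isocalendar()
print(year, week, weekday)

# Around New Year, the ISO year can differ from the calendar year:
# Monday 31 December 2007 belongs to the first ISO week of 2008.
iso_year, iso_week, _ = datetime.date(2007, 12, 31).isocalendar()
print(iso_year, iso_week)
```

So when you store or group by week number, keep the ISO year returned alongside it and not the year of the date itself.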
  4685.  
  4686.  
  4687. Stripping HTML tags
  4688.  
  4689. When grabbing HTML from the web, you sometimes just want the text, not the HTML tags. Here's a function to remove HTML tags:
  4690.  
def stripTags(s):
    ''' Strips HTML tags.
        Taken from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440481
    '''
    intag = [False]

    def chk(c):
        if intag[0]:
            intag[0] = (c != '>')
            return False
        elif c == '<':
            intag[0] = True
            return False
        return True

    return ''.join(c for c in s if chk(c))
  4707.  
  4708. Example:
  4709.  
  4710. >>> print stripTags('<div style="border:1px solid black;"><p>Hello, <span style="font-weight:bold;">world</span> !</p></div>')
  4711. Hello, world !
  4712.  
You may then want to decode HTML entities:
  4714.  
  4715.  
  4716. Decode HTML entities to Unicode characters
  4717.  
  4718. When grabbing HTML from the web, when you have stripped HTML tags, it's always a pain to convert HTML entities such as &eacute; or &#233; to simple characters.
  4719. Here's a function which does exactly that, and outputs a Unicode string:
  4720.  
import re,htmlentitydefs

def htmlentitydecode(s):
    # First convert alpha entities (such as &eacute;)
    # (Inspired from http://mail.python.org/pipermail/python-list/2007-June/443813.html)
    def entity2char(m):
        entity = m.group(1)
        if entity in htmlentitydefs.name2codepoint:
            return unichr(htmlentitydefs.name2codepoint[entity])
        return u" "  # Unknown entity: We replace with a space.
    t = re.sub(u'&(%s);' % u'|'.join(htmlentitydefs.name2codepoint), entity2char, s)

    # Then convert numerical entities (such as &#233;)
    t = re.sub(u'&#(\d+);', lambda x: unichr(int(x.group(1))), t)

    # Then convert hexa entities (such as &#x00E9;)
    # (We only match hexadecimal digits: \w would also match letters such as "z" and make int() fail.)
    return re.sub(u'&#x([0-9a-fA-F]+);', lambda x: unichr(int(x.group(1),16)), t)
  4737.  
  4738. Let's try it:
  4739.  
  4740. >>> print htmlentitydecode(u"Hello&nbsp;world ! &eacute; &#233; &#x00E9;")
  4741. Hello world ! é é é
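
If you are on Python 3 (3.4 or later), the standard library already does this job: html.unescape() handles named, decimal and hexadecimal entities. A minimal sketch, using the same test string as above:

```python
import html

decoded = html.unescape("Hello&nbsp;world ! &eacute; &#233; &#x00E9;")
print(decoded)
```

Note that &nbsp; is decoded to a non-breaking space (U+00A0), not to a regular space.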
  4742.  
  4743. So if you just want to extract the text from a webpage, you can do:
  4744.  
  4745. >>> import urllib2
  4746. >>> html = urllib2.urlopen("http://sebsauvage.net/index.html").read(200000)
  4747. >>> text = htmlentitydecode(stripTags(html))
  4748.  
  4749. Ready for indexing !
Maybe you'll want to strip accented characters first ? Ok:
  4751.  
  4752.  
  4753. Stripping accented characters
  4754.  
Stripping accents ? That's easy... when you know how (as seen on the French Python wiki):
  4756.  
  4757. >>> import unicodedata
  4758. >>> mystring = u"éèêàùçÇ"
  4759. >>> print unicodedata.normalize('NFKD',mystring).encode('ascii','ignore')
  4760. eeeaucC
  4761.  
  4762. That's handy - for example - when indexing or comparing strings.
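
The same trick as a small function in Python 3 syntax (strip_accents is a hypothetical name). One caveat worth knowing: letters which have no decomposition, such as "ß" or "ø", are not converted but simply dropped.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented letters (NFKD), then drop everything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("éèêàùçÇ"))
print(strip_accents("Straße"))    # the "ß" disappears: "Strae"
```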
  4763.  
  4764.  
A dictionary-like object for LARGE datasets

Python dictionaries are very efficient objects for fast data access. But when data is too large to fit in memory, you're in trouble.
Here's a dictionary-like object which stores its data in a SQLite database and behaves like a dictionary object:

* You can work on datasets which do not fit in memory. Size is not limited by memory, but by disk. Can hold up to several terabytes of data (thanks to SQLite).
* Behaves like a dictionary (can be used in place of a dictionary object in most cases).
* Data persists between program runs.
* ACID (data integrity): Storage file integrity is assured. No half-written data. It's really hard to mess up data.
* Efficient: You do not have to re-write a whole 500 GB file when changing only one item. Only the relevant parts of the file are changed.
* You can mix several key types (you can do d["foo"]=bar and d[7]=5468), as with a standard dictionary. (You can't do this with shelve, whose keys must be strings.)
* You can share this dictionary with other languages and systems (SQLite databases are portable, and the SQLite library is available on a wide range of systems/languages, from mainframes to PDA/iPhone, from Python to Java/C++/C#/perl...)
  4777.  
  4778.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import os,os.path,UserDict
from sqlite3 import dbapi2 as sqlite

class dbdict(UserDict.DictMixin):
    ''' dbdict, a dictionary-like object for large datasets (several terabytes) '''

    def __init__(self,dictName):
        self.db_filename = "dbdict_%s.sqlite" % dictName
        if not os.path.isfile(self.db_filename):
            self.con = sqlite.connect(self.db_filename)
            self.con.execute("create table data (key PRIMARY KEY,value)")
        else:
            self.con = sqlite.connect(self.db_filename)

    def __getitem__(self, key):
        row = self.con.execute("select value from data where key=?",(key,)).fetchone()
        if not row: raise KeyError(key)
        return row[0]

    def __setitem__(self, key, item):
        if self.con.execute("select key from data where key=?",(key,)).fetchone():
            self.con.execute("update data set value=? where key=?",(item,key))
        else:
            self.con.execute("insert into data (key,value) values (?,?)",(key, item))
        self.con.commit()

    def __delitem__(self, key):
        if self.con.execute("select key from data where key=?",(key,)).fetchone():
            self.con.execute("delete from data where key=?",(key,))
            self.con.commit()
        else:
            raise KeyError(key)

    def keys(self):
        return [row[0] for row in self.con.execute("select key from data").fetchall()]
  4816.  
Use it like a standard dictionary, except that you give it a name (eg. "mydummydict"):
  4818.  
  4819. d = dbdict("mydummydict")
  4820. d["foo"] = "bar"
  4821. # At this point, foo and bar are *written* to disk.
  4822. d["John"] = "doh!"
  4823. d["pi"] = 3.999
  4824. d["pi"] = 3.14159
  4825.  
You can access your dictionary later on:

d = dbdict("mydummydict")
del d["foo"]

if "John" in d:
    print "John is in there !"
print d.items()
  4834.  
  4835. You can open dbdict_mydummydict.sqlite with any other SQLite-compatible tool.
  4836.  
  4837. Our database opened in SQLiteSpy
  4838.  
  4839. Some possible improvements:
  4840.  
  4841. * You can't directly store Python objects. Only numbers, strings and binary data. Objects need to be serialized first in order to be stored.
  4842. * Database path is current directory. It could be passed as a parameter.
* keys() could be improved to use less memory through the use of an iterator or yield.
  4844. * We do not currently handle database connection closing. The file stays open until the object is destroyed.
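
Here is a rough sketch of how some of these improvements could look in Python 3 syntax. It is not the dbdict class above but a hypothetical variant (PickleDbDict is a made-up name): values are serialized with pickle, the database path is a parameter, keys are produced lazily, and there is an explicit close().

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class PickleDbDict(MutableMapping):
    """Hypothetical variant of dbdict: pickled values, lazy key iteration."""

    def __init__(self, path):
        # 'path' is a file name, or ":memory:" for a throw-away database.
        self.con = sqlite3.connect(path)
        self.con.execute(
            "create table if not exists data (key PRIMARY KEY, value BLOB)")

    def __getitem__(self, key):
        row = self.con.execute(
            "select value from data where key=?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        self.con.execute(
            "insert or replace into data (key, value) values (?, ?)",
            (key, pickle.dumps(value)))
        self.con.commit()

    def __delitem__(self, key):
        cursor = self.con.execute("delete from data where key=?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)
        self.con.commit()

    def __iter__(self):
        # Rows are fetched as the caller consumes them, not all at once.
        for (key,) in self.con.execute("select key from data"):
            yield key

    def __len__(self):
        return self.con.execute("select count(*) from data").fetchone()[0]

    def close(self):
        self.con.close()


d = PickleDbDict(":memory:")
d["foo"] = {"nested": [1, 2, 3]}   # any picklable object
d[7] = 5468
del d[7]
print(sorted(d), d["foo"])
d.close()
```

Two trade-offs to keep in mind: pickled values can no longer be read from other languages, and unpickling data from a database file you do not trust is unsafe.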
  4845.  
  4846.  
  4847. Renaming .ogg files according to tags
  4848.  
  4849. If you have properly-tagged .OGG files (artist, album...) but with wrong filenames (eg. Track01.cda.ogg), the following program will rename files according to OGG tags.
  4850. The ogg files will be renamed to: artist - album - track number - track title.ogg
  4851.  
It uses the ogginfo command-line tool (which is part of the vorbis-tools package). In fact, we simply parse the output of ogginfo.
  4853.  
  4854. A typical ogginfo output is like this:
  4855.  
  4856. Processing file "Track01.cda.ogg"...
  4857.  
  4858. New logical stream (#1, serial: 00002234): type vorbis
  4859. Vorbis headers parsed for stream 1, information follows...
  4860. Version: 0
  4861. Vendor: Xiph.Org libVorbis I 20070622 (1.2.0)
  4862. Channels: 2
  4863. Rate: 44100
  4864.  
  4865. Nominal bitrate: 192,000000 kb/s
  4866. Upper bitrate not set
  4867. Lower bitrate not set
  4868. User comments section follows...
  4869. album=Dive Deep
  4870. artist=Morcheeba
  4871. date=2008
  4872. genre=Pop
  4873. title=Enjoy The Ride
  4874. tracknumber=1
  4875. Vorbis stream 1:
  4876. Total data length: 5238053 bytes
  4877. Playback length: 4m:02.613s
  4878. Average bitrate: 172,721027 kb/s
  4879. Logical stream 1 ended
  4880.  
  4881. We parse this output to get artist, album, title and track number (We simply search for strings like "album=", "artist=", etc.)
  4882.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-

# rename_ogg.py
# Renames .ogg files according to OGG tags: artist - album - track number - title
# This program is public domain.

import glob,subprocess,os

def oggrename(filename):
    print filename
    myprocess = subprocess.Popen(['ogginfo',filename],stdout=subprocess.PIPE)
    (sout,serr) = myprocess.communicate()
    trackinfo = {}
    for line in sout.split('\n'):
        for item in ("title","artist","album","tracknumber"):
            if line.strip().lower().startswith(item+"="):
                trackinfo[item] = line.strip()[len(item+"="):].replace(":"," ")
                if item=="tracknumber":
                    trackinfo[item] = int(trackinfo[item])
    newfilename = "%(artist)s - %(album)s - %(tracknumber)02d - %(title)s.ogg" % trackinfo
    print "-->",newfilename
    os.rename(filename,newfilename)
    print


for filename in glob.glob("Track*.cda.ogg"):
    oggrename(filename)
  4911.  
  4912. For example:
  4913.  
  4914. Morcheeba - Dive Deep - 01 - Enjoy The Ride.ogg
  4915. Morcheeba - Dive Deep - 02 - Riverbed.ogg
  4916. Morcheeba - Dive Deep - 03 - Thumbnails.ogg
  4917. Morcheeba - Dive Deep - 04 - Run Honey Run.ogg
  4918. Morcheeba - Dive Deep - 05 - Gained The World.ogg
  4919. Morcheeba - Dive Deep - 06 - One Love Karma.ogg
  4920. Morcheeba - Dive Deep - 07 - Au-delà.ogg
  4921. Morcheeba - Dive Deep - 08 - Blue Chair.ogg
  4922. Morcheeba - Dive Deep - 09 - Sleep On It.ogg
  4923. Morcheeba - Dive Deep - 10 - The Ledge Beyond The Edge.ogg
  4924. Morcheeba - Dive Deep - 11 - Washed Away.ogg
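
The parsing step can also be isolated so that it can be tried without ogginfo installed. The following is only a sketch in Python 3 syntax: parse_ogginfo() and new_filename() are hypothetical helpers, and SAMPLE_OUTPUT is an excerpt of the ogginfo output shown earlier. Unlike oggrename(), it returns None instead of failing when a tag is missing.

```python
SAMPLE_OUTPUT = """\
User comments section follows...
album=Dive Deep
artist=Morcheeba
date=2008
genre=Pop
title=Enjoy The Ride
tracknumber=1
Vorbis stream 1:
"""

WANTED_TAGS = ("title", "artist", "album", "tracknumber")


def parse_ogginfo(output):
    """Return a dict of the wanted tags found in ogginfo's text output."""
    trackinfo = {}
    for line in output.splitlines():
        name, separator, value = line.strip().partition("=")
        if separator and name.lower() in WANTED_TAGS:
            trackinfo[name.lower()] = value.replace(":", " ")
    if "tracknumber" in trackinfo:
        trackinfo["tracknumber"] = int(trackinfo["tracknumber"])
    return trackinfo


def new_filename(trackinfo):
    """Build 'artist - album - NN - title.ogg', or None if a tag is missing."""
    if not all(tag in trackinfo for tag in WANTED_TAGS):
        return None
    return "%(artist)s - %(album)s - %(tracknumber)02d - %(title)s.ogg" % trackinfo


info = parse_ogginfo(SAMPLE_OUTPUT)
print(new_filename(info))
```

Track numbers written as "1/12" are not handled here: int() would raise a ValueError.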
  4925.  
  4926.  
  4927. Reading configuration (.ini) files
  4928.  
Reading .ini files such as the following one is easy, because Python has a module dedicated to that.
  4930.  
  4931. [sectionA]
  4932. var1=toto
  4933. var2=titi
  4934. homer=simpson
  4935.  
  4936. [sectionB]
  4937. var3=kiki
  4938. var4=roro
  4939. john=doe
  4940.  
  4941. Let's write a program which reads all parameters from all sections:
  4942.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import ConfigParser

# Open a configuration file
config = ConfigParser.SafeConfigParser()
config.read("config.ini")

# Read the whole configuration file
for section in config.sections():
    print "In section %s" % section
    for (key, value) in config.items(section):
        print " Key %s has value %s" % (key, value)
  4956.  
  4957. The output:
  4958.  
In section sectionB
 Key john has value doe
 Key var3 has value kiki
 Key var4 has value roro
In section sectionA
 Key homer has value simpson
 Key var1 has value toto
 Key var2 has value titi
  4967.  
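Note that parameters and sections are in no particular order here. Never expect to have the parameters in order. (Since Python 2.7 the parser keeps the order of the file by default, but older versions such as the 2.5 used here do not.)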
  4969. You can also read a single parameter:
  4970.  
  4971. >>> print config.get("sectionB","john")
  4972. doe
  4973.  
  4974. There are a few gotchas regarding case:
  4975.  
  4976. * Parameters are case-insensitive
  4977. * Sections are case-sensitive.
  4978.  
  4979. >>> print config.get("sectionB","JOHN")
  4980. doe
  4981. >>> print config.get("SECTIONB","john")
  4982. Traceback (most recent call last):
  4983. File "<stdin>", line 1, in <module>
  4984. File "c:\python25\lib\ConfigParser.py", line 511, in get
  4985. raise NoSectionError(section)
  4986. ConfigParser.NoSectionError: No section: 'SECTIONB'
  4987. >>>
  4988.  
When reading these files, you should be ready to handle missing parameters, which can be done using has_option() or by catching the exception ConfigParser.NoOptionError:
  4990.  
  4991. >>> print config.get("sectionB","Duffy")
  4992. Traceback (most recent call last):
  4993. File "<stdin>", line 1, in <module>
  4994. File "c:\python25\lib\ConfigParser.py", line 520, in get
  4995. raise NoOptionError(option, section)
  4996. ConfigParser.NoOptionError: No option 'duffy' in section: 'sectionB'
  4997.  
  4998.  
>>> if config.has_option("sectionB","Duffy"):
...     print config.get("sectionB","Duffy")
... else:
...     print "Oops... option not found !"
...
Oops... option not found !


>>> try:
...     print config.get("sectionB","Duffy")
... except ConfigParser.NoOptionError:
...     print "Oops... option not found !"
...
Oops... option not found !
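
In Python 3 the module is named configparser, and get() accepts a fallback keyword argument which covers both a missing option and a missing section. A minimal sketch (INI_TEXT is an inline stand-in for config.ini, so that the example is self-contained):

```python
import configparser

INI_TEXT = """
[sectionA]
var1=toto

[sectionB]
john=doe
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)     # config.read("config.ini") would read a file

print(config.get("sectionB", "john"))
print(config.get("sectionB", "Duffy", fallback="(not set)"))   # missing option
print(config.get("SECTIONB", "john", fallback="(not set)"))    # missing section
```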
  5013.  
  5014.  
  5015.  
  5016. miniMusic - a minimalist music server
  5017.  
  5018. Serving your MP3/OGG collection over the LAN ? Here's a simple server which does the trick.
  5019.  
  5020. Instructions:
  5021.  
* Copy this Python program into your music folder and run it.
* Point your browser at http://mycomputer:8099
* If your browser is configured properly, the m3u file will immediately start playing in your favorite player.

That's all it takes !
  5027.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-

# miniMusic - a minimalist music server
# Run me in the directory of your MP3/OGG files
# and point your browser at me.
# Great for a simple LAN music server.

import os,os.path,BaseHTTPServer,SimpleHTTPServer,SocketServer,socket,mimetypes,urllib

PORT = 8099
HOSTNAME = socket.gethostbyaddr(socket.gethostname())[0]

MIME_TYPES = mimetypes.types_map
MIME_TYPES[".ogg"] = u"audio/ogg"

def buildm3u(directory):
    # Get all .mp3/.ogg files from subdirectories, and build a playlist (.m3u)
    files = [u"#EXTM3U"]
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() in (u'.mp3',u'.ogg'):
                filepath = os.path.normpath(os.path.join(dirpath,filename))
                files.append(u"#EXTINF:-1,%s" % filename)
                # urllib.quote does not seem to handle all Unicode strings properly
                data = urllib.quote(filepath.replace(os.path.sep,"/").encode("utf-8","replace"))
                files.append(u"http://%s:%s/%s" % (HOSTNAME,PORT,data))
    return files

class miniMusicServer(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == u"/":  # We will return the .m3u file.
            self.send_response(200)
            self.send_header(u'Content-Type',u'audio/x-mpegurl; charset=utf-8')
            self.send_header(u'Content-Disposition',u'attachment; filename="playlist.m3u"')
            self.end_headers()
            self.wfile.write(u"\n".join(buildm3u(u".")).encode("utf-8","replace"))
        else:  # Return the music file with proper MIME type.
            localpath = urllib.unquote(self.path).decode("utf-8").replace(u"/",os.path.sep)[1:].replace(u"..",u".")
            if os.path.isfile(localpath):
                ext = os.path.splitext(localpath)[1].lower()
                mimetype = u"application/octet-stream"
                if ext in MIME_TYPES: mimetype=MIME_TYPES[ext]  # Get the correct MIME type for this extension.
                self.send_response(200)
                self.send_header(u'Content-Type',mimetype)
                self.send_header(u'Content-Length',unicode(os.path.getsize(localpath)))
                self.end_headers()
                self.wfile.write(open(localpath,"rb").read())
            else:  # File not found ? Will simply return a 404.
                SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)

httpd = SocketServer.ThreadingTCPServer(('', PORT), miniMusicServer)
print u"Music server ready at http://%s:%s" % (HOSTNAME,PORT)
httpd.serve_forever()
  5082.  
  5083. Let's start it:
  5084.  
  5085. >python miniMusic.py
  5086. Music server ready at http://mycomputer:8099
  5087.  
  5088. Then point your browser at this URL. If you're prompted to either save or open, choose "Open". Your favorite player will play the songs. For example, in VLC:
  5089.  
  5090. Our playlist in VLC
  5091.  
  5092. (Note that some music players have problems with .m3u files (such as Foobar2000), but most will do fine (VLC, WMP...)).
  5093.  
  5094. You can add music in your music directory: It's only a matter of hitting the URL again to get the updated playlist. You do not need to restart the server.
  5095.  
  5096.  
  5097. Explanations
  5098.  
* The ThreadingTCPServer listens on the given port (8099). Each time a client connects, it spawns a new thread and instantiates a miniMusicServer object which will handle the HTTP request (do_GET()). Therefore each client has its own miniMusicServer object working for it in a separate thread.

* buildm3u() simply walks the subdirectories, collecting all .mp3/.ogg files, and builds a .m3u file.
m3u files are simple text files containing the URL of each music file (http://...). Most browsers are configured to open m3u files in media players.
We add EXTINF information so that the names show up more nicely in audio players.
We use some quote/replace/encode so that special characters in filenames are not mangled by browsers or media players.

* if self.path == u"/" : The m3u playlist will be served as the default page of our server, otherwise the else will serve the mp3/ogg file itself (with the correct MIME type: "audio/mpeg" for .mp3 files, "audio/ogg" for .ogg files).
If the file does not exist, we let the base class SimpleHTTPRequestHandler display the 404 error page.

* replace(u"..",u".") is a simple trick meant to prevent the webserver from serving files outside your music folder. It is easy to bypass (for example, "...." collapses to ".."), so do not rely on it.

* This server is by no means secure. Do not run it over the internet or over hostile networks. You are warned.
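
A more robust way to keep requests inside the music folder is to resolve the requested path and check that the result is still below the root directory. The sketch below (Python 3 syntax) is an assumption of how this could be done, not part of miniMusic: safe_local_path() is a hypothetical helper, and a temporary directory stands in for the music folder.

```python
import os
import tempfile


def safe_local_path(url_path, root="."):
    """Map a URL path to a file below 'root', or return None if it escapes."""
    root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # The resolved path must be the root itself or lie below it.
    if candidate != root and not candidate.startswith(root + os.sep):
        return None
    return candidate


music_root = tempfile.mkdtemp()     # stand-in for the music folder
print(safe_local_path("/album/song.ogg", music_root))
print(safe_local_path("/../../etc/passwd", music_root))   # rejected: None
```

os.path.realpath() also resolves symbolic links, so a link pointing outside the music folder is rejected as well.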
  5112.  
  5113.  
  5114.  
  5115. FTP through a HTTP proxy
  5116.  
  5117. import urllib2
  5118.  
  5119. # Install proxy support for urllib2
proxy_info = { 'host' : 'proxy.myisp.com',
               'port' : 3128,
             }
  5123. proxy_support = urllib2.ProxyHandler({"ftp" : "http://%(host)s:%(port)d" % proxy_info})
  5124. opener = urllib2.build_opener(proxy_support)
  5125. urllib2.install_opener(opener)
  5126.  
  5127. # List the content of a directory (it returns an HTML page built by the proxy)
  5128. # (You will have to parse the HTML to extract the list of files and directories.)
  5129. print urllib2.urlopen("ftp://login:password@server/directory").read()
  5130.  
  5131. # Download a file:
  5132. data = urllib2.urlopen("ftp://login:password@server/directory/myfile.zip").read()
  5133. open("myfile.zip","w+b").write(data)
  5134.  
  5135. If someone knows how to upload a file, I'd appreciate the information.
  5136.  
  5137.  
  5138. A simple web dispatcher
  5139.  
  5140. There are plenty of web frameworks out there for Python (such as web.py), but let's write our own again, just for fun.
  5141.  
  5142. What is a web site ? Basically, every url (/foo?param=bar) will run code on the server.
  5143. We need a simple way to map each url to a piece of code. That's what our program below does (Let's see the code first, explanations will follow):
  5144.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import os,SimpleHTTPServer,SocketServer,socket,cgi,urlparse

PORT = 8025
HOSTNAME = socket.gethostbyaddr(socket.gethostname())[0]

class webDispatcher(SimpleHTTPServer.SimpleHTTPRequestHandler):

    def req_hello(self):
        self.send_response(200)
        self.send_header("Content-Type","text/html")
        self.end_headers()
        self.wfile.write('Hello. Go to <a href="/form">the form</a>.')

    def req_form(self):
        self.send_response(200)
        self.send_header("Content-Type","text/html")
        self.end_headers()
        self.wfile.write('<form action="/say" method="GET">Enter a phrase:<input name="phrase" type="text" size="60"><input type="submit" value="Say it !"></form>')

    def req_say(self,phrase):
        self.send_response(200)
        self.send_header("Content-Type","text/html")
        self.end_headers()
        for item in phrase:
            # cgi.escape() prevents the visitor's text from being interpreted as HTML.
            self.wfile.write("I say %s<br>" % cgi.escape(item))

    def do_GET(self):
        params = cgi.parse_qs(urlparse.urlparse(self.path).query)
        action = urlparse.urlparse(self.path).path[1:]
        if action=="": action="hello"
        methodname = "req_"+action
        try:
            getattr(self, methodname)(**params)
        except AttributeError:
            self.send_response(404)
            self.send_header("Content-Type","text/html")
            self.end_headers()
            self.wfile.write("404 - Not found")
        except TypeError:  # URL not called with the proper parameters
            self.send_response(400)
            self.send_header("Content-Type","text/html")
            self.end_headers()
            self.wfile.write("400 - Bad request")

httpd = SocketServer.ThreadingTCPServer(('', PORT), webDispatcher)
print u"Server listening at http://%s:%s" % (HOSTNAME,PORT)
httpd.serve_forever()
  5194.  
Puzzled ? Let me explain: Every URL will call the corresponding method.
  5196.  
  5197. * /hello calls the req_hello() method which displays a welcome page.
  5198. * /form calls the req_form() method which displays a form.
  5199. * /say?phrase=I love you calls the req_say() method which will handle data entered in the form.
  5200.  
  5201. Sounds too easy ? Let's try it:
  5202.  
  5203.  
  5204. The /hello URL simply called the req_hello() method. We have also instructed our server to serve this page as the default page (if action=="": action="hello"), so we can call our server like this:
  5205.  
  5206.  
Now let's click to go to the form:


The /form URL calls the req_form() method which displays the form. Let's enter a phrase and click "Say it !".
  5211.  
  5212.  
  5213. How nice. The URL /say?phrase=I+love+you called the method req_say(), passing the phrase as parameter.
  5214.  
  5215. Did you notice the for item in phrase ? It's because it's possible to pass a parameter several times.
  5216.  
  5217.  
And if you ask for a non-existing page, it will serve an HTTP 404 error:
  5219.  
  5220.  
  5221.  
  5222. Now, let's create a new URL which will return the uppercase version of a string: /upper?text=...
  5223. The only thing I have to write is one simple method:
  5224.  
def req_upper(self,text):
    self.send_response(200)                        # 200 means "ok"
    self.send_header("Content-Type","text/plain")  # We are about to send simple text.
    self.end_headers()                             # We are done with HTTP headers.
    self.wfile.write(text[0].upper())              # We send the data itself.
  5230.  
  5231. That's all. Now let's try it:
  5232.  
  5233.  
  5234.  
  5235. Magic ?
  5236. No.
  5237.  
  5238. * First, we decode URL parameters with params = cgi.parse_qs(urlparse.urlparse(self.path).query)
  5239. For example ?foo=bar&homer=simpson&foo=kilroy will return { 'foo':['bar','kilroy'], 'homer':['simpson'] }
  5240.  
  5241. * Next, we extract the path with urlparse (eg. "/hello") and build the method name from it ("req_hello").
  5242.  
* Then we get the method from its name (getattr(self,methodname)) and call it with the parameters (**params).
  5244.  
  5245.  
  5246. So using the url /say?phrase=I love you is equivalent to self.req_say( phrase=['I love you'] )
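
The three steps above can be tried without any network code. This is a sketch in Python 3 syntax (where parse_qs and urlparse live in urllib.parse); the Dispatcher class is hypothetical, and its methods return a (status, body) pair instead of writing to a socket.

```python
from urllib.parse import urlparse, parse_qs


class Dispatcher:
    """Hypothetical stand-in for the request handler, without networking."""

    def req_hello(self):
        return 200, "Hello."

    def req_say(self, phrase):
        return 200, "".join("I say %s\n" % item for item in phrase)

    def dispatch(self, url):
        parsed = urlparse(url)
        params = parse_qs(parsed.query)        # {'phrase': ['I love you']}
        action = parsed.path[1:] or "hello"    # "/" falls back to req_hello
        method = getattr(self, "req_" + action, None)
        if method is None:
            return 404, "404 - Not found"
        try:
            return method(**params)
        except TypeError:                      # wrong or missing parameters
            return 400, "400 - Bad request"


d = Dispatcher()
print(d.dispatch("/say?phrase=I+love+you"))
print(d.dispatch("/say"))
print(d.dispatch("/nothing"))
```

The "req_" prefix matters: it ensures that a URL can only reach methods written to be called from the web. One drawback of catching TypeError around the call is that a TypeError raised inside a handler is also reported as a bad request.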
  5247.  
  5248. Not so fast...
  5249. There are plenty of other things that are common parts of a webserver I did not speak about:
  5250.  
* You can serve local files: Simply do:
    self.wfile.write(open("myimage.jpg","rb").read())

* ...but don't forget to serve the correct MIME type ! ("image/jpeg" for .jpg, "audio/mpeg" for .mp3, etc.), otherwise the browser will not behave correctly.
    import mimetypes
    MIME_TYPES = mimetypes.types_map
    [...]
    extension = os.path.splitext(filepath)[1].lower()             # Get file extension (".jpg", ".mp3"...)
    mimetype = "application/octet-stream"                         # Default MIME type when extension is unknown
    if extension in MIME_TYPES: mimetype = MIME_TYPES[extension]  # Get the MIME type (".jpg"--->"image/jpeg")
    self.send_response(200)                                       # We are ok, let's respond
    self.send_header('Content-Type',mimetype)                     # Send MIME type in HTTP response headers
    self.end_headers()                                            # We're finished with HTTP headers
    self.wfile.write(open(filepath,"rb").read())                  # Then send the file itself.

* Sending the response length is always better (otherwise the browser will not accurately display the progress bar):
    import mimetypes
    MIME_TYPES = mimetypes.types_map
    [...]
    extension = os.path.splitext(filepath)[1].lower()                  # Get file extension (".jpg", ".mp3"...)
    mimetype = "application/octet-stream"                              # Default MIME type when extension is unknown
    if extension in MIME_TYPES: mimetype = MIME_TYPES[extension]       # Get the MIME type (".jpg"--->"image/jpeg")
    self.send_response(200)                                            # We are ok, let's respond
    self.send_header('Content-Type',mimetype)                          # Send MIME type in HTTP response headers
    self.send_header('Content-Length',str(os.path.getsize(filepath)))  # Send response size
    self.end_headers()                                                 # We're finished with HTTP headers
    self.wfile.write(open(filepath,"rb").read())                       # Then send the file itself.

* Handling session cookies on our server is a bit of work. Around 10 lines of code. No, really. To set a cookie in the browser, use:
    self.send_header('Set-Cookie','mycookie=%s' % sessionid)
and to read them:
    self.headers["Cookie"]
A session cookie is only a big random string generated by the server. It is easy to generate, for example (SystemRandom uses the operating system's random source, which is much harder to predict than the default generator):
    import random
    rng = random.SystemRandom()
    sessionid = ''.join([rng.choice("abcdefghijklmnopqrstuvwxyz0123456789") for i in range(60)])

* Storing session information on the server side (eg. "Is the user logged in ?") is just a matter of SQLite (using the sessionid as a key) or even a class attribute.

* Redirecting is easy:
    self.send_response(302)
    self.send_header("Location","/newurl")
    self.end_headers()

* You will probably need an HTML templating engine to simplify the HTML page generation.
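
To give an idea of the session part, here is a sketch in Python 3 syntax. Everything in it is an assumption made for illustration: SESSIONS is a plain in-memory dictionary, new_session() and find_session() are hypothetical helpers, and the cookie is named mycookie as above. It uses secrets.token_hex() to generate the identifier and http.cookies.SimpleCookie to parse the Cookie header.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}   # hypothetical in-memory session store: session id -> data


def new_session():
    """Create a session and return the value for a Set-Cookie header."""
    sessionid = secrets.token_hex(30)          # 60 hexadecimal characters
    SESSIONS[sessionid] = {"logged_in": False}
    return "mycookie=%s" % sessionid


def find_session(cookie_header):
    """Return the session data for a Cookie header, or None if unknown."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "mycookie" not in cookie:
        return None
    return SESSIONS.get(cookie["mycookie"].value)


header_value = new_session()
print(find_session(header_value))         # the session we just created
print(find_session("mycookie=forged"))    # unknown session id: None
```

In a real handler, header_value would go into send_header('Set-Cookie', ...) and cookie_header would come from the request headers.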
  5293.  
  5294.  
  5295. A few more hints
  5296. Note that each URL (method) must be called with the exact number of parameters. If you omit one parameter or add one, you will get an error (HTTP 400 - Bad request).
  5297. It's possible to create URLs which accept an arbitrary number of parameters:
  5298.  
def req_test(self,**kwargs):
    self.send_response(200)
    self.send_header("Content-Type","text/plain")
    self.end_headers()
    self.wfile.write('Ok:\n')
    for (k,v) in kwargs.items():
        for item in v:
            self.wfile.write(" %s=%s\n" % (k,item))
  5307.  
  5308. Using **kwargs, your method will accept any parameters, or even no parameter at all.
  5309. You can call it with:
  5310.  
  5311. * http://mycomputer/test
  5312. * http://mycomputer/test?foo=bar&john=doe&foo=55
  5313. * http://mycomputer/test?foo=bar&john=doe&a=b&c=d&e=f&g=h
  5314.  
kwargs is a dictionary. The key is the parameter name, the value is a list of values.
  5316. For example, in the second example, kwargs = { 'foo':['bar','55'], 'john':['doe'] }
  5317.  
  5318.  
  5319. Separating GUI and processing
  5320.  
  5321. If you don't want your GUI to stall when your program is processing data, you'd better use multi-threading. It's always better to clearly separate the processing from the GUI: Create one class to handle all interface/user-interaction things, and one or several others which will do the real stuff.
  5322.  
  5323. One word of advice: Never let two threads touch the GUI simultaneously. Most GUI toolkits are not thread-safe and will happily trash your application.
  5324.  
Here is a simple threading example: The following program will display a GUI, and a background thread will count down from 15 to zero. You can click the button anytime to ask the GUI to stop the thread and get the result.
  5326.  
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import Tkinter,threading,time

class MyProcess(threading.Thread):
    def __init__(self,startValue):
        threading.Thread.__init__(self)
        # Not named "_stop": in Python 3 that name clashes with an internal method of Thread.
        self._stopRequested = False
        self._value = startValue

    def run(self):
        while self._value>0 and not self._stopRequested:
            self._value = self._value - 1
            print u"Thread: I'm working... (value=%d)" % self._value
            time.sleep(1)
        print u"Thread: I have finished."

    def stop(self):
        self._stopRequested = True

    def result(self):
        return self._value

class MyGUI(Tkinter.Tk):
    def __init__(self,parent):
        Tkinter.Tk.__init__(self,parent)
        self.parent = parent
        self.initialize()
        self.worker = MyProcess(15)
        self.worker.start()  # Start the worker thread

    def initialize(self):
        ''' Create the GUI. '''
        self.grid()
        button = Tkinter.Button(self,text=u"Click me to stop",command=self.OnButtonClick)
        button.grid(column=1,row=0)
        self.labelVariable = Tkinter.StringVar()
        label = Tkinter.Label(self,textvariable=self.labelVariable)
        label.grid(column=0,row=0)
        self.labelVariable.set(u"Hello !")

    def OnButtonClick(self):
        ''' Called when button is clicked. '''
        self.labelVariable.set( u"Button clicked" )
        self.worker.stop()  # We ask the worker to stop (it may not stop immediately)
        while self.worker.isAlive():  # We wait for the worker to stop.
            time.sleep(0.2)
        # We display the result:
        self.labelVariable.set( u"Result: %d" % self.worker.result() )

if __name__ == "__main__":
    app = MyGUI(None)
    app.title('my application')
    app.mainloop()
  5381.  
  5382. In our example, a simple integer is exchanged between the GUI and the worker thread, but it can be more complex objects, or even lists.
  5383. You can even have several "worker" objects work at the same time if you want.
  5384.  
Caveat #1: Beware ! When two threads access the same object, nasty things can happen. You should take care of this concern using locks or Queue objects. Queues are thread-safe and very handy to exchange data and objects between threads. print can also be used from several threads, although the output of different threads may get interleaved. More on this in the next section.

Caveat #2: Only the main thread will receive CTRL+C (or CTRL+Break) events. The main thread should handle it and politely ask the other threads to stop, because in Python you can't forcefully "kill" other threads (hence the stop() method). Ah... and under Unix/Linux, threads may continue even if the main thread is dead (use ps/kill to get them).
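
As an illustration of Caveat #1, here is a sketch of a worker which reports its progress through a queue instead of sharing a variable (Python 3 syntax, where the module is named queue). The worker() function and the None sentinel are choices made for this example.

```python
import queue
import threading


def worker(start_value, results):
    """Count down and report every step through the queue."""
    for value in range(start_value - 1, -1, -1):
        results.put(value)
    results.put(None)              # sentinel: tells the consumer we are done


results = queue.Queue()            # thread-safe: no explicit lock needed
thread = threading.Thread(target=worker, args=(3, results))
thread.start()

received = []
while True:
    item = results.get()           # blocks until the worker sends something
    if item is None:
        break
    received.append(item)
thread.join()
print(received)                    # [2, 1, 0]
```

In a GUI you would not block on get(): you would rather poll the queue with get_nowait() from a timer (for example with Tkinter's after() method), so that only the main thread touches the widgets.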
  5388.  
  5389.  
  5390.  
Separating GUI and processing, part 2 : Accessing common resources

When different threads work on the same resources, you have a risk of data corruption. The typical example is two threads which want to change the value of the same variable:

Thread concurrency problem

In the end, you get a wrong value (6) when you expected 7.

So each thread should raise a flag saying "Hey, I'm accessing this resource right now. Nobody touches it until I'm finished." That's what locks are for.
When a thread wants to perform an action on a resource, it:

* asks for the lock (possibly waiting for the lock to become available)
* performs its operations
* releases the lock.

Only one thread can hold the lock at any given time. This ensures proper operation:
  5407.  
  5408. Threads with locks
  5409.  
  5410.  
  5411. In Python, this is the Lock object. Here is a simple example:
  5412.  
import threading,time

def thread1(lock):
    lock.acquire()
    print "T1: I have the lock. Let's work."
    time.sleep(5)  # Do my work
    lock.release()
    print "T1: Finished"

def thread2(lock):
    lock.acquire()
    print "T2: I have the lock. Let's work."
    time.sleep(5)  # Do my work
    lock.release()
    print "T2: Finished"

commonLock = threading.Lock()

t1 = threading.Thread(target=thread1,args=(commonLock,))
t1.start()
t2 = threading.Thread(target=thread2,args=(commonLock,))
t2.start()
  5435.  
  5436. Which will output:
  5437.  
  5438. T1: I have the lock. Let's work.
  5439. T1: Finished
  5440. T2: I have the lock. Let's work.
  5441. T2: Finished
  5442.  
You can see that thread2 only works once thread1 no longer needs the resource.
(In fact, there are 3 threads here: the two we started plus the main thread.)
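
A common way to write the acquire/release pair is the with statement, which releases the lock even if the protected code raises an exception. The following sketch applies it to a shared counter. The names counter, counter_lock and increment_safely are illustrative, and the starting value of 5 is an assumption.

```python
import threading

counter = {"value": 5}            # assumed starting value
counter_lock = threading.Lock()

def increment_safely(times):
    for _ in range(times):
        with counter_lock:        # acquire(); release() happens automatically
            counter["value"] += 1 # read-modify-write is now protected

threads = [threading.Thread(target=increment_safely, args=(1000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])           # 2005: no increment is lost
```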
  5445.  
  5446. You may want thread2 to perform some other things until the lock is available. lock.acquire() can be made non-blocking like this:
  5447.  
import threading,time

def thread1(lock):
    lock.acquire()
    print "T1: I have the lock. Let's work."
    time.sleep(5)  # Do my work
    lock.release()
    print "T1: Finished"

def thread2(lock):
    # acquire(0) does not block: it returns immediately, True only if we got the lock.
    while not lock.acquire(0):
        print "T2: I do not have the lock. Let's do something else."
        time.sleep(1)
    print "T2: I have the lock. Let's work."
    time.sleep(5)  # Do my work
    lock.release()
    print "T2: Finished"

commonLock = threading.Lock()

t1 = threading.Thread(target=thread1,args=(commonLock,))
t1.start()
t2 = threading.Thread(target=thread2,args=(commonLock,))
t2.start()

Which will give:

T1: I have the lock. Let's work.
T2: I do not have the lock. Let's do something else.
T2: I do not have the lock. Let's do something else.
T2: I do not have the lock. Let's do something else.
T2: I do not have the lock. Let's do something else.
T2: I do not have the lock. Let's do something else.
T1: Finished
T2: I have the lock. Let's work.
T2: Finished
  5484.  
  5485. You see that thread2 can continue to work while waiting for the lock.
  5486.  
Of course, you can pass several locks to each function or object if you have several resources to protect. But beware of deadlocks!
Imagine two threads that both want to work on objects A and B, but do not lock them in the same order:

(Figure: example of deadlock)

Thread1 will not release the lock on A until it has the lock on B.
Thread2 will not release the lock on B until it has the lock on A.
They block each other. You're toast. Your program will hang indefinitely. So watch out.

Locking problems in threads can be difficult to debug. (Hint: You can use the logging module and extensively log what threads are doing. This will ease debugging.)
To prevent deadlocks, you may use a non-blocking acquire() and decide it's a failure if you could not get the lock after x seconds. At least you will have the chance to handle the error instead of having your program hang forever.
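
Here is a minimal sketch of that idea: two threads take locks A and B in opposite order, but each one polls for its second lock and gives up after a deadline. The helper acquire_with_timeout, the worker function and the timing values are assumptions made for illustration.

```python
import threading
import time

def acquire_with_timeout(lock, timeout, poll_interval=0.05):
    """Poll a lock without blocking; return True if acquired before the deadline."""
    deadline = time.time() + timeout
    while True:
        if lock.acquire(False):        # non-blocking attempt
            return True
        if time.time() >= deadline:
            return False               # give up instead of hanging forever
        time.sleep(poll_interval)

lock_a = threading.Lock()
lock_b = threading.Lock()
outcome = {}

def worker(name, first, second):
    with first:                        # take the first lock
        time.sleep(0.2)                # let the other thread take its own first lock
        if acquire_with_timeout(second, 0.5):
            try:
                outcome[name] = "got both locks"
            finally:
                second.release()
        else:
            outcome[name] = "gave up"  # the error can be handled here

t1 = threading.Thread(target=worker, args=("T1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("T2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(outcome)
```

Both threads always terminate. Which of them gives up depends on timing, so the exact content of outcome can vary from run to run. On Python 3.2 and later, lock.acquire(timeout=x) does the waiting for you, without a polling loop.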
  5498.  
  5499. Threads are nice, but one rule of thumb:
  5500.  
  5501. The fewer threads the better. The fewer locks the better.
  5502.  
  5503. Reducing the number of threads:
  5504.  
* Will lower resource usage (memory, CPU...)
  5506. * Will make your program easier to debug and maintain.
  5507.  
  5508. Reducing the number of locks:
  5509.  
  5510. * Will make your program run faster (no threads waiting for locks).
  5511. * Will reduce the risk of deadlocks.
  5512.  
  5513.  
Locks are interesting, but Queue objects are better. Not only are they thread-safe (you can put objects into the Queue and pick them up without bothering with locking), but they also let you pass objects between threads. Threads can pick up whatever they want from the Queue, re-insert objects, insert new ones, wait for specific objects or messages to be present in the queue, etc.
You can have one big Queue and put objects in it (and only interested threads will pick the relevant objects from the Queue), or one Queue per thread, to send orders to the thread and get its results (an input queue and an output queue, for example). You can also put special "message" objects in the Queue, for example to ask all threads to die or to perform special operations.
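
As a small sketch of the input queue/output queue layout, the worker below squares every number it receives and stops when it sees a special "message" object. The names STOP, worker, inbox and outbox are illustrative. The import is written so that it works with both the Python 3 module name (queue) and the Python 2 one (Queue).

```python
import threading
try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2

STOP = object()               # special "message" object asking the worker to die

def worker(inbox, outbox):
    while True:
        item = inbox.get()    # blocks until something is available
        if item is STOP:
            break
        outbox.put(item * item)

inbox = queue.Queue()
outbox = queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (1, 2, 3):
    inbox.put(n)              # no explicit locking needed
inbox.put(STOP)
t.join()

squares = [outbox.get() for _ in range(3)]
print(squares)                # [1, 4, 9]
```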
  5516.  
  5517. More on this later.
  5518.  
  5519. Path of current script
  5520.  
Want to know the path of the current script?
  5522.  
  5523. import os.path
  5524. print os.path.realpath(__file__)
  5525.  
  5526.  
  5527.  
  5528. Get current public IP address
  5529.  
  5530. The following module will return your current public IP address.
  5531. It uses several external websites to get the address, and will try with another website if one fails (up to 3 times).
  5532.  
import urllib,random,re

ip_regex = re.compile(r"(([0-9]{1,3}\.){3}[0-9]{1,3})")

def public_ip():
    ''' Returns your public IP address.
        Output: The IP address in string format.
                None if no internet connection is available.
    '''
    # List of hosts which return the public IP address
    # (each line is stripped, so the indentation inside the string is harmless):
    hosts = [h.strip() for h in """http://www.whatismyip.com/
        http://adresseip.com
        http://www.aboutmyip.com/
        http://www.ipchicken.com/
        http://www.showmyip.com/
        http://monip.net/
        http://checkrealip.com/
        http://ipcheck.rehbein.net/
        http://checkmyip.com/
        http://www.raffar.com/checkip/
        http://www.thisip.org/
        http://www.lawrencegoetz.com/programs/ipinfo/
        http://www.mantacore.se/whoami/
        http://www.edpsciences.org/htbin/ipaddress
        http://mwburden.com/cgi-bin/getipaddr
        http://checkipaddress.com/
        http://www.glowhost.com/support/your.ip.php
        http://www.tanziars.com/
        http://www.naumann-net.org/
        http://www.godwiz.com/
        http://checkip.eurodyndns.org/""".split("\n")]
    for i in range(3):  # Try up to 3 randomly chosen hosts
        host = random.choice(hosts)
        try:
            results = ip_regex.findall(urllib.urlopen(host).read(200000))
            if results: return results[0][0]
        except Exception:
            pass  # Let's try another host
    return None
  5572.  
  5573. Let's try it:
  5574.  
  5575. >>> print public_ip()
  5576. 85.212.182.25
  5577.  
  5578. If you are not connected to the internet, this function will return None.
  5579.  
  5580. Note that this module will only use proxies if the HTTP_PROXY environment variable is defined.
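
The results[0][0] indexing may look odd. Because the regular expression contains two groups, findall() returns a list of tuples, and the full address is the first element of each tuple. The sketch below shows this on a made-up page, without any network access. It also adds a hypothetical helper, looks_like_ipv4, since the regular expression alone would accept something like 999.1.1.1.

```python
import re

ip_regex = re.compile(r"(([0-9]{1,3}\.){3}[0-9]{1,3})")

# Made-up page content, for illustration only.
sample_page = "<html><body>Your IP address is 85.212.182.25</body></html>"

results = ip_regex.findall(sample_page)
print(results)               # [('85.212.182.25', '182.')]
first_ip = results[0][0]     # outer group = the whole address

def looks_like_ipv4(text):
    """Hypothetical extra check: four numeric parts, each between 0 and 255."""
    parts = text.split(".")
    return len(parts) == 4 and all(p.isdigit() and int(p) <= 255 for p in parts)

print(looks_like_ipv4(first_ip))     # True
print(looks_like_ipv4("999.1.1.1"))  # False
```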
  5581.  
  5582.  
  5583. Bypassing aggressive HTTP proxy-caches
  5584.  
When you scrape the web, you sometimes have to use proxies. The trouble is that some proxies are aggressive and will retain an old copy of a web document, whatever no-cache directives you throw at them.

There is a simple way to force them to actually perform the outgoing request: Add a dummy, ever-changing parameter to each URL. Take for example the following URLs:
  5588.  
  5589. http://sebsauvage.net/images/nbt_gros_oeil.gif
  5590. http://www.google.com/search?q=sebsauvage&ie=utf-8
  5591.  
You can add a dummy parameter with a random value:
  5593.  
  5594. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=some_random_thing
  5595. http://www.google.com/search?q=sebsauvage&ie=utf-8&ihatebadlyconfiguredcaches=some_other_random_thing
  5596.  
Most webservers will simply ignore parameters they don't expect, but the cache will see a different URL and perform a real outgoing request.
Parameters can be added to any URL, even URLs pointing to static content (like images).

Here is a function which will generate a big, random, ever-changing number to add to your URLs:
  5601.  
import time,random

def randomstring():
    return unicode(time.time()).replace(".","")+unicode(random.randint(0,999999999))

We use the current time and a random number. The chances that two generated values are identical are almost nil. Let's generate a few URLs:

>>> for i in range(10):
...     url = u"http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=%s" % randomstring()
...     print url
...
  5612.  
  5613. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429962801620
  5614. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429525336904
  5615. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429135412731
  5616. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=1246883354294594563
  5617. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429345799545
  5618. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=12468833542951092870
  5619. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429681210237
  5620. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=12468833542928938190
  5621. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429139328702
  5622. http://sebsauvage.net/images/nbt_gros_oeil.gif?ihatebadlyconfiguredcaches=124688335429753511849
  5623.  
Each time you construct the URL, the ihatebadlyconfiguredcaches parameter value will be different, preventing caches from serving a stored copy of the page.
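
When the parameter is appended by hand, you have to choose between "?" and "&" depending on whether the URL already has a query string. The following sketch lets the standard URL functions handle that. The function name add_cache_buster is an assumption, and the token is built from the current time and a random number in the same spirit as above. Note that re-encoding the query string may normalize how existing parameters are escaped.

```python
import random
import time
try:                                     # Python 3
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
except ImportError:                      # Python 2
    from urlparse import urlsplit, urlunsplit, parse_qsl
    from urllib import urlencode

def add_cache_buster(url, name="ihatebadlyconfiguredcaches"):
    """Return url with an extra, ever-changing dummy parameter."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    token = "%d%d" % (int(time.time() * 1000), random.randint(0, 999999999))
    query.append((name, token))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))

image_url = add_cache_buster("http://sebsauvage.net/images/nbt_gros_oeil.gif")
search_url = add_cache_buster("http://www.google.com/search?q=sebsauvage&ie=utf-8")
print(image_url)
print(search_url)
```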
  5625.  
  5626.  
  5627. Yes, I know this trick is ugly, but I encountered some very badly behaved caches ignoring all no-cache directives (yes, even in forms) and this method got rid of the problem.
  5628.  
  5629.  
  5630. Make sure the script is run as root
  5631.  
If you want to make sure your program is run as root:

import os,sys
if os.geteuid() != 0:
    print "This program must be run as root. Aborting."
    sys.exit(1)
  5638.  
  5639. Note that it only works under *nix environments (Unix, Linux, MacOSX...), but not Windows.
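
If the same script may also be started on a platform where os.geteuid() does not exist, one possible approach is to test for the function first. This is only a sketch: the helper name is_root and the choice of returning None when the answer is unknown are assumptions.

```python
import os

def is_root():
    """Return True/False on Unix-like systems, or None where geteuid() is missing."""
    geteuid = getattr(os, "geteuid", None)
    if geteuid is None:
        return None           # e.g. Windows: the os module has no geteuid()
    return geteuid() == 0

status = is_root()
if status is None:
    print("Cannot determine whether we are root on this platform.")
elif not status:
    print("Not running as root.")
```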
  5640.  
  5641.  
  5642. Automated screenshots via crontab
  5643.  
If you have a script which runs as a daemon or from cron, you may want to know whether a user has started a graphical session. Here's a way to do it (runs under Linux only).

def currentuser():
    ''' Return the user who is currently logged in and uses the X session.
        None if could not be determined.
    '''
    user = None
    for line in runprocess(["who","-s"]).split('\n'):
        if "(:0)" in line:
            user = line.split(" ")[0]
    return user
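
The runprocess() helper is not defined in this section. The sketch below separates the parsing from the command execution so that the parsing can be tried on its own. Here run_who is a hypothetical stand-in built on subprocess.check_output, x_session_user is an illustrative name, and the sample text is made up to resemble typical who output.

```python
import subprocess

def run_who():
    """Hypothetical stand-in for runprocess(["who","-s"]): return the output as text."""
    return subprocess.check_output(["who", "-s"]).decode("utf-8", "replace")

def x_session_user(who_output):
    """Return the user attached to display :0, or None if there is none."""
    user = None
    for line in who_output.split("\n"):
        if "(:0)" in line:
            user = line.split()[0]   # first column is the login name
    return user

# Made-up sample, for illustration only.
sample_output = (
    "alice    tty7         2015-06-28 09:12 (:0)\n"
    "bob      pts/1        2015-06-28 10:03 (192.168.0.12)\n"
)
print(x_session_user(sample_output))   # alice
```

On a real Linux system you would call x_session_user(run_who()) instead of using the sample text.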
  5655.  
  5656. This is useful, for example, to take a screenshot of the user's screen with scrot:
  5657.  
import os,sys,subprocess
user = currentuser()
if not user:
    print "No user logged in."
    sys.exit(1)
# If a user is logged in, we take a screenshot:
commandline = 'DISPLAY=:0 su %s -c "scrot /tmp/image.png"' % user
myprocess = subprocess.Popen(commandline,shell=True)
myprocess.wait()
  5667.  
  5668. This trick is needed because when your script runs in crontab, it does not have a full environment and - obviously - no X. So scrot won't run as-is: We have to run it as the user who has a graphical session, and we also force the DISPLAY environment variable so that scrot knows which display to capture.
  5669.  
  5670. Note that we run scrot using a shell (shell=True): Some programs need a full shell environment to work properly.
  5671.  
  5672.  
  5673.  
  5674. External links
  5675.  
  5676. * Python Idioms and Efficiency: http://jaynes.colorado.edu/PythonIdioms.html
  5677. * Python speed/Performance tips: http://wiki.python.org/moin/PythonSpeed/PerformanceTips
  5678. * Python Grimoire: http://the.taoofmac.com/space/Python/Grimoire
  5679. * Python Cookbook: http://aspn.activestate.com/ASPN/Python/Cookbook/
  5680. * 10 Python pitfalls: http://zephyrfalcon.org/labs/python_pitfalls.html
  5681. * Python beginner's mistakes: http://zephyrfalcon.org/labs/beginners_mistakes.html
  5682. * Python Gotchas: http://www.ferg.org/projects/python_gotchas.html
  5683. * Python Essays: http://www.python.org/doc/essays/
  5684. * Python FAQs: http://www.python.org/doc/faq/
  5685. * DaniWeb Python code snippets: http://www.daniweb.com/code/python.html
  5686. * "Answer My Searches" Python code snippets: http://www.answermysearches.com/index.php/category/python/
  5687. * and of course the famous Python Eggs (lots of links): http://www.python-eggs.org/