Regis' Blog

L'informatique, L'amour, Les vaches

Python, strftime and Unicode
January 08, 2015

Unicode handling is sometimes a pure nightmare in Python 2.x. Even experienced developers can run into behaviour that is hard to understand. Let's take a piece of code:

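import datetime
import locale

# French month names contain non-ASCII characters.
# The locale name is platform-dependent (often "fr_FR.UTF-8" on Linux).
locale.setlocale(locale.LC_ALL, "fr_fr")
# August = août; %B is the full month name (%M would be the minutes)
datetime.datetime(2014, 8, 1).strftime('%d %B %Y')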

This piece of code returns a str, not a unicode object, because strftime delegates to the C library's strftime(), which produces bytes in the encoding of the current locale.

The problem

Not really important, you'll say, until we concatenate two strings like this:

u"Une chaîne unicode" + datetime.datetime(2014, 8, 1).strftime("%d %B %Y")

This raises a UnicodeDecodeError: before concatenating, Python 2 implicitly decodes the str with the ASCII codec, and the C call returned bytes in the system encoding, not unicode. This is surprising behaviour for Python. You can sprinkle every .decode('utf-8') or .encode('utf-8') on the planet over your code and it still will not reliably work, because those bytes are not necessarily UTF-8. The UnicodeDecodeError will keep coming back, waiting in ambush.
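To make the failure concrete, here is a minimal sketch of what happens under the hood, written with explicit byte strings so that it behaves the same on Python 2.7 and Python 3. It assumes the C library returned the month name encoded as ISO-8859-1; that encoding is only an example, the real one depends on the system locale. The helper try_decode is a hypothetical name used for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the C library returned "août" encoded as ISO-8859-1 (latin-1).
raw_month = b"ao\xfbt"


def try_decode(raw, encoding):
    """Hypothetical helper: decoded text, or None if `raw` is invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Python 2 implicitly decodes a str with the ASCII codec when it is
# concatenated with a unicode object; the byte 0xFB is not ASCII, so it fails.
as_ascii = try_decode(raw_month, "ascii")

# Guessing UTF-8 fails too: 0xFB is not a valid byte in UTF-8.
as_utf8 = try_decode(raw_month, "utf-8")

# Only the encoding the bytes were actually written in succeeds.
as_latin1 = try_decode(raw_month, "latin-1")
```

Under that assumption, both the ASCII and the UTF-8 attempts end up as None, and only the latin-1 attempt yields the expected month name.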

The solution

The solution is to trust Python and its portability. We ask Python which encoding the C library uses (the answer will sometimes surprise you) and then decode with it. For this we need the locale module:

import datetime
import locale

locale.setlocale(locale.LC_ALL, "fr_fr")
pref_enc = locale.getpreferredencoding()
dt = datetime.datetime(2014, 8, 1)
u"Une chaîne unicode " + dt.strftime('%d %B %Y').decode(pref_enc)

And it works.
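If the same formatting is needed in several places, the decoding step could be wrapped in a small helper. The sketch below is one possible way to do it, not an established API: strftime_text is a hypothetical name, and the bytes check is an assumption that lets the same code run unchanged on Python 2.7 and Python 3.

```python
import datetime
import locale


def strftime_text(dt, fmt):
    """Hypothetical helper: format `dt` and always return text, never bytes."""
    formatted = dt.strftime(fmt)
    if isinstance(formatted, bytes):
        # Python 2: strftime returned bytes in the locale's encoding.
        formatted = formatted.decode(locale.getpreferredencoding())
    return formatted


# \u00ee is "î"; the escape keeps this source file pure ASCII.
label = u"Une cha\u00eene unicode " + strftime_text(
    datetime.datetime(2014, 8, 1), "%d %B %Y"
)
```

The month name in label depends on the locale that is active when the helper runs, so call locale.setlocale first if a specific language is needed.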

The other solution

Use Python 3, where strftime returns a unicode str directly. All major libraries run on Python 3.

No excuses.
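For comparison, here is a minimal Python 3 sketch of the same concatenation. The locale name "fr_FR.UTF-8" is an assumption (names vary by platform and the locale may not be installed), so the code falls back to the current locale when it is missing.

```python
import datetime
import locale

try:
    # Assumption: a French locale with this name is installed on the system.
    locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
except locale.Error:
    pass  # Keep the current locale; the month name stays in that language.

# On Python 3, strftime returns str (unicode text), so no decode step is needed.
formatted = datetime.datetime(2014, 8, 1).strftime("%d %B %Y")
message = "Une chaîne unicode " + formatted
```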
