Wednesday, 28 August 2013

Python3 : unescaping non ascii characters

(Python 3.3.2) I have to unescape strings that contain escape sequences
alongside non-ASCII characters. The methods I found here and here don't
work. I'm working in a 100% UTF-8 environment.
import codecs
# pure ASCII string : ok
mystring = "a\\n" # expected unescaped string : "a\n"
cod = codecs.getdecoder('unicode_escape') # decoder, since the goal is to unescape
print( cod(mystring) )
# non ASCII string : method #1
mystring = "€\\n"
# equivalent to : mystring = codecs.unicode_escape_decode(mystring)
cod = codecs.getdecoder('unicode_escape')
print(cod(mystring))
# RESULT = ('â\x82¬\n', 5) INSTEAD OF ("€\n", 2)
# non ASCII string : method #2
mystring = "€\\n"
mystring = bytes(mystring, 'utf-8').decode('unicode_escape')
print(mystring)
# RESULT = â\202¬ INSTEAD OF "€\n"
Is this a bug? Have I misunderstood something?
Any help would be appreciated!
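
A possible explanation, offered as a sketch rather than a definitive answer: the 'unicode_escape' codec decodes its input as Latin-1, so the three UTF-8 bytes of "€" come back as three separate characters. One workaround, assuming the input is a str whose backslash sequences should be interpreted, is to encode to Latin-1 with the 'backslashreplace' error handler before decoding. The helper name `unescape` below is hypothetical and not part of any library.

```python
def unescape(text):
    """Interpret backslash escapes in text while keeping non-ASCII characters.

    Hypothetical helper: characters outside Latin-1 are first turned into
    \\uXXXX escapes by 'backslashreplace', so the 'unicode_escape' decoder
    (which reads its input as Latin-1) can restore them afterwards.
    """
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')


# Why methods #1 and #2 fail: the UTF-8 bytes of the euro sign are read as Latin-1.
mangled = "€".encode('utf-8').decode('latin-1')

print(repr(mangled))            # the three-character sequence seen in the RESULT lines
print(repr(unescape("€\\n")))   # euro sign followed by a real newline
print(repr(unescape("a\\n")))   # the pure ASCII case still works
```

This only covers str input; text that already contains a literal backslash meant to stay as-is would need separate handling.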
