Decoding Woes Solved – Python

Hello World,

This is a short post.  A data set came in with some funky characters in it, and reading it with pandas kept failing with “Can’t read this; doesn’t appear to be UTF-8”.  I looked around on Stack Overflow for a while to little avail, then came up with this, which works.

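from io import StringIO
import pandas as pd

dataPath = "C:\\data\\CompanyA\\DavidCrook\\davidData_Session1.csv"
# Read the file with the built-in open() instead of letting pandas decode it
with open(dataPath) as fil:
    txt = ''.join(fil.readlines())
works = pd.read_csv(StringIO(txt), index_col = 0)
# Reading straight from the path is what raised the UTF-8 error
doesntWork = pd.read_csv(dataPath, index_col = 0)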

Just read the sucker with the standard file open and line reader, push it into a StringIO, and then read that into a data frame.  Guess what I’m doing from now on.
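
If you would rather not depend on whatever encoding open() happens to pick on your machine, here is a minimal sketch of the same idea with the decoding spelled out.  The helper name open_as_text and the candidate encodings (utf-8 first, then cp1252) are my assumptions for illustration, not something known about the data set above; swap in whatever your files actually use.

```python
import os
import tempfile
from io import StringIO

def open_as_text(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode a file with the first encoding that works."""
    with open(path, "rb") as fil:
        raw = fil.read()  # raw bytes, nothing decoded yet
    for enc in encodings:
        try:
            return StringIO(raw.decode(enc))
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    # Last resort: decode with the first candidate, replacing bad bytes with U+FFFD
    return StringIO(raw.decode(encodings[0], errors="replace"))

# Demo on a throwaway file holding a cp1252 byte that is not valid UTF-8
demo_bytes = "id,name\n1,Café\n".encode("cp1252")
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(demo_bytes)
demo_text = open_as_text(tmp.name).getvalue()
os.remove(tmp.name)
print(demo_text)
```

The returned StringIO can be handed to pandas the same way as txt above, for example pd.read_csv(open_as_text(dataPath), index_col = 0).  pd.read_csv also takes an encoding argument, which may be the shorter route when you already know what the file is encoded in.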

#MicroBlogPost 🙂

One thought on “Decoding Woes Solved – Python”

  1. An alternative would be to use the following try-except block:

    try:
        print(myUnreadableData)
    except UnicodeEncodeError:
        # Fall back to printing the UTF-8 encoded bytes
        print(myUnreadableData.encode('utf8'))

    This helps when I’m reading tweets which commonly have strange characters.
