All Text Is Encoded, And When Migrating WordPress You Have To Remember This

I spend some of my time helping people migrate to WordPress from other blogging platforms or helping them move between hosts.

All text on computers is encoded. There is no such thing as plain text.
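
A quick way to see this at a shell prompt is to take one sequence of bytes and read it under two different encodings. This is only a sketch: the sample word and the two helper function names are made up for illustration, and it assumes od and iconv are installed.

```shell
# "café" stored as UTF-8: the é occupies two bytes, 0xC3 0xA9.
utf8_cafe() {
    printf 'caf\303\251\n'
}

# Deliberately misread those bytes as ISO-8859-1 and re-encode as UTF-8.
# Each byte of the é is treated as a separate character.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

utf8_cafe | od -An -tx1        # raw bytes of the original
utf8_cafe | misread_as_latin1  # shows "cafÃ©" on a UTF-8 terminal
```

The bytes never changed; only the assumption about their encoding did. That is the kind of damage that tends to show up after an import when the file and the database disagree.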

Once I have an export file, the first thing I always do (now) is:

$ file post.txt

The results are:

post.txt: UTF-8 Unicode English text, with very long lines, with CRLF, LF line terminators

So I either have to make sure that the database I’m importing into uses the same encoding, UTF-8 in this case (now the default encoding for WordPress), or I have to convert the file’s encoding using iconv, possibly with additional steps if it is a database backup. It is better just to make sure the encodings match.
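
For the iconv route, a minimal sketch might look like the following. The helper name to_utf8, the source encoding, and the output file name are all hypothetical; the right source encoding is whatever file reports for the actual export.

```shell
# Hypothetical helper: convert a file from a given encoding to UTF-8.
# Usage: to_utf8 SOURCE_ENCODING INPUT OUTPUT
to_utf8() {
    # iconv exits non-zero if it meets a byte sequence that is invalid in
    # the source encoding. Single-byte encodings such as ISO-8859-1 accept
    # any byte, so a wrong guess there goes unnoticed.
    iconv -f "$1" -t UTF-8 "$2" > "$3"
}
```

Assuming the export had turned out to be ISO-8859-1, running `to_utf8 ISO-8859-1 post.txt post.utf8.txt` and then `file post.utf8.txt` should report UTF-8 for the converted copy. Writing to a new file rather than overwriting the original keeps the untouched export around in case the guess was wrong.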


2 Responses to All Text Is Encoded, And When Migrating WordPress You Have To Remember This

  1. engtech says:

    I wonder if iconv would help me get past some of the Base64 problems I’ve had with perl and XML::RPC?

  2. Trent says:

    Did a content import with some issues with characters that were wrong, so I did some searching for the ‘search and replace’ plugin to fix some issues ‘after the fact’ if anyone else needs to help fix things up! Search over at the plugins section of the wordpress.org site!
