A quick note about scraping

This one’s for the geeks. “Feed scraping” is when some spam site steals your blog content, posts it on their blog, and throws in a bunch of links to penis-enlarging pills and such. There are a couple of things you can do to help make this less of a problem.

One way to help get people AWAY from those sites is to make sure you use TARGET attributes in all your links. Meaning:

<A HREF="http://matthewebel.com">

becomes:

<A HREF="http://matthewebel.com" TARGET="_top">

The _top target means the link will load in the browser window you’re looking at, breaking out of frames and getting away from whatever site you were on. _blank will open a link in a new window, handy if you want to link to someone else’s site without sending people away from your own.

Another trick is to use your web host’s redirects. Most hosts running Apache let you put a file called .htaccess at the root (homepage) level to handle redirects. All it needs to be is a list that looks something like this:

Redirect /blog http://matthewebel.com/main/
Redirect /preorder http://matthewebel.com/main/store/preorder

Update: Check the comment below from Michael for an even better redirect technique!
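As a rough sketch of what the comments below describe, the same list can spell out the redirect type on each line. This assumes an Apache host with mod_alias enabled; the paths and targets are the ones from the list above, and which one is marked permanent or temporary is only for illustration.

```apache
# Permanent (301): tells browsers and search engines the move is final.
Redirect 301 /blog http://matthewebel.com/main/

# "temp" is the keyword form of a temporary (302) redirect.
Redirect temp /preorder http://matthewebel.com/main/store/preorder
```

Leaving the status off entirely gives you the temporary (302) kind.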

If you can make as many links as possible redirect through your web host, you’ll be able to track clickthroughs even if they’re on someone else’s site. You can also keep on top of changing URLs, such as my iTunes Music Store page. The actual URL, as of this post, is this:

http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa
/wa/browserRedirect?url=itms%253A%252F%252Fphobos.apple.com
%252FWebObjects%252FMZStore.woa%252Fwa%252FviewArtist%253Fid
%253D4260326%2526partnerId%253D30%2526partnerId%253D30%2526siteID
%253DzuwuVvoU8C8-gX0y3aSBtm55l1XnuHAlfA

But I can keep it smaller and always up-to-date by simply redirecting http://matthewebel.com/itunes to the current store page, using my affiliate link. Even if this link stays online for years, it’ll always point to the right place if I keep my .htaccess file up to date.
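A minimal sketch of what that entry might look like, assuming an Apache host with mod_alias. The target below is a made-up placeholder standing in for the affiliate link, not the real one.

```apache
# Temporary (302) redirect, since the target is expected to change over time.
# The destination is a hypothetical placeholder; swap in the current store URL.
Redirect 302 /itunes http://example.com/current-itunes-affiliate-link
```

When the store URL changes, this one line is the only thing that needs editing.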

Just a few thoughts for some good web-housekeeping. If you’ve got any other suggestions, let me know!

Update: Oh, and in case you’re wondering, all my “Read More” links point to _top as well; I had to hack WordPress to do that.


  • Great pointers. Here is another redirect trick that might come in handy for WordPress...

    Since WordPress tries to be smarter than me, I use htaccess redirects to handle those "protocols" that WordPress incorrectly tries to translate in my links. Therefore, I end up with redirects like this:

    Redirect 301 /gtalk xmpp:abiteofsanity@gmail.com

    This will browser-redirect to whatever application is set to handle Google Talk. Now, I can put a link on my site to http://abiteofsanity.com/gtalk and WordPress will manage it appropriately. If I try to put the "xmpp" address into the link, WordPress mangles it trying to turn it back into an "http" link.
  • Just a little note: I recently learned that there are two kinds of forwarding: a temporary one and a permanent one. The effect is the same, but the so-called status code sent to the browser (or, more importantly, to the search engine spy robot) is different. Status code 302 stands for temporary forwarding, status code 301 for permanent forwarding. Unfortunately, if you don't include a specific redirect type in your .htaccess file, the 302 status code is used. And that is said to be not so good, particularly with regard to the indexing and ranking done by Google.

    To put it in a nutshell, if the forwarding is intended as permanent (which it is in most cases - unless you forward a parked domain temporarily to an existing site while you work on the content, that would be one of the rare cases of a temporary forwarding) you should tell browsers and search engines so by adding the corresponding status code in your .htaccess file.

    Redirect /blog http://matthewebel.com/main/
    would simply become
    Redirect 301 /blog http://matthewebel.com/main/
    if you want to establish a permanent forwarding.
  • John-

    Actually, my iTMS address only changed once, and that was shortly after they brought indie artists into the store anyway. I did, however, sign up for an affiliate link lately, so I don't just send people to the iTMS anymore anyhow. Rather than go back and change all my iTunes links from previous posts and blogs, I just update my redirect and they all work.

    This isn't to say that the iTMS won't change URLs on us in the future, though. Ten years from now, who knows? They might have a major system change, but all I have to do is change one link in one file.

    Pax,
    Matthew
  • Excellent tips, Matthew. I was unaware that an artist's iTunes music store URL changes; is there a reason, or just iTunes being unfriendly?