Saturday, May 8, 2010

fsed.sh

After downloading the Toonami Reactor web files from the Internet Archive, I needed to strip out all the Wayback Machine code and markers.  The Wayback Machine adds JavaScript to rewrite HTML links, thus maintaining "temporal integrity".  These insertions had to be removed for Reactor 2.5 to function properly, and removing them manually would have been too time-consuming and frustrating.  Cygwin provides access to many tools, such as sed, that allow bulk replacement of text.  However, sed (or this version at least) processes its input one line at a time, so a simple substitution cannot match text that spans several lines.  The solution was to write a custom bash script that replaces newline characters ('\n') with the bell character ('\a'), runs sed on the result, and switches the bells back to newlines once the offending text has been removed.  See code:

#!/bin/bash
# fsed.sh
# Name: File sed
# Replaces all newlines with the bell (\a), performs sed,
# then switches back.

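regex=$1
# Make the pipeline fail if sed fails, so a bad expression
# does not overwrite the file with empty output.
set -o pipefail
# Shift the argument array so that only the file names remain.
shift
for i in "$@"
do
  # Write to a temporary file first: redirecting straight back to "$i"
  # would truncate the file before it had been read.
  tmp="$i.fsed.tmp"
  tr "\n" "\a" < "$i" | sed "$regex" | tr "\a" "\n" > "$tmp" && mv "$tmp" "$i"
done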

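Called like this: fsed.sh "s/regex/replace/g" ./*.html
Inside the expression, a line break from the original file has to be matched as the bell character (GNU sed accepts \a for this).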
(Of course, this is designed to work on a mass of files.  And it's incredibly dangerous, what with rewriting files and such.  Use caution.)

Characters other than the bell could probably serve as the placeholder, as long as they never appear in the files being edited, but the bell worked well enough at the time.
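
The newline-to-bell round trip can be tried on a small sample before pointing it at real files.  What follows is only a sketch: the BEGIN MARK / END MARK comment is a made-up stand-in for the Wayback Machine insertions, not the actual markup, and the bell is passed to sed as a literal character so that the example does not rely on sed understanding \a.

```shell
#!/bin/bash
# Literal bell character, built with bash's $'...' quoting.
bel=$'\a'

# Hypothetical input: a multi-line marker block between two real lines.
# The pipeline joins the lines with bells, deletes the block together with
# the bell that followed it, then turns the remaining bells back into newlines.
result=$(printf '%s\n' \
    '<p>before</p>' \
    '<!-- BEGIN MARK' \
    'injected line' \
    'END MARK -->' \
    '<p>after</p>' \
  | tr '\n' '\a' \
  | sed "s/<!-- BEGIN MARK.*END MARK -->${bel}//" \
  | tr '\a' '\n')

printf '%s\n' "$result"
```

Because the whole file becomes one long line, .* is greedy across everything: if a file held several such blocks, this pattern would delete from the first opening marker to the last closing one, including any wanted text in between.  A tighter pattern is needed in that case.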
