Automatically updating squidGuard black and white lists

Why am I using SquidGuard?

Having children in the house, I would like them to be able to browse the internet as safely as possible, and block as many offensive sites as possible. After a bit of looking around, I decided on the combination of squid (http://www.squid-cache.org) and SquidGuard (http://squidguard.org/index.html).

I’ve used squid before and it is really easy to set up and use.  There are a huge number of options, but the default settings work well enough without doing too much.  There are also lots of recipes available that other people have taken the time to write, so help is always at hand.

Again, most (if not all) Linux distributions provide a ready-packaged Squid distribution, so the amount of effort required to install it really is minimal.

If you are running a small home network, then you only need to install this on one machine and tell every other machine to use it.  This can be done automatically – more on this in another post.

To provide protection against unsuitable sites, I pair Squid with SquidGuard. The way this works is that Squid passes the URL of every requested page to SquidGuard. SquidGuard examines this URL and compares it to its database (more on this later) of unsuitable sites. If it finds a match, it returns a replacement URL, which Squid then serves to the user instead.

Install Squid

The first step is to install Squid and check it works. Follow your distribution’s instructions to set it up.

Change your browser settings to tell it to use squid as its proxy (you’ll need the machine address and port). Browse some pages to check it all works.

Install and test SquidGuard

The next step required is to install SquidGuard. If you are lucky then your distribution may have this packaged already for you.

If not, head on over to http://squidguard.org/ and grab the download.

Again, there are plenty of instructions about how to do this on the web (and the guide on the squidGuard site itself shows how easy it is).

All that’s required to tell Squid to use squidGuard is the following line.  This simply tells squid to run every URL through this.

url_rewrite_program /usr/local/bin/squidGuard -c /usr/local/etc/squid/squidGuard.conf

If you have installed from a package manager, then this may have been included for you.

How do we test squidGuard? Well, the obvious way is to try and browse an unsuitable site! If it’s blocked – it works! If you see it, it doesn’t.
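
Squid talks to squidGuard over standard input, so the redirector can also be checked without a browser. The sketch below builds one request line in the classic "URL client-ip/fqdn ident method" form and pipes it to squidGuard only if the binary is on the PATH. The function name sg_request_line, the example URL and the client address are placeholders, and the exact line format may differ between Squid and squidGuard versions, so treat this as a starting point rather than the definitive test.

```shell
#!/bin/sh
# Build one redirector request line: URL, client address, ident, method.
# This layout is an assumption based on the classic Squid redirector
# format; adjust it if your versions expect something else.
sg_request_line() {
    url=$1
    client_ip=$2
    printf '%s %s/ - - GET\n' "$url" "$client_ip"
}

# Config path copied from the url_rewrite_program line; change to suit.
SQUIDG_CONF=${SQUIDG_CONF:-/usr/local/etc/squid/squidGuard.conf}

if command -v squidGuard >/dev/null 2>&1; then
    # An empty reply means the URL is allowed; a URL in the reply
    # means the request would be redirected (blocked).
    sg_request_line "http://www.example.com/" "10.0.0.1" |
        squidGuard -c "$SQUIDG_CONF" -d
else
    echo "squidGuard not found on PATH; the request line would be:"
    sg_request_line "http://www.example.com/" "10.0.0.1"
fi
```

Swap in a URL from one of your blocked categories to see the redirect come back.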

Maintaining the lists

There are several sites which have lists of sites – both whitelists and blacklists.  If you’ve installed squidGuard, you’ve probably entered a few sites to always allow (for example, some news sites), and some to always block (maybe games you don’t want your children to access).

The two sites I use are:

  • http://squidguard.mesd.k12.or.us/blacklists.tgz
  • http://www.shallalist.de/Downloads/shallalist.tar.gz

When you update the lists you want to keep your changes, so we need to combine the lists.
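
One way to keep local additions while refreshing a list is to merge the downloaded file into the local one and drop duplicates. The sketch below is not the update_blacklist script itself. The function name merge_lists and the demo entries are made up for illustration, and it assumes plain-text lists with one domain or URL per line.

```shell
#!/bin/sh
# merge_lists LOCAL DOWNLOADED
# Rewrite LOCAL so it holds every entry from both files, sorted,
# with duplicates and blank lines removed.
merge_lists() {
    local_list=$1
    new_list=$2
    merged=$(mktemp) || return 1
    # Strip carriage returns and blank lines before de-duplicating.
    cat "$local_list" "$new_list" |
        tr -d '\r' |
        grep -v '^[[:space:]]*$' |
        sort -u > "$merged"
    # Overwrite the contents in place so ownership and permissions stay put.
    cat "$merged" > "$local_list"
    rm -f "$merged"
}

# Demo with throwaway files (hypothetical entries).
demo_dir=$(mktemp -d)
printf 'mygame.example\nads.example\n' > "$demo_dir/domains"
printf 'ads.example\ntracker.example\n' > "$demo_dir/domains.new"
merge_lists "$demo_dir/domains" "$demo_dir/domains.new"
cat "$demo_dir/domains"
```

After the text files change, squidGuard's .db files have to be rebuilt (squidGuard -C all) and Squid told to reload before the new entries take effect.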

The quick and dirty script here downloads the blacklists, combines the lines into the existing databases, and then alters the squidGuard configuration file to include any new lists. Here is the update_blacklist file. Feel free to use/modify/criticise!
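
The configuration step can be sketched like this: list the domains and urls files under the blacklist directory, turn each path into a domainlist or urllist line, wrap them in one dest block, and swap that block in at the end of the config. This is a simplified stand-in for the sed-based function discussed in the comments, not a copy of it. The names build_auto_block and replace_auto_block and the demo paths are hypothetical; only the marker line and the auto_bl_complete name come from the real output.

```shell
#!/bin/sh
MARKER='### AUTO GENERATED CONFIG ###'

# build_auto_block DBDIR
# Print the marker header and a dest block listing every domains/urls
# file found below DBDIR.
build_auto_block() {
    dbdir=$1
    echo "$MARKER"
    echo "### generated on $(date)"
    echo "dest auto_bl_complete {"
    # ".../ads/domains" becomes "domainlist .../ads/domains",
    # ".../ads/urls"    becomes "urllist .../ads/urls".
    find "$dbdir" -type f \( -name domains -o -name urls \) | sort |
        sed -e 's#\(.*\)/\(.*\)s$#    \2list \1/\2s#'
    echo "}"
}

# replace_auto_block CONF DBDIR
# Delete everything from the marker to the end of CONF, then append a
# freshly generated block.
replace_auto_block() {
    conf=$1
    dbdir=$2
    tmp=$(mktemp) || return 1
    sed -e "/$MARKER/,\$ d" "$conf" > "$tmp"
    build_auto_block "$dbdir" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Demo against a throwaway directory tree and config file.
demo=$(mktemp -d)
mkdir -p "$demo/db/ads" "$demo/db/porn"
touch "$demo/db/ads/domains" "$demo/db/ads/urls" "$demo/db/porn/domains"
printf 'dbhome %s/db\n%s\nold stuff\n' "$demo" "$MARKER" > "$demo/squidGuard.conf"
replace_auto_block "$demo/squidGuard.conf" "$demo/db"
cat "$demo/squidGuard.conf"
```

Because the old block is always deleted from the marker onwards, running the function twice leaves a single auto-generated block, and anything defined by hand above the marker is left alone.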



7 Responses to Automatically updating squidGuard black and white lists

  1. Tim says:

    Hi Jason,

    No Problem re: all the comments – sorry it’s taking me some time to get back to you.

    The purpose of the function was to create a group of all the black lists in one place, so I’d just need to refer to that set if I wanted the absolute maximum level of blocking. It would also mean that any new lists/urls/categories would be automatically included. From memory, I think the different blacklist providers had different categories also, so if the provider was changed, their lists would get included automatically.

    It does this by creating a temp file with the header (the auto-generated bit) and a dest{} block.

    The squidGuard.conf file then has this appended to it. If there is already an auto-generated block in the file, it is removed first. This block is always at the end of the squidGuard.conf file.

    I’ve got other manually defined dest{} blocks which deal with types of black lists (ie. children need the most – adults can look at some categories).

    the last lines of my auto generated squidGuard.conf file look like this

    ### AUTO GENERATED CONFIG ###
    ### generated on Thu Sep 29 22:02:56 BST 2016
    ###
    ###
    ###---
    dest auto_bl_complete {
    domainlist /var/db/squidGuard/ads/domains
    :
    :
    :
    :
    }
    EOF

    your file looks as though it’s managed to write some of the SED commands into the file.

    A couple of things spring to mind.
    1. You may not need this ! 😉 if you’ve defined your own block destinations based on the blacklists, that may be enough, and you want to check yourself periodically to see if there are other things you should add.

    2. we dig deeper into the SED script. This is a real head scratcher! I’m assuming you’ve re-copied the lines from the download & pasted them in:


    sed -e '
    {
    /###---/ a\
    dest auto_bl_complete {

    /^[^#]/ {
    s/^\.//
    s#\(.*\)/\(.*\)s# \2list \1/\2s#
    }
    $ a\
    }

    To be honest (& lazy) option 1 may be the way to go – especially if you’ve defined all the required dest{} blocks manually.

  2. Jason says:

    Good day,

    Great script, I’ve spent the last couple hours getting it fine tuned for Ubuntu, but I am now stuck as I don’t have a lot of bash scripting experience.

    The following code in alter_conf() gives me this error: sed: -e expression #1, char 167: unexpected `}'

    Code:
    sed -e '
    {
    /###---/ a\
    dest auto_bl_complete {

    /^[^#]/ {
    s/^\.//
    s#\(.*\)/\(.*\)s# \2list \1/\2s#
    }
    $ a\
    }

    }
    ' ${fulllist}

    Would you be able to provide me with some insight?

    Cheers,

    Jason

    • Jason says:


      sed -e '
      {
      /###---/ a\
      dest auto_bl_complete {

      /^[^#]/ {
      s/^\.//
      s#\(.*\)/\(.*\)s# \2list \1/\2s#
      }
      $ a\
      }

      }
      ' ${fulllist}

    • Tim says:

      Hi Jason,

      My script is running in a freenas jail (freebsd)- but that shouldn’t make any difference.

      I’ve re-run the script and it went okay. the auto-added bits look okay also.

      From the extract it looks as though the last few lines of the function are missing (the line number 167 doesn’t seem to match my script, but if you’ve made changes for your environment, that’s hardly unexpected)


      ' < ${alldirs} > ${fulllist}

      #
      # remove the old lines from the conf file.
      # delete every thing from the auto conf heading to the end of the file
      #
      sed -e '/### AUTO GENERATED CONFIG ###/,$ d' < ${SQUIDG_CONF} > ${conftemp}

      # re-generate the config file
      cat ${conftemp} ${fulllist} > ${SQUIDG_CONF}
      rm ${alldirs} ${fulllist} ${conftemp}

      }

      The final “}” is the end of the function

      The only difference I can see is that your snippet is missing the redirection “< ${alldirs} > ${fulllist}” instructions at the end of the sed command:

      sed -e '
      {
      /###---/ a\
      dest auto_bl_complete {

      /^[^#]/ {
      s/^\.//
      s#\(.*\)/\(.*\)s# \2list \1/\2s#
      }
      $ a\
      }

      }
      ' < ${alldirs} > ${fulllist}

      Your sed line agrees with mine, which is why I think it is the “< ${alldirs} > ${fulllist}” bit.

      The command is split across several lines for readability, but logically it is one line:
      sed -e 'script' < input > output

      I’ve also got no white-space after the
      sed -e '

      I must admit I’m clutching at straws here – it’s been years since I wrote this, & now it looks like random line noise!

      • Jason says:

        Hello Tim,

        Thank you for your response on the matter, I didn’t realize it has been so long since you created this script :D.

        What I posted above was a snippet of your function and yes unfortunately I had made changes so the line numbers do not line up correctly.

        With the error I was getting, the *.db files seemed to have been rebuilt; however, I am not certain that they were actually updated.

        If I change that particular sed line to the following code, the sed error is gone, but I get


        2016-10-02 12:14:13 [15505] INFO: create new dbfile /var/lib/squidguard/db/blacklists/spyware/urls.db
        2016-10-02 12:14:13 [15505] FATAL: syntax error in configfile /etc/squidguard/squidGuard.conf line 115
        2016-10-02 12:14:13 [15505] ERROR: Going into emergency mode

        My sed line:


        sed -e '{
        /###---/ a\ dest auto_bl_complete { /^[^#]/ { s/^\.// s#\(.*\)/\(.*\)s# \2list \1/\2s# } $ a\ }
        }' ${fulllist}

        I appreciate your help and I know it is removing some dust, but perhaps if I knew what exactly the sed line is trying to accomplish I could then figure out why I am having the issues.

        • Jason says:

          Hi Tim,

          Having reviewed my squidGuard.conf after running the script, the following was added to the bottom of the conf file.


          ### AUTO GENERATED CONFIG ###
          ### generated on Sun Oct 2 12:26:03 ADT 2016
          ###
          ###
          ###---
          dest auto_bl_complete { /^[^#]/ { s/^.// s#(.*)/(.*)s# 2list 1/2s# } $ a }
          /var/lib/squidguard/db/blacklists/ads/domains
          /var/lib/squidguard/db/blacklists/ads/urls
          /var/lib/squidguard/db/blacklists/aggressive/domains
          /var/lib/squidguard/db/blacklists/aggressive/urls
          /var/lib/squidguard/db/blacklists/audio-video/domains
          /var/lib/squidguard/db/blacklists/audio-video/urls
          /var/lib/squidguard/db/blacklists/drugs/domains
          /var/lib/squidguard/db/blacklists/drugs/urls
          /var/lib/squidguard/db/blacklists/gambling/domains
          /var/lib/squidguard/db/blacklists/gambling/urls
          /var/lib/squidguard/db/blacklists/hacking/domains
          /var/lib/squidguard/db/blacklists/hacking/urls
          /var/lib/squidguard/db/blacklists/mail/domains
          /var/lib/squidguard/db/blacklists/mail/urls
          /var/lib/squidguard/db/blacklists/porn/domains
          /var/lib/squidguard/db/blacklists/porn/urls
          /var/lib/squidguard/db/blacklists/proxy/domains
          /var/lib/squidguard/db/blacklists/proxy/urls
          /var/lib/squidguard/db/blacklists/redirector/domains
          /var/lib/squidguard/db/blacklists/redirector/urls
          /var/lib/squidguard/db/blacklists/spyware/domains
          /var/lib/squidguard/db/blacklists/spyware/urls
          /var/lib/squidguard/db/blacklists/suspect/domains
          /var/lib/squidguard/db/blacklists/suspect/urls
          /var/lib/squidguard/db/blacklists/violence/domains
          /var/lib/squidguard/db/blacklists/violence/urls
          /var/lib/squidguard/db/blacklists/warez/domains
          /var/lib/squidguard/db/blacklists/warez/urls

          It is not clear to me what is supposed to be accomplished here.

        • Jason says:

          Sorry for all the comments, but for some reason WordPress is removing the portion where alldirs is before fulllist. I do have it
