Deep Content Inspection
Web Safety can perform deep scanning of web pages for explicit adult phrases. This is a very effective way of blocking, because it allows individual parts of any web site to be blocked and does not rely on a huge site categorization database.
Deep content inspection scans all downloaded textual pages (HTML, JSON and plain text) and calculates the weight of each page by summing the weights of all words found. Commonly used words have zero weight, while adult-specific phrases have positive weights; the more mature a word is, the more weight it has. If the contents of a page add up to a weight greater than the maximum configured weight, the page is blocked.
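The weighting scheme described above can be sketched roughly as follows. This is only an illustration, not Web Safety's actual engine: the phrases and weights in `PHRASE_WEIGHTS` are hypothetical placeholders, and counting every occurrence of a phrase (rather than each distinct phrase once) is an assumption. Words that are not in the table simply contribute zero.

```python
import re

# Hypothetical phrase weights for illustration only; the real values are kept
# in weighted.conf, whose exact format is not described here.
PHRASE_WEIGHTS = {
    "explicit phrase": 40,
    "mature word": 25,
}


def page_weight(text: str, weights: dict) -> int:
    """Sum the weights of all weighted phrases found in the text.

    Words that are not listed in `weights` contribute zero. Assumption:
    every whole-word occurrence of a phrase is counted.
    """
    lowered = text.lower()
    total = 0
    for phrase, weight in weights.items():
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        total += weight * len(re.findall(pattern, lowered))
    return total


def is_blocked(text: str, weights: dict, max_weight: int) -> bool:
    """A page is blocked only if its weight is strictly greater than the maximum."""
    return page_weight(text, weights) > max_weight


if __name__ == "__main__":
    sample = "An explicit phrase and another explicit phrase, plus a mature word."
    print(page_weight(sample, PHRASE_WEIGHTS))  # 105
    print(is_blocked(sample, PHRASE_WEIGHTS, max_weight=80))  # True
```

Here the sample scores 40 + 40 + 25 = 105, which exceeds a maximum of 80, so the page would be blocked; a page containing only common words scores 0.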
The database of adult phrases is stored in /opt/websafety/var/spool/adult/weighted.conf. Unfortunately, it is not possible to edit this file from the Admin UI; if the weights of individual adult phrases need further adjustment, you must edit it manually.
The following screenshot shows the deep content inspection rule configured for the default policy.
By default, deep content inspection is switched on, with the maximum text weight set to 80. To keep the amount of memory used during scanning manageable, the deep content inspection engine does not scan texts larger than 2 MB.
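The interaction between the default threshold and the size limit might look like the sketch below. The names `MAX_WEIGHT_DEFAULT`, `MAX_SCAN_BYTES` and `inspect` are hypothetical, reading "2 MB" as 2 × 1024 × 1024 bytes is an assumption, and `score` stands for any page-weighting function.

```python
from typing import Callable

MAX_WEIGHT_DEFAULT = 80               # default maximum text weight
MAX_SCAN_BYTES = 2 * 1024 * 1024      # assumption: the 2 MB limit taken as 2 MiB


def inspect(body: bytes, score: Callable[[str], int],
            max_weight: int = MAX_WEIGHT_DEFAULT) -> str:
    """Return 'not_scanned', 'blocked' or 'allowed' for a downloaded text body."""
    # Texts above the size limit are not inspected by this rule at all,
    # which keeps memory use during scanning bounded.
    if len(body) > MAX_SCAN_BYTES:
        return "not_scanned"
    text = body.decode("utf-8", errors="replace")
    # Block only when the weight is strictly greater than the maximum.
    return "blocked" if score(text) > max_weight else "allowed"


if __name__ == "__main__":
    print(inspect(b"hello", lambda t: 0))                          # allowed
    print(inspect(b"hello", lambda t: 81))                         # blocked
    print(inspect(b"x" * (MAX_SCAN_BYTES + 1), lambda t: 1000))    # not_scanned
```

Note that a page weighing exactly 80 is still allowed, since only weights above the maximum trigger blocking.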
It is also possible to scan HTML links (anchors) within the text, as well as embedded JavaScript and CSS content. By default these scans are switched off, but they may be switched on in very strict environments where no adult content is acceptable.
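To illustrate what these optional scans add, the sketch below extracts the text that would be weighed from an HTML page, using Python's standard `html.parser`. The option names `scan_links`, `scan_scripts` and `scan_css` are hypothetical, and treating "scanning links" as including the `href` URLs of anchors is an assumption about what the option covers.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect page text, optionally including link URLs, scripts and CSS."""

    def __init__(self, scan_links=False, scan_scripts=False, scan_css=False):
        super().__init__()
        self.scan_links = scan_links
        self.scan_scripts = scan_scripts
        self.scan_css = scan_css
        self._in_script = False
        self._in_style = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "style":
            self._in_style = True
        elif tag == "a" and self.scan_links:
            # Assumption: link scanning includes the anchor's target URL.
            href = dict(attrs).get("href")
            if href:
                self.parts.append(href)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        elif tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Script and style bodies are skipped unless explicitly enabled.
        if self._in_script and not self.scan_scripts:
            return
        if self._in_style and not self.scan_css:
            return
        self.parts.append(data)


def extract_text(html: str, **options) -> str:
    parser = TextExtractor(**options)
    parser.feed(html)
    parser.close()
    return " ".join(p.strip() for p in parser.parts if p.strip())


if __name__ == "__main__":
    page = ('<p>Hello</p><a href="http://example.com/page">link</a>'
            '<script>var s = "js";</script><style>.c{color:red}</style>')
    print(extract_text(page))  # Hello link
    print(extract_text(page, scan_links=True, scan_scripts=True, scan_css=True))
```

With the defaults only the visible text ("Hello link") reaches the scorer; enabling all three options also feeds it the link URL, the script body and the stylesheet, which is why these scans suit only very strict environments.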
Trusted Categories
If a certain amount of adult material is acceptable, it is recommended to switch on the Trusted Categories rule. With this rule on, if a given domain is known to belong to a non-blocked category, deep content inspection is skipped for that domain. This proves to be a very effective way of decreasing false positives, for example when an article on a well-known news site would otherwise be blocked because it contains some adult words.
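The skip decision can be sketched as below. This is a hypothetical illustration: `needs_inspection` and the `domain_categories` lookup table stand in for Web Safety's own categorization data, and the example domains are made up.

```python
def needs_inspection(domain: str,
                     domain_categories: dict,
                     trusted: set,
                     blocked: set,
                     trusted_rule_on: bool = True) -> bool:
    """Return False when deep content inspection can be skipped for a domain.

    Inspection is skipped only if the Trusted Categories rule is on and the
    domain belongs to at least one trusted category that is not blocked.
    """
    if not trusted_rule_on:
        return True
    categories = domain_categories.get(domain, set())
    return not any(c in trusted and c not in blocked for c in categories)


if __name__ == "__main__":
    # Hypothetical categorization data for illustration only.
    cats = {"news.example": {"NEWS MEDIA"}, "unknown.example": set()}
    trusted = {"NEWS MEDIA", "SPORTS"}
    print(needs_inspection("news.example", cats, trusted, blocked=set()))     # False
    print(needs_inspection("unknown.example", cats, trusted, blocked=set()))  # True
```

Uncategorized domains are always inspected, so the rule only relaxes scanning for sites whose category is already known and allowed.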
The list of trusted categories can be configured in Settings / Trusted Categories, as shown in the following screenshot.
The following table shows the default recommended trusted categories.
Category | Trusted |
---|---|
ADVERTISING | yes |
AUCTIONS | yes |
AUTOMOTIVE | yes |
BUSINESS SERVICES | yes |
ECOMMERCE SHOPPING | yes |
EDUCATIONAL INSTITUTIONS | yes |
FINANCIAL INSTITUTIONS | yes |
GOVERNMENT | yes |
HEALTH AND FITNESS | yes |
JOBS EMPLOYMENT | yes |
MOVIES | yes |
MUSIC | yes |
NEWS MEDIA | yes |
NON PROFITS | yes |
POLITICS | yes |
RADIO | yes |
RELIGIOUS | yes |
RESEARCH REFERENCE | yes |
SEXUALITY | yes |
SOCIAL NETWORKING | yes |
SOFTWARE TECHNOLOGY | yes |
SPORTS | yes |
TELEVISION | yes |
TRAVEL | yes |
WEBMAIL | yes |