Table of Contents:

Path Constraints

Link Excludes

Additional Information


Path constraints

This chapter gives instructions and examples for setting up Path Constraints. Path constraints are regular expressions. A regular expression (regex) is a pattern that describes a set of strings. Regexes are used to search, manipulate, and edit strings, for example in Java. Email addresses and passwords are two examples of strings whose format a regex can constrain.

Introduction

Use Path Constraints to instruct the scan to process only parts of a domain. The scan treats URLs that match the pattern as internal to the site, and treats URLs that do not match in the same way as external links.

A Path Constraint can be a word or a regular expression. In most cases, users set up Path Constraints to:

  • Restrict the scanner to only recognize parts of a site with a pattern such as ^/en
    This instructs the scan to handle any URL that does not begin with /en (for example, http://foo.com/fr/bar) as an external link. That is, the crawler tests the link but does not follow any links on http://foo.com/fr/bar.

  • Instruct the scan to ignore parts of the site with a pattern such as !^/fr
    This instructs the scan to handle any URL that begins with /fr (for example http://foo.com/fr/bar) as an external link. This means that the scan tests the link but does not follow any links on http://foo.com/fr/bar.

The difference between the two is that in the first case, ALL pages under /en are scanned and nothing else. In the second case, all pages EXCEPT those under /fr are scanned.
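The two cases above can be pictured with a short Python sketch. It is an illustration only: the function name is_internal is hypothetical, and it assumes (the article does not confirm this) that the pattern is searched against the path of the URL and that a leading ! inverts the result. Python's re module stands in for the scanner's own regex engine, which may differ in details.

```python
import re
from urllib.parse import urlsplit


def is_internal(url, constraint):
    """Return True if the URL would count as internal under the constraint.

    Assumptions for illustration: the pattern is searched against the URL
    path, and a leading "!" negates the match.
    """
    negate = constraint.startswith("!")
    pattern = constraint[1:] if negate else constraint
    path = urlsplit(url).path or "/"
    matched = re.search(pattern, path) is not None
    return not matched if negate else matched


# Pattern ^/en: only URLs whose path begins with /en are internal.
print(is_internal("http://foo.com/en/bar", "^/en"))   # True
print(is_internal("http://foo.com/fr/bar", "^/en"))   # False

# Pattern !^/fr: everything except paths that begin with /fr is internal.
print(is_internal("http://foo.com/fr/bar", "!^/fr"))  # False
print(is_internal("http://foo.com/de/bar", "!^/fr"))  # True
```

Under these assumptions, ^/en would also match a path such as /english. A stricter pattern such as ^/en/ avoids that.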

IMPORTANT! Make sure that the URL for the domain is set to a page that matches the constraint.

If this is not done, only one page is scanned, because the scanner cannot proceed beyond the page it starts on.

For example, with a constraint of "^/en/booking", starting the crawler on http://foo.com will not work. The crawler requests http://foo.com, receives the page, and finds that no links match http://foo.com/en/booking, so only the first page is scanned.
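The effect of the start URL can be simulated with a small Python sketch. The site map, the function names, and the crawl rule (follow a link only if it matches, and only from a page that itself matches) are assumptions made for illustration, not a description of the actual Monsido crawler.

```python
import re
from urllib.parse import urlsplit

# Hypothetical site: each page mapped to the links found on it.
SITE = {
    "http://foo.com": ["http://foo.com/en/", "http://foo.com/about"],
    "http://foo.com/en/": ["http://foo.com/en/booking"],
    "http://foo.com/en/booking": [
        "http://foo.com/en/booking/step2",
        "http://foo.com/about",
    ],
    "http://foo.com/en/booking/step2": [],
    "http://foo.com/about": [],
}


def matches(url, constraint):
    """Search the constraint against the URL path (assumed behavior)."""
    return re.search(constraint, urlsplit(url).path or "/") is not None


def crawl(start_url, constraint):
    """Return the pages scanned when starting at start_url."""
    scanned, queue, seen = [], [start_url], {start_url}
    while queue:
        url = queue.pop(0)
        scanned.append(url)
        if not matches(url, constraint):
            continue  # handled like an external link: its links are not followed
        for link in SITE.get(url, []):
            if link not in seen and matches(link, constraint):
                seen.add(link)
                queue.append(link)
    return scanned


# Start page does not match the constraint: only one page is scanned.
print(crawl("http://foo.com", "^/en/booking"))
# ['http://foo.com']

# Start page matches the constraint: the booking pages are scanned.
print(crawl("http://foo.com/en/booking", "^/en/booking"))
# ['http://foo.com/en/booking', 'http://foo.com/en/booking/step2']
```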

Instructions:

  1. From the Monsido Domain Overview (globe icon), click Settings (gear icon) at the top of the page. The Admin Settings page opens.

    Note: The Settings button is only available to site admins.

    Image that shows the location of the Settings button, on the top menu bar.

  2. On the same row as the domain to scan, click Action.

    Image showing the location of the Actions button for a domain, on the right-hand side of the page.

  3. In the drop-down list, select Edit Domain.

    Image of the location of the Edit Domain option in the Actions menu.

    The Edit Domain page opens.

    Image showing the Edit Domain Features section.

  4. Scroll to the Advanced Domain Options section.

    Image of the Advanced Domain Options section.

  5. Scroll down to the Path Constraints section:

    • Search: Enter a search parameter for matching strings within the Constraint Patterns list.

    • Constraint pattern: Enter a constraint pattern.

    • + Add: Click + to add a new Constraint pattern. An empty row is added to the list.

      Note: The window shows only the first five items. If the list has more than five items, the remaining items are paginated.

    • Delete: Click the trashcan icon to delete an item from the list.

    For more information and examples, see the external article:


Link excludes

This chapter gives instructions and examples for setting up Link Excludes. Link Excludes are regular expressions. A regular expression (regex) is a pattern that describes a set of strings. Regexes are used to search, manipulate, and edit strings, for example in Java. Email addresses and passwords are two examples of strings whose format a regex can constrain.

Introduction

A Link Exclude can be a word or a regular expression. Use Link Excludes to instruct the crawler to completely ignore matching links on the pages. URLs that match the pattern are not tested.

Use Link Excludes to:

  • Filter out print pages with a pattern such as print=true

    This will instruct the scan to ignore (and not test) any URL with the pattern, for example:

    http://foo.com/bar?print=true

  • Filter out redirected login pages with a pattern such as:

    login.aspx?return_url=zyx

    This will instruct the scan to ignore all URLs with the pattern, for example: http://foo.com/bar/login.aspx?return_url=zyx
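Because an exclude can be a word or a regular expression, characters such as . and ? deserve attention. The Python sketch below shows how the login pattern above behaves when read as plain text and when read as a regex. The function names are hypothetical, and the article does not state which reading the scanner applies to a given pattern, so treat this as a way to test a pattern before entering it, not as a description of the scanner.

```python
import re


def excluded_as_text(url, pattern):
    """Plain-text reading: the pattern must appear in the URL as written."""
    return pattern in url


def excluded_as_regex(url, pattern):
    """Regex reading: metacharacters such as "." and "?" have special meaning."""
    return re.search(pattern, url) is not None


url = "http://foo.com/bar/login.aspx?return_url=zyx"
pattern = "login.aspx?return_url=zyx"

print(excluded_as_text(url, pattern))              # True

# As a regex, "?" makes the preceding "x" optional and no longer stands
# for a literal question mark, so the search finds nothing.
print(excluded_as_regex(url, pattern))             # False

# Escaping the metacharacters restores the literal meaning.
print(excluded_as_regex(url, re.escape(pattern)))  # True

# A pattern without metacharacters behaves the same either way.
print(excluded_as_regex("http://foo.com/bar?print=true", "print=true"))  # True
```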

Tip! If "Scan subdomains" is turned on for the domain, use the § sign in front of the exclude pattern to match against the full URL instead of the relative one. For example, to exclude the "blog" subdomain from the scan, enter:

§http://blog.foo.bar

as the pattern.
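A rough Python sketch of the tip: it assumes that a pattern without the section sign (§) prefix is matched against the relative part of the URL (path and query), and that a pattern with the prefix is matched against the full URL. The function name is_excluded and this exact split are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit


def is_excluded(url, pattern):
    """Match against the full URL when the pattern starts with the section sign."""
    if pattern.startswith("\u00a7"):  # "\u00a7" is the section sign
        target = url
        pattern = pattern[1:]
    else:
        parts = urlsplit(url)
        target = parts.path + ("?" + parts.query if parts.query else "")
    return re.search(pattern, target) is not None


prefixed = "\u00a7http://blog.foo.bar"

# With the prefix, the host name is part of the text being matched.
print(is_excluded("http://blog.foo.bar/post/1", prefixed))               # True
print(is_excluded("http://www.foo.bar/post/1", prefixed))                # False

# Without the prefix, only /post/1 is matched, so the host never matches.
print(is_excluded("http://blog.foo.bar/post/1", "http://blog.foo.bar"))  # False
```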

Instructions:

  1. From the Monsido Domain Overview (globe icon), click Settings (gear icon) at the top of the page. The Admin Settings page opens.

    Note: The Settings button is only available to site admins.

    Image that shows the location of the Settings button, on the top menu bar.

  2. On the same row as the domain to scan, click Action.

    Image showing the location of the Actions button for a domain, on the right-hand side of the page.

  3. In the drop-down list, select Edit Domain.

    Image of the location of the Edit Domain option in the Actions menu.

    The Edit Domain page opens.

    Image showing the Edit Domain Features section.

  4. Scroll to the Advanced Domain Options section.

    Image of the Advanced Domain Options section.

  5. In the Link Excludes section:

    • Search: Enter a search parameter for matching strings within the Link excludes list.

    • Exclude pattern: Enter a pattern to exclude from the scan.

      Note: The window shows only the first five items. If the list has more than five items, the remaining items are paginated.

    • + Add: Click + to add a new Exclude pattern. An empty row appears in the list.

    • Delete: Click the trashcan icon to delete an item from the list.

    • Internal URLs:

      • Operator: Click the drop-down arrow to select Contains, Starts with, or Regex.

      • Url: Type a URL in the field.

      • Delete: Click the trashcan icon to delete the row.

      • + Add: Click + to add a new Internal URL. An empty row appears in the list.

    Important! A link that is attached to an image can also be excluded. The link is then excluded from the scan. However, the image itself can still appear on the SEO and QA pages as an issue to be fixed if it does not meet other requirements (for example, missing ALT text).
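    The three Operator choices listed under Internal URLs (Contains, Starts with, Regex) can be read as three ways of comparing a URL with the value in the Url field. The Python sketch below is a guess at that meaning based on the operator names alone. The function name url_matches is hypothetical, and the article does not say what text the scanner compares against.

```python
import re


def url_matches(url, operator, value):
    """Compare a URL with a value using one of the three listed operators."""
    if operator == "Contains":
        return value in url
    if operator == "Starts with":
        return url.startswith(value)
    if operator == "Regex":
        return re.search(value, url) is not None
    raise ValueError("Unknown operator: " + operator)


print(url_matches("http://foo.com/en/bar", "Contains", "/en/"))                 # True
print(url_matches("http://foo.com/fr/bar", "Starts with", "http://foo.com/en")) # False
print(url_matches("http://foo.com/de/bar", "Regex", r"/(en|de)/"))              # True
```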

    For more information and examples, see the external article:


Additional information

For more information, see the User Guide chapters:

For advanced instructions on this topic, see the associated article in the Monsido for Developers collection:

For more information and examples, see the external article:

Note: Regular expressions in source-code exclusions are not fully compatible with those in Monsido Policies, because the two are evaluated in different languages (Java and Ruby).

For further assistance, contact the Monsido support team at support@monsido.com or via the Monsido chat and help features inside the application.

Image of the dashboard showing the locations of the Help Center buttons.

See Monsido for Developers for documentation containing advanced help files for developers.

Contact us

Monsido, an Optimere brand:

San Diego, CA, USA

5880 Oberlin Dr,
San Diego, CA 92121, USA

Monsido US Support: +1 858-281-2185

Australia & New Zealand

Suite 2.04
80 Cooper St
Surry Hills, NSW 2010

Monsido APAC Support: +61 2 9051 0590

Copenhagen, Denmark

Borupvang 3
2750 Ballerup, Denmark

Monsido Europe Support: +45 89 88 85 05

London, UK

14 New Street
London, EC2M 4HE

Monsido UK Support: +44 20 8138 8450

