Searching with connectors such as and, or, or w/5 between your search terms is called Terms and Connectors Searching. This is a good strategy when you need to be more specific about the language you are searching for. Below are step-by-step instructions, as well as some videos, to get you started with full text searching.
SUMMARY:
In the following case study, we take a basic search through 5 stages of increasing complexity and see the effect each stage has on our actual results. This case study follows the same example covered in the video above, "How much do full text searches really help?" We also advise that you scroll down to view the beginner, intermediate, and advanced search connector tables (or download them as PDFs at the bottom of this article) for full explanations of how connectors function in searches.
CASE STUDY SEARCH, step 1: categories but no search terms
- We start with a simple search for all Financial Statements from Consumer Products and Industrial Products companies
- In the SEDAR Filings dataset, add Industry and Document Category criteria if they are not already added by clicking on the + Add criteria link on the upper left of your screen
- In Industry add Consumer Products and Industrial Products
- In Document Category add Financial Statements
- In Filing Date add Last 2 years
- Click Search
- You will get 1000 results
- This is a lot of Financial Statements, but we don’t know whether they discuss what we’re interested in: net profit and sales growth
- We could click into each result and search for these terms document by document, similar to using Ctrl+F on a SEDAR filing from sedar.com
- This retrieval has no full text search component
CASE STUDY SEARCH, step 2: adding search terms and using and
- Question: "Do I need to find any words or phrases in these documents?"
- Go back to your search screen and add the keywords net profit and sales growth to your search and click Search again (you can copy and paste the search from here)
- You now have two exact phrases (words with spaces in between them are exact phrases) that must appear in the document ("and" indicates that the two phrases need to be present in the document but it does not specify any relationship between them)
- Note the number of results you retrieve (at the time of filming of this video it was 74 results)
- You will be able to click into each result and automatically see every place where either net profit or sales growth appears in each filing
- The result of 74 means 926 (92.6%) of the results in your first search were not on point and were essentially noise
i. This simple search has already saved you more than 90% of the time you’d spend if you didn’t use any full text search, but it is still missing the majority of valid results
- This search uses the and connector – and – to specify more than one word or phrase that needs to be found
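The and logic in this step can be pictured with a short Python sketch. It is only an illustration under simplifying assumptions (text is lower-cased, whitespace is collapsed, and the function names are hypothetical); it does not describe how the search engine itself is implemented.

```python
import re

def normalize(text):
    """Lower-case the text and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def contains_phrase(text, phrase):
    """True if the exact phrase occurs in the text on word boundaries."""
    pattern = r"\b" + re.escape(normalize(phrase)) + r"\b"
    return re.search(pattern, normalize(text)) is not None

def matches_and(text, *phrases):
    """Model of `phrase1 and phrase2`: every phrase must appear somewhere."""
    return all(contains_phrase(text, p) for p in phrases)

doc_a = "Net profit rose this year, driven by strong sales growth in Ontario."
doc_b = "Net profit rose this year, driven by growth in domestic sales."

print(matches_and(doc_a, "net profit", "sales growth"))  # True
print(matches_and(doc_b, "net profit", "sales growth"))  # False
```

The second document talks about the same idea but never uses the exact phrase sales growth, which is the kind of valid result an exact-phrase search misses.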
CASE STUDY SEARCH, step 3: adding a proximity connector
- Question: "Are any of my phrases too narrow or too specific?"
- Question: "Am I looking for search terms to be included in the same discussion but not necessarily to constitute an exact phrase?"
- Go back to your search and change the search to net profit and (sales w/5 growth) and click Search again
- This allows the phrase sales growth to be replaced by any phrase that includes sales and growth within 5 words of each other
- This is non-directional, so you will get sales growth as well as growth in fourth quarter sales, growth in domestic sales, and growth in international sales, among other variations
- Note the number of results you retrieve (at the time of filming it was 182 results) – this is more than double the number of results retrieved in step 2
- This search uses the proximity connector – w/n – to ensure that terms are related or close to each other (n is replaced by any number from 1 to 5000 and sets the maximum distance you will accept between the word or phrase preceding the connector and the word or phrase following it)
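As a rough picture of what a non-directional proximity connector does, here is a minimal Python sketch. The tokenization and the way distance is counted (difference in word positions) are assumptions made for illustration, and `within` is a hypothetical helper, not part of the product.

```python
import re

def tokens(text):
    """Split text into lower-case word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text, first, second, n):
    """Model of `first w/n second`: the two words occur at most n
    word positions apart, in either order."""
    words = tokens(text)
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= n for i in pos_first for j in pos_second)

print(within("sales growth", "sales", "growth", 5))                    # True
print(within("growth in fourth quarter sales", "sales", "growth", 5))  # True
far_apart = "Sales were flat in every region we operate in, but headcount growth continued."
print(within(far_apart, "sales", "growth", 5))                         # False
```

Because the check uses the absolute difference between positions, it does not matter which of the two words comes first.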
CASE STUDY SEARCH, step 4: adding a wildcard
- Question: "Do I need any variations on the exact forms of my search terms such as plurals or tenses?"
- Go back to your search and change the search to net profit* and (sales w/5 grow*) and click Search again
- This will allow you to get net profit as well as net profits, and also to get grow as well as grows, growing, grown, and growth, but it will NOT get grew (to get grew you need to add it as a synonym in step 5 below, since it does not begin with the same four letters as grow)
- When adding a wildcard to a root word, cut the word at the last character that occurs in all variations of it
- increase* (with the e before the asterisk) will get increase, increases, increased but it will NOT get increasing
- increas* (with the s before the asterisk) will get increase, increases, increased, and increasing
- Note the number of results you retrieve (at the time of filming it was 327 results) – this is more than 4 times the number of results retrieved in step 2
- You will be able to see every time either the phrase net profit or net profits appears, as well as every time that sales appears within 5 words of most forms of grow (grow, growing, growth, grown) in any of these documents
- This search uses the wildcard - * - to allow for various endings of words
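The effect of where you cut the root word can be checked with a small Python sketch. It assumes a trailing * simply means "any further letters", which matches the behaviour described above; `matched_words` is a hypothetical helper written for this illustration.

```python
import re

def wildcard_to_regex(term):
    """Turn a term such as 'grow*' into a regex; a trailing * allows any word ending."""
    if term.endswith("*"):
        return re.compile(re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(re.escape(term), re.IGNORECASE)

def matched_words(term, candidates):
    """Return the candidate words that the wildcard term would match in full."""
    pattern = wildcard_to_regex(term)
    return [w for w in candidates if pattern.fullmatch(w)]

grow_forms = ["grow", "grows", "growing", "growth", "grown", "grew"]
increase_forms = ["increase", "increases", "increased", "increasing"]

print(matched_words("grow*", grow_forms))          # every form except 'grew'
print(matched_words("increase*", increase_forms))  # misses 'increasing'
print(matched_words("increas*", increase_forms))   # all four forms
```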
CASE STUDY SEARCH, step 5: adding synonyms (using or between them)
- Question: "Can any of my search terms be substituted with a different word that might be used in its place?"
- Go back to your search and change the search to net profit* and (sales w/5 grow* or increas*) and click Search again
- We can see that net profit is a term of art that doesn't have synonyms, and the word sales doesn't have any synonyms we would accept in this context
- The same is not true for grow, which could be replaced in a sentence by increase, augment, or ramp up; even the non-synonym double would get you relevant results ("doubling sales" might be as valid to you as "growing sales")
- Take a moment and think of not only literal synonyms, but any other terms you would accept in the same place, even if they are not literal synonyms
- Note the number of results you retrieve (at the time of filming it was 860 results) – this is more than 11 times the number of results retrieved in step 2
- You will be able to see every time either the phrase net profit or net profits appears, as well as every time that sales appears either within 5 words of most forms of grow (grow, growing, growth, grown, etc.) or else within 5 words of any form of increase (increases, increased, increasing, etc.) in any of these documents
- This search uses 2 synonyms, separated by an – or –, to allow for various words in the place of “grow”
- We are only using one synonym above (increas*), but a more complete search might be something like net profit* and (sales w/5 grow* or grew or increas* or improv* or augment*) - this would get even more results
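To see why each added synonym widens the net, the proximity and wildcard ideas can be combined in one Python sketch. As before, the tokenization, the distance counting, and the helper names are assumptions made for illustration only.

```python
import re

def tokens(text):
    """Split text into lower-case word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_matches(word, term):
    """A trailing * matches any ending; otherwise the word must match exactly."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def near_any(text, anchor, alternatives, n):
    """True if `anchor` is within n word positions of a word matching any alternative."""
    words = tokens(text)
    anchor_pos = [i for i, w in enumerate(words) if w == anchor]
    alt_pos = [i for i, w in enumerate(words)
               if any(word_matches(w, t) for t in alternatives)]
    return any(abs(i - j) <= n for i in anchor_pos for j in alt_pos)

increased = "Sales increased sharply in the fourth quarter."
grew = "Sales grew by 12% year over year."

print(near_any(increased, "sales", ["grow*"], 5))                   # False
print(near_any(increased, "sales", ["grow*", "increas*"], 5))       # True
print(near_any(grew, "sales", ["grow*", "increas*"], 5))            # False
print(near_any(grew, "sales", ["grow*", "grew", "increas*"], 5))    # True
```

Each alternative you add can only turn a miss into a hit, never the reverse, which is why the result count climbs as synonyms are added.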
CASE STUDY, CONCLUSIONS:
- Even the most basic search, the search in step 2 above, cuts out more than 90% of the noise you would have to wade through on SEDAR or with any method that relies on Ctrl+F to find words in documents
- The most specific search, the search in step 5 above, uses (1) a proximity connector, (2) wildcards, and (3) synonyms (separated by an or), and finds more than 11 times as many relevant documents as the basic search in step 2 above
- By sorting results by rank, you will start with the most relevant results (those in which the search terms occur most frequently and are most tightly clustered together), so you don’t have to look at all 860 results but can just look at the highest/best matches among them
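The percentages and multiples quoted in this case study follow directly from the result counts. The short Python sketch below reproduces the arithmetic; the counts are the ones recorded at the time of filming, so your own numbers will differ.

```python
# Result counts quoted in this case study (as recorded at the time of filming).
results = {1: 1000, 2: 74, 3: 182, 4: 327, 5: 860}

baseline = results[2]  # step 2 is the first search with a full text component

# Share of the step 1 results that did not contain both phrases.
noise_share = (results[1] - baseline) / results[1]
print(f"Noise removed by step 2: {noise_share:.1%}")

# How many times more results each later step returns compared with step 2.
multiples = {step: round(count / baseline, 1)
             for step, count in results.items() if step >= 3}
for step, multiple in multiples.items():
    print(f"Step {step}: {multiple} times the step 2 results")
```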
1. Intro Level: Make sure you are at least familiar with the first 5 connectors in the chart below ([space], AND, OR, AND NOT, *)
a. Look at the 6th connector (w/n) and consider whether or not it would be useful for you to use in your searching. If the answer is no, you will not need to know more about terms and connectors searching than intro level
2. Basic Level: Look at the entire list of connectors in the Basic Terms and Connectors Searching chart below
a. If you find you have no unanswered questions and you are not interested in knowing more, you will not need more than basic level
3. Intermediate and Advanced Levels: Look at the second chart below – Intermediate and Advanced Terms and Connectors Searching
a. The first 3 examples are intermediate and the last 3 are advanced applications of Terms and Connectors Searching
i. Understanding and employing intermediate and advanced terms and connectors searching gives you a lot more power over what you look at and allows you to cut out a lot of noise in your searching
Connector | Example | Retrieves | Highlights |
[space] | region of incorporation | Documents that contain the exact phrase searched for. EXCEPTION: some phrases do require quotation marks in order to be recognized. IMPORTANT: see the "" (quotes) connector below | The exact phrase region of incorporation |
AND | warrant AND consideration | Documents that contain both terms | Both terms anywhere in the document, regardless of proximity to each other |
OR | warrant OR consideration | Documents that contain either term OR both terms | Either term anywhere in the document |
AND NOT | warrant AND NOT consideration | Documents that contain one term but must not contain the other | Only the term warrant; the document must not contain the term consideration |
* | warrant* | Documents that contain any term that begins with a specified string of characters | Any term that starts with "warrant", including warrants, warranted, warranty, warranties, etc. |
w/n | warrant w/10 consideration | Documents that contain one term within a certain number of words of the other term. Allows for any combination of words in between these two terms, so it is not looking for any exact phrase in particular. It is looking for terms that form part of an idea, conversation, or topic. | Either term whenever it appears within a certain number of words of the other term |
pre/n | warrant pre/10 consideration | Documents that contain one term preceding the other term by a certain number of words (or fewer) | Both warrant and consideration, so long as warrant precedes consideration by 10 words or fewer. If warrant is 11 words before consideration, neither term will be highlighted |
NOT w/n | warrant NOT w/10 consideration | Documents that have at least one instance of a term that is not within a certain distance of another specified term | Warrant whenever it is not within 10 words of consideration. Warrant may also appear within 10 words of consideration in this document, but that instance will not be highlighted |
xfirstword | warrant w/10 xfirstword | Specifies the location of the first word appearing in the document. When combined with w/n, finds documents that have a term appearing within a certain number of words of the first word in the document | Every instance of "warrant" that appears within 10 words of the first word in the document. |
"" (quotes) | "warranties and representations" | Documents that have the exact phrase that was searched for, with and, or, and not recognized as normal terms and not as connectors | Unnecessary for most phrase searching. Only necessary when the exact phrase contains a word that is normally a connector, such as "and", "or", "not". Any time and, or, and not are enclosed within "" they will be treated as regular terms to be searched for and will cease being connectors in that phrase |
% | wa%rrant | Documents that have words that are somewhat similar to warrant | Will find misspellings of warrant such as warant and warrrant |
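The difference between the directional pre/n connector and the NOT w/n connector in the chart above can be sketched in Python. This is only an illustration: the tokenization, the way distance is counted, and the helper names `pre` and `not_within` are assumptions, not the product's implementation.

```python
import re

def tokens(text):
    """Split text into lower-case word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(words, term):
    """Indexes at which the term occurs in the token list."""
    return [i for i, w in enumerate(words) if w == term]

def pre(text, first, second, n):
    """Model of `first pre/n second`: first comes before second, at most n positions earlier."""
    words = tokens(text)
    return any(0 < j - i <= n
               for i in positions(words, first)
               for j in positions(words, second))

def not_within(text, first, second, n):
    """Model of `first NOT w/n second`: at least one occurrence of first
    has no occurrence of second within n positions of it."""
    words = tokens(text)
    second_pos = positions(words, second)
    return any(all(abs(i - j) > n for j in second_pos)
               for i in positions(words, first))

sentence = "The warrant was issued for cash consideration."

print(pre(sentence, "warrant", "consideration", 10))         # True
print(pre(sentence, "consideration", "warrant", 10))         # False: wrong order
print(not_within(sentence, "warrant", "consideration", 10))  # False: the only warrant is close
print(not_within(sentence, "warrant", "consideration", 3))   # True: 5 positions apart
```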
Level | Search | Retrieves | Highlights |
Intermediate | (warrant and consideration) or common shares | Documents that:
1) Have both warrant and consideration
2) But don't necessarily contain common shares
OR ELSE documents that:
1) Contain common shares
2) But don't necessarily contain either warrant or consideration
| Highlights any occurrences of warrant, consideration, or common shares that are found in the relationships specified in the search
Warrant will only be highlighted if the term consideration is in the document |
Intermediate | warrant and (consideration or common shares) | Documents that:
1) Contain warrant
2) And also contain EITHER consideration or common shares
| Highlights any occurrences of warrant, consideration, or common shares that are found in the relationships specified in the search |
Intermediate | (warrant and consideration) w/10 common shares | Documents that:
1) Contain warrant within 10 words of common shares
2) AS LONG AS the same document ALSO contains consideration within 10 words of common shares | For warrant to be highlighted it must be:
* within 10 words of common shares, and
* consideration must ALSO be within 10 words of common shares or else warrant will not be highlighted
* it does not matter how far apart warrant and consideration are, although they cannot logically be more than 20 words away from each other given that each term is limited to within 10 words of common shares
For consideration to be highlighted it must be:
* within 10 words of common shares, and
* warrant must ALSO be within 10 words of common shares or else consideration will not be highlighted
* it does not matter how far apart warrant and consideration are, although they cannot logically be more than 20 words away from each other given that each term is limited to within 10 words of common shares
For common shares to be highlighted it must be:
* within 10 words of warrant, as well as be
* within 10 words of consideration or else common shares will not be highlighted
* it does not matter how far apart warrant and consideration are, although they cannot logically be more than 20 words away from each other given that each term is limited to within 10 words of common shares
|
Advanced | common shares w/20 warrant w/10 consideration | Documents that
1) Contain warrant within 20 words of common shares
2) Contain warrant within 10 words of consideration
3) ALSO contain consideration within 10 words of common shares
In this search, common shares, the first term typed, is an anchor term and all proximity connectors that follow in that string apply as a distance from this anchor term common shares.
There is a second stipulation that warrant needs to also be within 10 words of consideration in addition to being within 20 words of the anchor term common shares. | To be highlighted –
1) Common shares must be:
* within 20 words of warrant, as well as be
* within 10 words of consideration
2) Warrant must be:
* within 20 words of anchor term common shares, as well as be
* within 10 words of consideration
3) Consideration must be:
* within 10 words of the anchor term common shares, as well as be
* within 10 words of warrant
|
Advanced | common shares w/20 (warrant w/10 consideration) | Documents that
1) Contain warrant within 10 words of consideration
2) Contain warrant OR ELSE consideration within 20 words of common shares
3) consideration can be any distance from common shares so long as the two above conditions are met
| To be highlighted –
1) Common shares must be:
* within 20 words of warrant, OR ELSE be
* within 20 words of consideration
2) Warrant must be:
* within 10 words of consideration, as well as be
* within 20 words of common shares ONLY IF consideration is not within 20 words of common shares
3) Consideration must be:
* within 10 words of warrant, as well as be
* within 20 words of the anchor term common shares ONLY IF warrant is not within 20 words of common shares
4) Only warrant OR consideration needs to be within 20 words of the anchor term common shares
|
Advanced | Common shares w/10 warrant w/15 consideration w/20 collectively | Documents that contain:
1) Warrant within 10 words of common shares
2) Consideration within 15 words of common shares
3) Collectively within 20 words of common shares
Due to the string of consecutive proximity connectors (not broken by an AND, OR, or NOT), any returned document will also need to contain:
1) Warrant within 15 words of consideration
2) Consideration within 20 words of collectively
In this search, common shares, the first term typed, is an anchor term and all proximity connectors that follow in that string apply as a distance specified relative to this anchor term common shares.
There is a second stipulation that each term ALSO needs to be within a specified proximity to the term following it, based on the proximity connector used (w/10, w/15, w/20) | To be highlighted –
1) Common shares must be:
* within 10 words of warrant, as well as be
* within 15 words of consideration, and also be
* within 20 words of collectively
2) Warrant must be:
* within 10 words of anchor term common shares, as well as be
* within 15 words of consideration
3) Consideration must be:
* within 15 words of the anchor term common shares, as well as be
* within 15 words of warrant, and also be
* within 20 words of collectively
4) Collectively must be:
* within 20 words of the anchor term common shares, as well as be
* within 15 words of consideration |
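The way parentheses change the meaning of the first two intermediate searches in the chart above can be pictured with plain boolean logic. The Python sketch below is an illustration only; `has`, `query_a`, and `query_b` are hypothetical helpers that test for simple word-boundary matches and ignore proximity and highlighting.

```python
import re

def has(text, phrase):
    """True if the phrase occurs in the text on word boundaries, ignoring case."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None

def query_a(text):
    """Model of: (warrant and consideration) or common shares"""
    return (has(text, "warrant") and has(text, "consideration")) or has(text, "common shares")

def query_b(text):
    """Model of: warrant and (consideration or common shares)"""
    return has(text, "warrant") and (has(text, "consideration") or has(text, "common shares"))

shares_only = "The company issued common shares to its founders."
warrant_and_shares = "Each warrant is exercisable for common shares."

print(query_a(shares_only), query_b(shares_only))                # True False
print(query_a(warrant_and_shares), query_b(warrant_and_shares))  # True True
```

A document that mentions only common shares satisfies the first search but not the second, because the second search requires warrant in every result.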