Near Duplicates
What This Means
Pages that are near duplicates of one another in content, as measured by the minhash algorithm against the configured similarity threshold (90% by default). Near-duplicate pages can cause cannibalisation issues and crawling and indexing inefficiencies, and might be a sign of low-quality page content.
What Triggers This Issue
This issue is triggered when pages are near duplicates in content, based on the configured similarity threshold (90% by default).
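To make the threshold idea concrete, here is a minimal sketch of how a minhash comparison between two pages might work. It is not the tool's actual implementation: the shingle size (word 3-grams), the signature length (128), the use of BLAKE2b as the hash family, the "greater than or equal to" comparison, and all function names are assumptions chosen for illustration. Only the 90% default comes from the description above.

```python
import hashlib
import re

NUM_HASHES = 128     # assumed signature length (more hashes = better estimate)
SHINGLE_SIZE = 3     # assumed: overlapping word 3-grams
THRESHOLD = 0.90     # default similarity threshold mentioned in the text


def shingles(text, k=SHINGLE_SIZE):
    """Split text into a set of overlapping k-word shingles."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash; each seed acts as a separate hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each hash function, keep the smallest hash over all shingles."""
    sh = shingles(text)
    if not sh:
        raise ValueError("text has no words to shingle")
    return [min(_hash(seed, s) for s in sh) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def is_near_duplicate(text_a, text_b, threshold=THRESHOLD):
    """Assumption: a pair is flagged when similarity is at or above the threshold."""
    sim = estimated_similarity(minhash_signature(text_a), minhash_signature(text_b))
    return sim >= threshold
```

Because the signature only estimates the true overlap, pairs that sit very close to the threshold may fall on either side of it; a longer signature narrows that margin at the cost of more hashing.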
How To Fix
Very similar pages can cause cannibalisation issues and crawling and indexing inefficiencies, so they should be minimised. High similarity can also be a sign of low-quality pages that have been neglected, or that shouldn't be separate pages in the first place. Analyse the near duplicates, considering the importance of each page and the scale of the problem. Then improve the content to make it more unique where necessary, or consolidate, block, remove, or leave the pages as they are where appropriate.
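When gauging scale, it can help to group near-duplicate pairs into clusters, so that a template repeated across hundreds of URLs shows up as one large group instead of many separate pairs. The sketch below assumes you already have a list of (URL, near-duplicate URL) pairs, for example from an exported report; the input shape and the function name are hypothetical.

```python
from collections import defaultdict


def cluster_near_duplicates(pairs):
    """Group URLs into clusters, given (url_a, url_b) near-duplicate pairs.

    Uses a simple union-find: URLs linked directly or indirectly
    end up in the same cluster. Largest clusters are returned first.
    """
    parent = {}

    def find(url):
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]  # path compression
            url = parent[url]
        return url

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for url in list(parent):
        groups[find(url)].append(url)

    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical input: three product variants and two tag pages.
pairs = [
    ("/shoes/red", "/shoes/blue"),
    ("/shoes/blue", "/shoes/green"),
    ("/tag/a", "/tag/b"),
]
for cluster in cluster_near_duplicates(pairs):
    print(len(cluster), cluster)
```

Large clusters usually point to a structural cause worth fixing once, while isolated pairs can be reviewed page by page according to their importance.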