Collecting publications from a website by creating a website crawler

User Role: Administrator, Explorer
Duration: 15 min
Objective: To learn when and how to properly create a website crawler

Agenda of the training session

User Manual(s)


What is the crawl depth of the website crawler?

A single crawl session goes to a depth of 4 from the starting URL chosen in the source configuration. However, every second crawl session starts from a randomly chosen page that was already created during a previous crawl session. Over successive sessions, the Cikisi crawler therefore keeps going deeper into the site, so in practice the crawl depth is unlimited.
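
The behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cikisi's actual implementation: the toy site, the function names (crawl_session, run_sessions) and the breadth-first strategy are all hypothetical. It only shows why a per-session depth of 4, combined with restarting from an already collected page, ends up with no practical depth limit.

```python
import random
from collections import deque

MAX_DEPTH = 4  # per-session depth limit stated in the text


def crawl_session(links, start, max_depth=MAX_DEPTH):
    """Breadth-first walk from `start`, following links at most `max_depth` levels."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


def run_sessions(links, start_url, sessions, seed=0):
    """Every second session starts from a random page collected earlier (assumed scheme)."""
    rng = random.Random(seed)
    collected = set()
    for n in range(sessions):
        if n % 2 == 1 and collected:
            start = rng.choice(sorted(collected))
        else:
            start = start_url
        collected |= crawl_session(links, start)
    return collected


# Hypothetical toy site: a chain of 12 pages, page0 -> page1 -> ... -> page11
chain = {f"page{i}": [f"page{i + 1}"] for i in range(11)}
first = crawl_session(chain, "page0")   # pages reachable in one session
later = run_sessions(chain, "page0", 6)  # pages collected after six sessions
```

In this toy chain, one session from page0 stops at page4. Later sessions that restart from a collected page can reach pages beyond it, which is the effect the answer describes.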

Can I choose the depth of the crawl?

No. Instead of a depth setting, we have opted for a more precise restriction, based on the structure that an article's URL must have for the article to be created. For example, you can ask to collect only articles that have /en/news in their URL.
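
As an illustration of this kind of URL-based restriction, here is a minimal sketch. The function name matches_url_rule, the example.com addresses and the simple "path contains the fragment" test are assumptions made for the example; the actual rule syntax is the one available in the source configuration.

```python
from urllib.parse import urlparse


def matches_url_rule(url, required_path="/en/news"):
    """Return True when the URL path contains the required fragment (assumed rule)."""
    return required_path in urlparse(url).path


# Hypothetical candidate pages found during a crawl
candidates = [
    "https://example.com/en/news/2024/product-launch",
    "https://example.com/fr/actualites/lancement",
    "https://example.com/en/about",
]

# Only pages whose URL matches the rule would become articles
kept = [url for url in candidates if matches_url_rule(url)]
```

With this rule, only the first candidate is kept, whatever its depth in the site.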