Most websites have a parent/detail structure, where category pages contain links to the detail pages.
This structure is very common on e-commerce sites.
In RTILA you can define a project as a bridge from the options page, as shown in the image below. This allows the bridged project to receive the links to scrape from a parent project.
Before going ahead, let me show you how it looks on a category page. I have chosen a colorful e-commerce website and this is a random category page:
The watermelon products point to their own detail page as expected.
How should we build the relation between the two pages?
The bridge feature will do this task.
Let me give you some advice:
- You will need a non-bridge project to start with.
- A bridge project will always be an intermediate or final step.
Check this short video to get an overall idea. It will take you less than 4 minutes.
The parent project should have at least one LINK element. Its CSS selector should end with an HTML A tag. Yours will not be identical to this one, but it should look similar: DIV.product-title > A
Note that the A tag defines a hyperlink which is used to link from one page to another.
You have to feed the bridge with links, and creating this association requires a valid URL for each link.
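To make the idea concrete outside of RTILA, here is a minimal Python sketch of what a selector like DIV.product-title > A picks up, and what "a valid URL" means in practice. It is not RTILA's implementation. The sample HTML, the shop.example.com address, and the names ProductLinkParser and extract_bridge_links are all assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags that never have a closing tag, so they must not be tracked as parents.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ProductLinkParser(HTMLParser):
    """Collect href values of <a> tags that are direct children of
    <div class="product-title"> (roughly DIV.product-title > A)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, classes) of the currently open elements
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.stack:
            parent_tag, parent_classes = self.stack[-1]
            if parent_tag == "div" and "product-title" in parent_classes:
                if attrs.get("href"):
                    self.hrefs.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, (attrs.get("class") or "").split()))

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything nested in it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_bridge_links(html, page_url):
    """Return absolute http(s) URLs suitable for feeding a detail project."""
    parser = ProductLinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)  # skip javascript:, mailto:, etc.
    return links


# Hypothetical category page markup.
SAMPLE_HTML = """
<div class="product-title"><a href="/products/watermelon-juice">Juice</a></div>
<div class="product-title"><a href="https://shop.example.com/products/watermelon-candy">Candy</a></div>
<div class="product-title"><a href="javascript:void(0)">Quick view</a></div>
<div class="footer"><a href="/about">About</a></div>
"""

links = extract_bridge_links(SAMPLE_HTML, "https://shop.example.com/category/watermelon")
print(links)
```

In this sketch only the first two links survive: the footer link does not match the selector, and the javascript: link is not a valid URL for a detail page.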
The following image presents the Properties tab settings of the main project.
Once your CSS selector is set, be sure to select the Element that contains the link (in this case there is only one, the purple box labeled with a 4 inside) and click on Advanced.
Set the Type select box from Property to Bridge and choose the detail page project.
Of course, you need to build the detail project first. This is the reason I worked in reverse order in the video, setting up the detail/bridge project first and creating the parent project later.
There is no Start Extraction button on Bridge projects because they require the parent project to be executed first.
That’s another reason to start with the detail page, test it, and set it as a bridge only after checking that everything works as expected.
You can concatenate as many bridges as you need.
Some sites will require more than one, but this will not be a problem at all.
Here you can see a preview with the main project and 5 bridges.
There are 6 projects in total, scraping states, cities (alphabetical pagination), cities (names), services page, listings, and properties.
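Conceptually, a chain of bridges behaves like a pipeline in which each project's output links become the next project's input. The Python sketch below models that idea only. It says nothing about how RTILA works internally, and FAKE_SITE, link_step, detail_step, and run_chain are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory "site": each URL maps to the links found on that page.
FAKE_SITE: Dict[str, List[str]] = {
    "/states": ["/state/a", "/state/b"],
    "/state/a": ["/city/a1", "/city/a2"],
    "/state/b": ["/city/b1"],
    "/city/a1": ["/listing/1"],
    "/city/a2": ["/listing/2", "/listing/3"],
    "/city/b1": [],
}


def link_step(url: str) -> List[str]:
    """An intermediate project: returns the links found on the page."""
    return FAKE_SITE.get(url, [])


def detail_step(url: str) -> List[dict]:
    """A final project: returns one scraped record per detail page."""
    return [{"url": url}]


def run_chain(start_url: str, projects: List[Callable]) -> list:
    """Run the main project on start_url, then feed each project's
    output into the next project in the chain."""
    inputs = [start_url]
    for project in projects:
        outputs = []
        for item in inputs:
            outputs.extend(project(item))
        inputs = outputs
    return inputs


# One main project followed by three bridges (the last one is the detail page).
records = run_chain("/states", [link_step, link_step, link_step, detail_step])
print(records)
```

The first entry in the list plays the role of the non-bridge project, and every later entry is a bridge, which is why a bridge can only ever be an intermediate or final step.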
And this is an aerial view of the site. In some cases, a single page can contain 2 bridges: