Items provide the container of scraped data, while Item Loaders provide the mechanism for populating that container.
# Using Item Loaders to populate items
To use an Item Loader, you must first instantiate it. You can instantiate it either with a dict-like object (e.g. an Item or a dict) or without one, in which case an Item is automatically instantiated in the Item Loader constructor using the Item class specified in the ItemLoader.default_item_class attribute.
Then, you start collecting values into the Item Loader, typically using Selectors. You can add more than one value to the same item field; the Item Loader will know how to “join” those values later using a proper processing function.
Here is a typical Item Loader usage in a Spider, using the Product item declared in the Items chapter:

    from scrapy.loader import ItemLoader
    from myproject.items import Product

    def parse(self, response):
        l = ItemLoader(item=Product(), response=response)
        l.add_xpath('name', '//div[@class="product_name"]')
        l.add_xpath('name', '//div[@class="product_title"]')
        l.add_xpath('price', '//p[@id="price"]')
        l.add_css('stock', 'p#stock')
        l.add_value('last_updated', 'today')  # you can also use literal values
        return l.load_item()

Once the data has been collected, the ItemLoader.load_item() method is called; it returns the item populated with the data previously extracted and collected with the add_xpath(), add_css(), and add_value() calls.
# Input and Output processors
An Item Loader contains one input processor and one output processor for each (item) field. The input processor processes the extracted data as soon as it’s received (through the add_xpath(), add_css() or add_value() methods), and the result of the input processor is collected and kept inside the ItemLoader. After collecting all data, the ItemLoader.load_item() method is called to populate and return the Item object. That’s when the output processor is called with the data previously collected (and processed using the input processor). The result of the output processor is the final value that gets assigned to the item.
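The order in which the two processors run can be illustrated with a toy model in plain Python. This is a simplified sketch of the flow described above, not Scrapy's implementation; `ToyLoader`, `strip_all`, and the sample values are hypothetical names chosen for the example.

```python
from collections import defaultdict


class ToyLoader:
    """Toy model of the input/output processor flow; not Scrapy's code."""

    def __init__(self, input_processors, output_processors):
        # Each mapping goes from a field name to a callable.
        self.input_processors = input_processors
        self.output_processors = output_processors
        self.collected = defaultdict(list)

    def add_value(self, field, values):
        # The input processor runs immediately, as soon as data arrives,
        # and its result is kept inside the loader.
        processed = self.input_processors[field](list(values))
        self.collected[field].extend(processed)

    def load_item(self):
        # The output processor runs once per field, over everything
        # collected so far; its result is the final field value.
        return {
            field: self.output_processors[field](values)
            for field, values in self.collected.items()
        }


def strip_all(values):
    # Example input processor: trim whitespace from every value.
    return [value.strip() for value in values]


loader = ToyLoader(
    input_processors={"name": strip_all},
    output_processors={"name": " ".join},  # example output processor
)
loader.add_value("name", ["  Plasma "])
loader.add_value("name", [" TV\n"])
item = loader.load_item()
```

After both calls, the loader holds the already-cleaned values `["Plasma", "TV"]`, and `load_item()` joins them into the single string assigned to `name`.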