
Web Scraper

This guide will help you configure your Masa Node as a web scraper.

Prerequisites

  • A Masa Node installation that you can restart, with access to its .env configuration file.

Configuration Process

  1. Set environment variable

    Enable web scraping in your .env file:

    WEB_SCRAPER=true
  2. Restart your node

    Restart the Masa node to apply the changes.

  3. Verify configuration

    Check the logs for confirmation:

    Is WebScraper: true
  4. Test the web scraper

    Curl the node in local mode to confirm it returns web data:

    curl -X 'POST' \
      'http://localhost:8080/api/v1/data/web' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
        "url": "https://google.com",
        "depth": 1
      }'

    You should receive a response with scraped web data.
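
If you want a reusable smoke test, the same request can be wrapped in a small shell script. This is a minimal sketch: it assumes the node listens on localhost:8080 as above and that jq is installed for pretty-printing, and the response schema is not documented here, so inspect the output manually the first time.

    #!/usr/bin/env bash
    # Minimal smoke test for the local web scraper endpoint.
    set -euo pipefail

    URL_TO_SCRAPE="${1:-https://google.com}"   # first CLI argument, defaults to google.com
    DEPTH="${2:-1}"                            # second CLI argument, defaults to 1

    response=$(curl -sS -X POST 'http://localhost:8080/api/v1/data/web' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d "{\"url\": \"${URL_TO_SCRAPE}\", \"depth\": ${DEPTH}}")

    # Pretty-print whatever the node returns.
    echo "${response}" | jq .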

Security Considerations

  • Respect robots.txt files and website terms of service.
  • Implement rate limiting to avoid overloading target websites (a minimal client-side sketch follows this list).
  • Handle potentially sensitive scraped data with care.
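
A simple way to respect rate limits when driving the scraper from a script is to pause between requests. The sketch below uses the same local endpoint as above; the URL list and the 5-second delay are illustrative values, not recommendations from the Masa documentation.

    #!/usr/bin/env bash
    # Scrape a short list of URLs through the local node, sleeping between
    # requests so the target sites are not hit in a burst.
    set -euo pipefail

    DELAY_SECONDS=5                                     # illustrative value - tune per target site
    URLS=("https://example.com" "https://example.org")

    for url in "${URLS[@]}"; do
      echo "Scraping ${url}"
      curl -sS -X POST 'http://localhost:8080/api/v1/data/web' \
        -H 'Content-Type: application/json' \
        -d "{\"url\": \"${url}\", \"depth\": 1}"
      echo
      sleep "${DELAY_SECONDS}"
    done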

Warning: Cloud-Based Scraping

If you are running a web scraper in the cloud, consider using a residential proxy. Some websites may block or limit access from cloud IP ranges. Ensure you have a reliable residential proxy service set up before deploying your scraper in a cloud environment.
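
How you point the node at a residential proxy depends on your provider and on how the node is deployed. One common pattern, assuming the node process honors the standard HTTP_PROXY/HTTPS_PROXY environment variables (verify this against the Masa node documentation before relying on it), looks like the sketch below; the host, port, and credentials are placeholders.

    # Placeholder values - substitute the details from your proxy provider.
    # Whether the Masa node honors these variables is an assumption; confirm it
    # in the node documentation or configure the proxy at the network level instead.
    export HTTP_PROXY="http://USERNAME:PASSWORD@proxy.example-provider.com:8000"
    export HTTPS_PROXY="http://USERNAME:PASSWORD@proxy.example-provider.com:8000"
    export NO_PROXY="localhost,127.0.0.1"   # keep local API calls off the proxy

    # Start (or restart) the node from the same shell so it inherits the variables.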

Troubleshooting

If you encounter issues:

  • Check your node's network connectivity.
  • Verify the target website is accessible and allows scraping.
  • Review node logs for any error messages related to web scraping (example commands are sketched after this list).
  • If running in the cloud, confirm your proxy (if used) is correctly configured.
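
A few quick checks can narrow the problem down. The commands below assume the node runs locally on port 8080; the docker logs line only applies if you run the node in a container named masa-node, so adjust it to match your own setup.

    # Can this machine reach the target site at all?
    curl -I https://google.com

    # Does the scraper endpoint respond? Print only the HTTP status code.
    curl -sS -X POST 'http://localhost:8080/api/v1/data/web' \
      -H 'Content-Type: application/json' \
      -d '{"url": "https://google.com", "depth": 1}' \
      -o /dev/null -w '%{http_code}\n'

    # Look for scraper-related messages in the node logs.
    # "masa-node" is an assumed container name - use your own, or grep the
    # node's log file directly if you run it as a plain process.
    docker logs masa-node 2>&1 | grep -iE 'webscraper|scrap'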

For more detailed setup options and advanced configurations, refer to: