This article explains how to do simple Ruby web scraping by processing a CSV file.
In this article, we will create a Ruby on Rails application that scrapes the links uploaded in a CSV file and finds the occurrences of each link on a particular page.
In the CSV file, there will be 2 columns
Let's start by creating a Rails application.
- Run the below command to create a new Rails application.
$ rails new scrape_csv_data
$ cd scrape_csv_data
- Then, we will generate an Upload CSV module. Run the below command.
$ rails g scaffold UploadCsv generated_csv:string csv_file:string
This will create the required model, controller, views, and migration for UploadCsv. Run the migration using the below command.
$ rails db:migrate
- Add the CarrierWave gem to the Gemfile and run bundle install.
gem 'carrierwave', '~> 2.0'
$ bundle install
- Then we will create the uploader with CarrierWave using the below command.
$ rails generate uploader Avatar
- We will attach the uploader in the model app/models/upload_csv.rb.
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end
- Then, we will start the server and check if the application is working successfully or not.
$ rails s
- Then we will create a job that reads the uploaded CSV file and scrapes the links in it. The file the job generates will be stored in the generated_csv column of the record that triggered the job. Run the below command.
$ rails generate job generate_csv
- Add the below gem and run bundle install
- Then we will replace the contents of the GenerateCsv job with the below code.
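The job code itself is not reproduced here, so the following is only a minimal sketch of the scraping step in plain Ruby, not the article's original job. It assumes the CSV has a header row with two columns named page_url and link (hypothetical names), counts a link by literal substring match in the page HTML, and uses only the standard library (csv, net/http) rather than whichever gem the article adds. The fetcher parameter is also hypothetical. It exists so the logic can be exercised without network access.

```ruby
require "csv"
require "net/http"
require "uri"

# Default fetcher: downloads the page body over HTTP(S).
# Note: Net::HTTP.get does not follow redirects.
DEFAULT_FETCHER = ->(url) { Net::HTTP.get(URI.parse(url)) }

# Counts how many times `link` appears as a literal substring of `html`.
# String#scan treats a String pattern literally, not as a regular expression.
def count_link_occurrences(html, link)
  html.scan(link).length
end

# Reads the input CSV (assumed headers: page_url, link) and returns one
# [page_url, link, count] array per data row.
def scrape_rows(input_path, fetcher: DEFAULT_FETCHER)
  CSV.foreach(input_path, headers: true).map do |row|
    page_url = row["page_url"].to_s.strip
    link = row["link"].to_s.strip
    # An empty link would match everywhere, so it is counted as zero.
    count = link.empty? ? 0 : count_link_occurrences(fetcher.call(page_url), link)
    [page_url, link, count]
  end
end

# Writes the result rows to a new CSV file with a header row.
def write_results(rows, output_path)
  CSV.open(output_path, "w") do |csv|
    csv << %w[page_url link occurrences]
    rows.each { |result_row| csv << result_row }
  end
end
```

Inside the job's perform method, these helpers could be called with the path of the uploaded file and an output path such as Rails.root.join("public", "result_data.csv").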
- Then we will run the job from an after_create callback on UploadCsv, and we will add a validation that makes csv_file required.
- Now update the code of app/models/upload_csv.rb.
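The updated model could look roughly like the sketch below. It is not the article's original listing. It assumes the generated job class is named GenerateCsvJob and that the job accepts the record id as its argument. The callback method name enqueue_generate_csv is hypothetical.

```ruby
# app/models/upload_csv.rb
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader

  # Reject records that have no uploaded file.
  validates :csv_file, presence: true

  # Queue the scraping job once the record has been created.
  after_create :enqueue_generate_csv

  private

  # Assumption: GenerateCsvJob#perform takes the UploadCsv id.
  def enqueue_generate_csv
    GenerateCsvJob.perform_later(id)
  end
end
```

If the job sometimes starts before the new record is visible in the database, after_create_commit is a drop-in alternative that waits for the transaction to commit.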
After uploading a file, check that the scraped output file has been generated. You can find the generated file at scrape_csv_data/public/result_data.csv.
- Now we will send the generated file through email by using the below instructions.
First, we will generate the mailer by using the below command.
$ rails generate mailer NotificationMailer
Add the below code inside the notification mailer.
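As a rough sketch of what the mailer could contain, the example below attaches the generated file and sends it. The action name send_csv matches the view file used in this article. The recipient_email argument and the subject text are assumptions, and the attachment path assumes the result file is written to public/result_data.csv.

```ruby
# app/mailers/notification_mailer.rb
class NotificationMailer < ApplicationMailer
  # recipient_email is a hypothetical parameter for illustration.
  def send_csv(recipient_email)
    result_path = Rails.root.join("public", "result_data.csv")

    # Attach the generated CSV under a fixed file name.
    attachments["result_data.csv"] = File.read(result_path)

    mail(to: recipient_email, subject: "Scraped CSV results")
  end
end
```

The mail could then be queued with NotificationMailer.send_csv("user@example.com").deliver_later, for example at the end of the job.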
Also, we need to add the mail configuration inside config/environments/development.rb or production.rb.
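A typical SMTP setup is sketched below. The host, port, and environment variable names are placeholders rather than values from this article, so substitute the settings of your own mail provider.

```ruby
# config/environments/development.rb
# Place these lines inside the existing Rails.application.configure block.
config.action_mailer.delivery_method = :smtp
config.action_mailer.raise_delivery_errors = true
config.action_mailer.smtp_settings = {
  address: "smtp.example.com",        # placeholder SMTP host
  port: 587,
  user_name: ENV["SMTP_USERNAME"],    # placeholder variable names
  password: ENV["SMTP_PASSWORD"],
  authentication: :plain,
  enable_starttls_auto: true
}
```

Reading the credentials from environment variables keeps them out of version control.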
Finally, we need to update the view app/views/notification_mailer/send_csv.html.erb.