SwiftSoup is a Swift implementation of the popular Java library jsoup. It provides an API for parsing HTML and for selecting and manipulating elements with familiar CSS-style selectors. With SwiftSoup, you can extract data from HTML documents, scrape web pages, or process responses from APIs that return HTML. The library is well suited to web scrapers, data extraction tools, and any application that needs to parse or manipulate HTML.
Installation
Option 1: Install via CocoaPods
If you’re using CocoaPods to manage dependencies in your project, add the following line to your Podfile:
pod 'SwiftSoup'
Option 2: Install via Swift Package Manager (SPM)
SwiftSoup can also be added as a dependency through the Swift Package Manager. Add the package to the dependencies in your Package.swift file:
.package(url: "https://github.com/scinfu/SwiftSoup.git", .upToNextMajor(from: "2.0.0"))
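For context, the dependency line above sits inside a full manifest. The following is a minimal sketch of such a Package.swift, assuming an executable target; the package and target name "MyScraper" and the tools version are placeholders chosen for illustration, not values from the SwiftSoup project.

```swift
// swift-tools-version:5.5
// Package.swift: a minimal manifest sketch.
// "MyScraper" is a hypothetical name; replace it with your own package and target.
import PackageDescription

let package = Package(
    name: "MyScraper",
    dependencies: [
        // The dependency line from the instructions above
        .package(url: "https://github.com/scinfu/SwiftSoup.git", .upToNextMajor(from: "2.0.0"))
    ],
    targets: [
        .executableTarget(
            name: "MyScraper",
            dependencies: [
                // Link the SwiftSoup library product into the target
                .product(name: "SwiftSoup", package: "SwiftSoup")
            ]
        )
    ]
)
```

Declaring the package alone is not enough: the target must also list the SwiftSoup product, otherwise `import SwiftSoup` will fail to resolve.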
Usage
Import SwiftSoup
import SwiftSoup
Parsing HTML
To parse an HTML string or document, use the SwiftSoup.parse() method:
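do {
    // Sample markup: a page titled "Example" with one div of class 'container'
    let html = "<html><head><title>Example</title></head><body><div class=\"container\">Hello, World!</div></body></html>"
    let doc = try SwiftSoup.parse(html)
    // Manipulate or extract data from the document
} catch Exception.Error(_, let message) {
    print("Error: \(message)")
} catch {
    print("Unknown error")
}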
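Once a document is parsed, its title and body are available directly. The sketch below wraps that in a small function; `summarize` is a hypothetical helper name introduced here for illustration, not part of SwiftSoup.

```swift
import SwiftSoup

/// Returns the page title and the visible body text of an HTML string.
/// `summarize` is a hypothetical helper, not a SwiftSoup API.
func summarize(_ html: String) throws -> (title: String, bodyText: String) {
    let doc = try SwiftSoup.parse(html)
    // title() reads the text of the <title> element
    let title = try doc.title()
    // body() is optional, so fall back to an empty string if it is missing
    let bodyText = try doc.body()?.text() ?? ""
    return (title, bodyText)
}
```

Returning values from a throwing function, rather than printing inside a do/catch block, leaves error handling to the caller.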
Selecting Elements
SwiftSoup provides CSS-style selector queries to select specific elements in the parsed HTML document. Use the select() method to retrieve a list of matching elements:
// Select all div elements with the class 'container'
let divElements = try doc.select("div.container")
// Iterate over the selected elements
for div in divElements {
    print(try div.text())
}
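Selectors can combine tag names, classes, ids, and combinators. As a rough illustration, the function below returns the text of every element matching an arbitrary selector; the name `texts(matching:in:)` and the sample markup in the usage note are assumptions made for this example.

```swift
import SwiftSoup

/// Returns the text of every element that matches `selector`, in document order.
/// `texts(matching:in:)` is a hypothetical helper, not a SwiftSoup API.
func texts(matching selector: String, in html: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    let matches = try doc.select(selector)
    // array() exposes the matched elements as a plain [Element]
    return try matches.array().map { try $0.text() }
}
```

With markup such as `<ul id="menu"><li class="active">Home</li><li>About</li></ul><p class="active">Note</p>`, the selector `li` matches both list items, `.active` matches the first list item and the paragraph, and `ul#menu > li.active` narrows the match to the first list item only.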
Modifying Elements
Once you have selected an element, you can easily modify its content or attributes:
// Set the text content of an element
try div.text("New Content")
// Add an attribute to an element
try div.attr("data-id", "123")
// Remove an attribute from an element
try div.removeAttr("class")
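The snippet above assumes a `div` is already in scope. A self-contained sketch of the same three calls follows, with the first match unwrapped safely; `retagFirstContainer` is a hypothetical function name used only for this example.

```swift
import SwiftSoup

/// Applies the three modifications above to the first div.container
/// and returns the modified document.
/// `retagFirstContainer` is a hypothetical helper, not a SwiftSoup API.
func retagFirstContainer(in html: String) throws -> Document {
    let doc = try SwiftSoup.parse(html)
    // first() returns nil when nothing matched, so unwrap before modifying
    guard let div = try doc.select("div.container").first() else {
        return doc
    }
    try div.text("New Content")     // replace the text content
    try div.attr("data-id", "123")  // add (or overwrite) an attribute
    try div.removeAttr("class")     // drop the class attribute
    return doc
}
```

Edits are applied to the element in place, so they show up in the document it belongs to. One consequence is that after the class attribute is removed, `div.container` no longer matches that element.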
Extracting Data
If you need to extract specific data from the HTML document, you can retrieve text, HTML, or attribute values from the selected elements:
// Get the text content of an element
let text = try div.text()
// Get the HTML content of an element
let html = try div.html()
// Get the value of an attribute
let dataId = try div.attr("data-id")
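In scraping code, extracted values usually end up in a typed structure rather than loose variables. The following sketch collects the text and href of each link on a page; the `Link` struct and `extractLinks(from:)` are illustrative names, not part of SwiftSoup.

```swift
import SwiftSoup

/// A link found on a page. Hypothetical type introduced for this example.
struct Link: Equatable {
    let text: String
    let href: String
}

/// Collects the text and href of every anchor that has an href attribute.
/// `extractLinks(from:)` is a hypothetical helper, not a SwiftSoup API.
func extractLinks(from html: String) throws -> [Link] {
    let doc = try SwiftSoup.parse(html)
    // "a[href]" skips anchors that have no href attribute
    return try doc.select("a[href]").array().map { anchor in
        try Link(text: anchor.text(), href: anchor.attr("href"))
    }
}
```

Note that attr("href") returns the attribute value as written in the markup, so relative links stay relative.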
Complete Example
Here’s a complete example that demonstrates parsing an HTML page, selecting and modifying elements, and extracting data:
do {
    // Sample markup: a page titled "Example" with one div of class 'container'
    let html = "<html><head><title>Example</title></head><body><div class=\"container\">Hello, World!</div></body></html>"
    let doc = try SwiftSoup.parse(html)
    // Select all div elements with the class 'container'
    let divElements = try doc.select("div.container")
    // Print the text content of each div
    for div in divElements {
        print(try div.text())
    }
    // Modify the first matching element, if there is one
    let firstDiv = divElements.first()
    try firstDiv?.text("New Content")
    // Extract data (both values are nil if nothing matched)
    let newText = try firstDiv?.text()
    let htmlContent = try firstDiv?.html()
} catch Exception.Error(_, let message) {
    print("Error: \(message)")
} catch {
    print("Unknown error")
}
Conclusion
SwiftSoup makes parsing and manipulating HTML documents in Swift straightforward. Whether you need to scrape data from web pages or extract information from HTML responses, the library covers parsing, CSS-style selection, modification, and extraction through a single API.
Resources
- SwiftSoup on GitHub: https://github.com/scinfu/SwiftSoup