Issue #1042
Apple’s Vision framework has quietly grown into one of the most capable on-device ML libraries on Apple platforms. Starting in iOS 17, it can separate the foreground subject of a photo from its background, with no server calls or third-party models required.
The entry point is VNGenerateForegroundInstanceMaskRequest. It produces a pixel mask identifying which parts of the image belong to the foreground. From that mask you can composite the original image against a transparent background, leaving only the subject.
How the mask is generated
The request runs through a VNImageRequestHandler, created here from a CIImage. After it finishes, you take the first observation from the request’s results and ask it for a mask scaled to the original image’s dimensions.
import CoreImage
import Vision

enum BackgroundRemovalError: Error {
    case noResult
}

func makeForegroundMask(from ciImage: CIImage) throws -> CIImage {
    let handler = VNImageRequestHandler(ciImage: ciImage)
    let request = VNGenerateForegroundInstanceMaskRequest()
    try handler.perform([request])

    // Without an observation there is no mask to build.
    guard let result = request.results?.first else {
        throw BackgroundRemovalError.noResult
    }

    // Ask for a mask covering every detected instance, scaled to the input image.
    let maskPixelBuffer = try result.generateScaledMaskForImage(
        forInstances: result.allInstances,
        from: handler
    )
    return CIImage(cvPixelBuffer: maskPixelBuffer)
}
The allInstances property is an IndexSet covering every detected foreground subject. If you only want specific subjects, pass a subset of those indices instead.
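As a rough sketch of passing a subset, the variant below keeps only the first few instance indices. The names firstInstances(_:of:) and makeMask(forFirst:in:) are hypothetical helpers invented for this example, and "the first few indices" is just one possible selection rule, not something Vision prescribes.

```swift
import CoreImage
import Foundation
import Vision

/// Hypothetical helper: keeps the `count` lowest indices of `all`.
func firstInstances(_ count: Int, of all: IndexSet) -> IndexSet {
    IndexSet(all.prefix(max(0, count)))
}

/// Hypothetical variant of the mask function that masks only the
/// first `count` detected instances. Returns nil when nothing is selected.
@available(iOS 17.0, macOS 14.0, *)
func makeMask(forFirst count: Int, in ciImage: CIImage) throws -> CIImage? {
    let handler = VNImageRequestHandler(ciImage: ciImage)
    let request = VNGenerateForegroundInstanceMaskRequest()
    try handler.perform([request])

    guard let observation = request.results?.first else { return nil }

    let subset = firstInstances(count, of: observation.allInstances)
    guard !subset.isEmpty else { return nil }

    let buffer = try observation.generateScaledMaskForImage(
        forInstances: subset,
        from: handler
    )
    return CIImage(cvPixelBuffer: buffer)
}
```

Because the selection is an ordinary IndexSet, any other rule (a single index chosen by the user, for example) can be swapped in without touching the Vision calls.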
Applying the mask
With the mask in hand, compositing takes a single Core Image filter, blendWithMask. The trick is to blend the original image against a fully transparent image, letting the mask’s grayscale values decide how much of the original shows through at each pixel.
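import CoreImage.CIFilterBuiltins  // provides CIFilter.blendWithMask()

func applyMask(_ mask: CIImage, to image: CIImage) -> CIImage {
    let filter = CIFilter.blendWithMask()
    filter.inputImage = image
    filter.maskImage = mask
    filter.backgroundImage = CIImage.empty()
    // Fall back to the unmasked image if the filter produces no output.
    return filter.outputImage ?? image
}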
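CIImage.empty() produces an image with an empty extent, which acts as a transparent background here. Where the mask is white, the output shows the original image’s pixel; where it is black, the transparent background shows through; gray values in between blend the two.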
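The same filter can place the subject on any background, not only a transparent one. The sketch below assumes a hypothetical composite(_:over:using:) helper; the extra crop is there because a background built from a solid color has an infinite extent.

```swift
import CoreGraphics
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: shows `image` where the mask is white and
/// `background` where it is black, cropped to the image's own extent.
func composite(_ image: CIImage, over background: CIImage, using mask: CIImage) -> CIImage {
    let filter = CIFilter.blendWithMask()
    filter.inputImage = image
    filter.maskImage = mask
    filter.backgroundImage = background
    // Crop so an infinite background cannot make the result infinite.
    return (filter.outputImage ?? image).cropped(to: image.extent)
}
```

For a plain white backdrop, one option is to pass CIImage(color: .white) as the background and let the helper crop the result to the photo's extent.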
Converting to a displayable image
CIImage does not render on its own. You need a CIContext to produce a CGImage, and from there a UIImage or NSImage.
import UIKit

func render(_ ciImage: CIImage) -> UIImage? {
    // Created per call here only to keep the example short.
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}
Creating a CIContext is expensive, so hold one as a property on your view model or service rather than allocating it per call.
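One way to follow that advice is a small object that owns the context for its whole lifetime. ImageRenderer and makeCGImage(from:) are hypothetical names for this sketch, and it stops at CGImage so the same code can serve both UIKit and AppKit.

```swift
import CoreGraphics
import CoreImage

/// Hypothetical wrapper that creates one CIContext and reuses it.
final class ImageRenderer {
    private let context = CIContext()

    /// Renders `ciImage` into a bitmap. Returns nil when the extent is
    /// empty or infinite, since neither can be rasterized as-is.
    func makeCGImage(from ciImage: CIImage) -> CGImage? {
        let extent = ciImage.extent
        guard !extent.isEmpty, !extent.isInfinite else { return nil }
        return context.createCGImage(ciImage, from: extent)
    }
}
```

On iOS the result can be wrapped with UIImage(cgImage:scale:orientation:), which also lets you carry over the source image's scale and orientation.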
Putting it together in SwiftUI
A minimal view model keeps the pipeline clean and keeps the view free of Vision imports.
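import Observation

@Observable
final class SubjectExtractor {
    var result: UIImage?
    private let context = CIContext()

    func extract(from image: UIImage) async {
        guard let ciInput = CIImage(image: image) else { return }
        do {
            let mask = try makeForegroundMask(from: ciInput)
            let masked = applyMask(mask, to: ciInput)
            // Render with the stored context rather than allocating a new one.
            guard let cgImage = context.createCGImage(masked, from: masked.extent) else { return }
            let output = UIImage(cgImage: cgImage)
            // Publish on the main actor, since SwiftUI observes this property.
            await MainActor.run { result = output }
        } catch {
            print("Extraction failed: \(error)")
        }
    }
}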
In the view, call extractor.extract(from:) inside a Task triggered by a button tap or on image selection. Display extractor.result when it is non-nil.