How to remove image background in Swift

Issue #1042

Apple’s Vision framework has quietly grown into one of the most capable on-device ML libraries available on Apple platforms. Starting with iOS 17 and macOS 14, it can separate the foreground subject of a photo from its background entirely on device, with no server calls or third-party models required.

The entry point is VNGenerateForegroundInstanceMaskRequest. It produces a pixel mask identifying which parts of the image belong to the foreground. From that mask you can composite the original image against a transparent background, leaving only the subject.

How the mask is generated

The request runs through a VNImageRequestHandler, created here from a CIImage (the handler also accepts a CGImage, a CVPixelBuffer, or a file URL). After perform(_:) returns, the request's results hold a VNInstanceMaskObservation, from which you generate a mask scaled to the original image's dimensions.

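import Vision
import CoreImage

enum BackgroundRemovalError: Error {
    case noResult
}

// VNGenerateForegroundInstanceMaskRequest requires iOS 17 / macOS 14.
@available(iOS 17.0, macOS 14.0, *)
func makeForegroundMask(from ciImage: CIImage) throws -> CIImage {
    let handler = VNImageRequestHandler(ciImage: ciImage)
    let request = VNGenerateForegroundInstanceMaskRequest()
    // perform(_:) runs synchronously and blocks the calling thread.
    try handler.perform([request])

    // The request yields a VNInstanceMaskObservation describing the detected subjects.
    guard let result = request.results?.first else {
        throw BackgroundRemovalError.noResult
    }

    // Build one mask covering every instance, scaled to the input image's size.
    let maskPixelBuffer = try result.generateScaledMaskForImage(
        forInstances: result.allInstances,
        from: handler
    )
    return CIImage(cvPixelBuffer: maskPixelBuffer)
}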

allInstances is an IndexSet covering every detected foreground subject. To keep only specific subjects, pass an IndexSet containing just their instance indices instead.
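One way to find the index of the subject under a particular point is to read the observation's instanceMask, a low-resolution buffer in which each pixel holds the index of the instance covering it, with 0 meaning background. The sketch below reads a single label and assumes that buffer uses the kCVPixelFormatType_OneComponent8 format; it returns nil if the format differs or the coordinates fall outside the buffer. The helper name instanceLabel(in:x:y:) is hypothetical, and x and y are in the mask's own pixel grid (row 0 is the first row in memory), so scaling and, if necessary, flipping a tap location is left to the caller.

```swift
import CoreVideo

/// Hypothetical helper: returns the instance label stored at (x, y) in a
/// OneComponent8 pixel buffer such as VNInstanceMaskObservation.instanceMask.
/// 0 means background. Returns nil for other pixel formats or out-of-range coordinates.
func instanceLabel(in mask: CVPixelBuffer, x: Int, y: Int) -> Int? {
    guard CVPixelBufferGetPixelFormatType(mask) == kCVPixelFormatType_OneComponent8 else {
        return nil
    }

    CVPixelBufferLockBaseAddress(mask, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(mask, .readOnly) }

    let width = CVPixelBufferGetWidth(mask)
    let height = CVPixelBufferGetHeight(mask)
    guard (0..<width).contains(x), (0..<height).contains(y),
          let base = CVPixelBufferGetBaseAddress(mask) else {
        return nil
    }

    // Rows may be padded, so index with bytesPerRow rather than width.
    let bytesPerRow = CVPixelBufferGetBytesPerRow(mask)
    let label = base.load(fromByteOffset: y * bytesPerRow + x, as: UInt8.self)
    return Int(label)
}
```

If the returned label is nonzero, pass IndexSet(integer: label) as forInstances in place of result.allInstances to mask only that subject.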

Applying the mask

With the mask in hand, compositing takes a single Core Image filter, CIBlendWithMask (CIFilter.blendWithMask() in Swift). The trick is to blend the original image over a fully transparent background, using the mask to decide which of the two each output pixel comes from.

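import CoreImage
import CoreImage.CIFilterBuiltins

func applyMask(_ mask: CIImage, to image: CIImage) -> CIImage {
    let filter = CIFilter.blendWithMask()
    filter.inputImage = image                  // the photo to cut out
    filter.maskImage = mask                    // white keeps `image`, black shows the background
    filter.backgroundImage = CIImage.empty()   // transparent background
    // Falls back to the unmasked image if the filter produces no output.
    return filter.outputImage ?? image
}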

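CIImage.empty() serves as the transparent background. Where the mask is white, the output takes the original pixel; where it is black, the output is fully transparent. Intermediate gray values mix the two, so partially covered pixels along the subject's edge come out semi-transparent.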
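A quick way to confirm this behavior is to run the same filter configuration on tiny synthetic inputs. The sketch below, under the hypothetical name blendTwoPixelSample, masks a solid red 2-by-1 image with a mask that is white on the left pixel and black on the right, then reads back the rendered RGBA8 bytes. Under these assumptions the left pixel should come out as opaque red and the right pixel as fully transparent.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical check: masks a solid red 2x1 image with a mask that is white on the
/// left pixel and black on the right, then returns the rendered RGBA8 bytes.
func blendTwoPixelSample(context: CIContext = CIContext()) -> [UInt8] {
    let extent = CGRect(x: 0, y: 0, width: 2, height: 1)
    let red = CIImage(color: CIColor(red: 1, green: 0, blue: 0)).cropped(to: extent)

    // White in the left pixel only, black everywhere else.
    let white = CIImage(color: CIColor(red: 1, green: 1, blue: 1))
        .cropped(to: CGRect(x: 0, y: 0, width: 1, height: 1))
    let mask = white
        .composited(over: CIImage(color: CIColor(red: 0, green: 0, blue: 0)))
        .cropped(to: extent)

    // Same configuration as applyMask(_:to:).
    let filter = CIFilter.blendWithMask()
    filter.inputImage = red
    filter.maskImage = mask
    filter.backgroundImage = CIImage.empty()
    guard let output = filter.outputImage else { return [] }

    // Render 2 pixels x 4 bytes (RGBA8, sRGB) into a byte array.
    var pixels = [UInt8](repeating: 0, count: 8)
    context.render(output, toBitmap: &pixels, rowBytes: 8, bounds: extent,
                   format: .RGBA8, colorSpace: CGColorSpace(name: CGColorSpace.sRGB))
    return pixels
}
```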

Converting to a displayable image

A CIImage is a recipe for an image rather than a bitmap, so nothing is drawn until you render it with a CIContext. The context produces a CGImage, which you can then wrap in a UIImage (or an NSImage on macOS).

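import UIKit
import CoreImage

// Pass in a long-lived CIContext; the default argument allocates a new one on every call.
func render(_ ciImage: CIImage, context: CIContext = CIContext()) -> UIImage? {
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}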

Creating a CIContext is expensive, so hold one as a property on your view model or service rather than allocating it per call.
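If you save the result rather than only displaying it, the file format matters: JPEG has no alpha channel, so the transparent background would be lost. A minimal sketch using CIContext's pngRepresentation(of:format:colorSpace:options:) follows; the helper name pngData(from:context:) is hypothetical, and it takes the context as a parameter so a shared one can be reused. UIImage's own pngData() is an alternative once you already have a UIImage.

```swift
import CoreImage
import Foundation

/// Hypothetical helper: encodes a CIImage as PNG, which keeps the alpha channel
/// that JPEG would drop. Returns nil for infinite-extent images or on failure.
func pngData(from image: CIImage, context: CIContext) -> Data? {
    guard !image.extent.isInfinite,
          let colorSpace = CGColorSpace(name: CGColorSpace.sRGB) else {
        return nil
    }
    return context.pngRepresentation(of: image, format: .RGBA8, colorSpace: colorSpace)
}
```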

Putting it together in SwiftUI

A small view model holds the pipeline and keeps Vision and Core Image code out of the view.

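import UIKit
import CoreImage
import Observation

@MainActor
@Observable
final class SubjectExtractor {
    var result: UIImage?

    // Created once and reused, because allocating a CIContext is expensive.
    private let context = CIContext()

    func extract(from image: UIImage) async {
        guard let ciInput = CIImage(image: image) else { return }

        do {
            // The Vision request blocks while it runs, so do the work off the main actor.
            let cgImage = try await Self.process(ciInput, context: context)
            // Back on the main actor: safe to update observed state.
            result = cgImage.map { UIImage(cgImage: $0) }
        } catch {
            print("Extraction failed: \(error)")
        }
    }

    // Under SE-0338 rules, a nonisolated async function runs off the main actor.
    private nonisolated static func process(_ input: CIImage, context: CIContext) async throws -> CGImage? {
        let mask = try makeForegroundMask(from: input)
        let masked = applyMask(mask, to: input)
        return context.createCGImage(masked, from: masked.extent)
    }
}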

In the view, hold the extractor in @State, call extractor.extract(from:) inside a Task triggered by a button tap or by image selection, and display extractor.result once it is non-nil.
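A minimal sketch of such a view follows. It assumes the SubjectExtractor class above and an input image your app already has; the view name SubjectExtractionView and its input property are hypothetical.

```swift
import SwiftUI
import UIKit

// Hypothetical view; relies on the SubjectExtractor class defined in this article.
struct SubjectExtractionView: View {
    let input: UIImage
    @State private var extractor = SubjectExtractor()

    var body: some View {
        VStack(spacing: 16) {
            // Show the cut-out once it exists, otherwise the original photo.
            Image(uiImage: extractor.result ?? input)
                .resizable()
                .scaledToFit()

            Button("Remove Background") {
                // Start the async extraction from the synchronous button action.
                Task { await extractor.extract(from: input) }
            }
        }
        .padding()
    }
}
```

Keeping the extractor in @State ties its lifetime to the view, and because SubjectExtractor is @Observable, the Image updates automatically when result changes.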