Card Scanner in SwiftUI

Michał Ziobro
18 min read · Feb 5, 2023


The CardScanner view can be configured using the Configuration struct, which allows customization of the watermark text, font, accent color, and the "Cancel" and "Done" button text. Default values for these properties are provided via the static constant default.
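The full definition of Configuration is elided in the listing below, but based on the properties just described it might look roughly like this (field names and default values here are assumptions; see the repository for the real definition):

import UIKit

// A plausible shape for CardScanner.Configuration, inferred from the
// description above. Field names and defaults are assumptions.
public struct Configuration {
    public var watermarkText: String
    public var watermarkFont: UIFont
    public var accentColor: UIColor
    public var cancelText: String
    public var doneText: String

    public static let `default` = Configuration(
        watermarkText: "Card Scanner",
        watermarkFont: .systemFont(ofSize: 15, weight: .medium),
        accentColor: .systemBlue,
        cancelText: "Cancel",
        doneText: "Done"
    )
}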

CardScanner is a UIViewControllerRepresentable that wraps the UIKit implementation, CardScannerController.

public struct CardScanner: UIViewControllerRepresentable {

    public struct Configuration { ... }

    // MARK: - Environment
    @Environment(\.presentationMode) var presentationMode

    private let firstNameSuggestion: String
    private let lastNameSuggestion: String
    private let configuration: Configuration

    // MARK: - Actions
    let onCardScanned: CardScannerHandler

    public init(
        firstNameSuggestion: String = "",
        lastNameSuggestion: String = "",
        configuration: Configuration = .default,
        onCardScanned: @escaping CardScannerHandler = { _, _, _ in }
    ) { ... }

    public func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    public func makeUIViewController(context: Context) -> CardScannerController {
        let scanner = CardScannerController(configuration: configuration)
        scanner.firstNameSuggestion = firstNameSuggestion
        scanner.lastNameSuggestion = lastNameSuggestion
        scanner.delegate = context.coordinator
        return scanner
    }

    public func updateUIViewController(_ uiViewController: CardScannerController, context: Context) { }
}

CardScanner.Coordinator is used to handle the CardScannerDelegate callbacks: didTapCancel, didTapDone, and didScanCard.
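The Coordinator is not reproduced in full here; a minimal sketch of how it could bridge the delegate to SwiftUI follows. The delegate method signatures are assumptions inferred from the callback names and the three-argument CardScannerHandler; the real protocol in the repository may differ.

public class Coordinator: NSObject, CardScannerDelegate {

    private let parent: CardScanner

    init(_ parent: CardScanner) {
        self.parent = parent
    }

    // Dismiss the scanner without reporting a result.
    public func didTapCancel() {
        parent.presentationMode.wrappedValue.dismiss()
    }

    // Report the scanned values and dismiss. The (number, expDate, holder)
    // signature is an assumption based on CardScannerHandler's three arguments.
    public func didTapDone(number: String?, expDate: String?, holder: String?) {
        parent.onCardScanned(number, expDate, holder)
        parent.presentationMode.wrappedValue.dismiss()
    }

    public func didScanCard(number: String?, expDate: String?, holder: String?) {
        parent.onCardScanned(number, expDate, holder)
    }
}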

1. CardScannerController

The CardScannerController class provides the credit card scanning functionality using the device's camera. It has properties for the label views that display the credit card information and a button for canceling or saving the scan operation, as well as methods for setting up the views, analyzing the observations, and updating the credit card information.

The class is a subclass of VisionController, which supplies the camera capture and text recognition plumbing.

CardScannerController overrides the overlayViewClass property to provide a custom CardOverlayView, which draws the card-scanning overlay on top of the camera preview.

CardScannerController defines several controls and implements setup code to lay out these views. cardNumberLabel, brandLabel, expDateLabel, and cardHolderLabel are instances of UILabel used to display the credit card information (card number, brand, expiration date, and card holder name). button is a UIButton used to cancel the scan operation, or to dismiss the view once the scanned credit card information has been obtained. These views are set up from viewDidLoad() using the setupLabels() and setupButton() methods, sketched below.
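For illustration, the label setup might look something like this minimal sketch (the repository's layout code is more involved):

// A minimal sketch of setupLabels(); not the repository's exact layout code.
private func setupLabels() {
    let stack = UIStackView(arrangedSubviews: [cardNumberLabel, expDateLabel, cardHolderLabel, brandLabel])
    stack.axis = .vertical
    stack.spacing = 8
    stack.translatesAutoresizingMaskIntoConstraints = false
    view.addSubview(stack)

    NSLayoutConstraint.activate([
        stack.centerXAnchor.constraint(equalTo: view.centerXAnchor),
        stack.centerYAnchor.constraint(equalTo: view.centerYAnchor)
    ])
}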

CardScannerController can receive additional customization via the CardScanner.Configuration object passed to its initializer.

The most important part of CardScannerController is observationsHandler(observations:). This method is called whenever new observations are made by the camera. It analyzes the observations and updates the credit card information.

public override func observationsHandler(observations: [VNRecognizedTextObservation]) {

    var numbers = [StringRecognition]()
    var expDates = [StringRecognition]()

    // Create a full transcript to run analysis on.
    var text: String = ""

    if observationsCount == 20 && (foundNumber == nil) && cameraBrightness < 0 {
        // toggleTorch(on: true)
    }

    let maximumCandidates = 1
    for observation in observations {

        guard let candidate = observation.topCandidates(maximumCandidates).first else { continue }
        print("[Text recognition] ", candidate.string)

        if foundNumber == nil, let cardNumber = candidate.string.checkCardNumber() {
            let box = observation.boundingBox
            numbers.append((cardNumber, box))
        }
        if foundExpDate == nil, let expDate = candidate.string.extractExpDate() {
            let box = boundingBox(of: expDate, in: candidate)
            expDates.append((expDate, box))
        }

        text += candidate.string + " "

        highlightBox(observation.boundingBox, color: UIColor.white)
    }

    if foundNumber == nil, let cardNumber = text.extractCardNumber() {
        numbers.append((cardNumber, nil))
    }

    searchCardNumber(numbers)
    searchExpDate(expDates)
    searchCardHolder(text)

    shouldStopScanner()
}

This code analyzes the observations from text recognition to extract credit card information. The observationsHandler function takes an array of VNRecognizedTextObservation as input and runs analysis on it to extract the card number, expiry date, and cardholder's name, storing them in the variables foundNumber, foundExpDate, and foundCardHolder. The function checkCardNumber checks whether an observation is a valid card number, extractExpDate extracts the expiry date, and extractCardHolder2 extracts the cardholder's name. The functions searchCardNumber, searchExpDate, and searchCardHolder log the recognized values across frames and extract stable values for the card number, expiry date, and cardholder's name respectively. When a stable value is found, showString displays it in the UI. The variable observationsCount keeps track of the number of observations; once it exceeds 50, or the card number, expiry date, and cardholder's name have all been found, the live stream is stopped.
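To make the "stable value" idea concrete, here is a minimal sketch of how searchCardNumber could work: count how many frames yield the same candidate and accept it once it has been seen often enough. The vote dictionary, the threshold of 10, and the direct label update are illustrative assumptions; the repository's implementation differs in detail (it uses showString, for example).

// Sketch only. Assumes StringRecognition is a (String, CGRect?) tuple.
private var numberVotes = [String: Int]()

func searchCardNumber(_ candidates: [StringRecognition]) {
    for (number, _) in candidates {
        numberVotes[number, default: 0] += 1

        // Accept the number once it has been recognized in enough frames.
        if numberVotes[number, default: 0] >= 10 {
            foundNumber = number
            DispatchQueue.main.async { [weak self] in
                self?.cardNumberLabel.text = number // the real code uses showString
            }
            break
        }
    }
}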

2. VisionController

The VisionController class is used for detecting text in real-time video streams and recognizing it for further processing.

The class defines various configurations such as language correction, recognition level, and minimum text height.
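For example, those properties might be declared along these lines (the concrete values below are assumptions; tune them for your use case):

// Assumed values for illustration.
let recognitionLevel: VNRequestTextRecognitionLevel = .accurate
let usesLanguageCorrection = false // card numbers are not dictionary words
let minTextHeight: Float = 0.1     // ignore text below 10% of the image height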

  • It also creates a PreviewView that is used to display the live video stream from the camera.
  • The torch button provides the option to toggle the torch on and off.
func toggleTorch(on: Bool) {
    guard
        let device = AVCaptureDevice.default(for: AVMediaType.video),
        device.hasTorch
    else { return }

    do {
        try device.lockForConfiguration()
        device.torchMode = on ? .on : .off
        torchButton.isSelected = on
        device.unlockForConfiguration()
    } catch {
        print("Torch could not be used")
    }
}
  • The device orientation is also handled to ensure the preview view is displayed correctly in different orientations.
// MARK: - Device Orientation Handling
public override func viewWillTransition(to size: CGSize, with coordinator: UIViewControllerTransitionCoordinator) {
    super.viewWillTransition(to: size, with: coordinator)

    print("[Vision Controller] Orientation did change")

    overlayView.isHidden = true
    DispatchQueue.main.asyncAfter(deadline: .now() + .milliseconds(500)) { [weak self] in
        self?.setupOrientation()
        self?.overlayView.isHidden = false
    }
}

private func setupOrientation() {

    // Only change the current orientation if the new one is landscape or
    // portrait. You can't really do anything about flat or unknown.
    let deviceOrientation = DeviceFeatures.orientation
    overlayView.currentOrientation = deviceOrientation

    // Handle device orientation in the preview layer.
    if let connection = previewView.previewLayer.connection {
        if let newOrientation = AVCaptureVideoOrientation(deviceOrientation: deviceOrientation) {
            connection.videoOrientation = newOrientation
        }
    }
}

2.1 Setup camera preview

The setupLiveStream function is called from viewWillAppear to set up the live camera preview, and it starts the capture session on a serial dispatch queue to prevent blocking the main thread. Within that queue, setupCaptureSession removes any previous inputs and outputs, sets up the capture device, and redirects the stream from the camera to the pixel buffer (screen).

private func setupLiveStream() {

    previewView.session = session

    // Starting the capture session is a blocking call. Perform setup using
    // a dedicated serial dispatch queue to prevent blocking the main thread.
    sessionQueue.async { [weak self] in
        self?.setupCaptureSession()

        // Calculate region of interest now that the camera is setup.
        DispatchQueue.main.async { [weak self] in

            // Figure out initial orientation
            self?.setupOrientation()
        }
    }
}

private func setupCaptureSession() {
    // Remove previous inputs & outputs.
    if let inputs = session.inputs as? [AVCaptureDeviceInput] {
        for input in inputs {
            session.removeInput(input)
        }
    }
    if let outputs = session.outputs as? [AVCaptureVideoDataOutput] {
        for output in outputs {
            session.removeOutput(output)
        }
    }

    // Redirect stream from camera to pixel buffer (screen).
    setupCaptureDevice()
    configCaptureDeviceInput()
    configVideoDataOutput()
}

The startLiveStream function starts the session running.

func startLiveStream() {
    sessionQueue.async { [weak self] in
        self?.session.startRunning()
    }
}

The setupCaptureDevice function sets up the default back camera and the session preset, which determines the resolution of the video stream.

private func setupCaptureDevice() {

    guard let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back) else {
        print("Could not create capture device.")
        return
    }
    self.device = device

    // NOTE:
    // Requesting 4k buffers allows recognition of smaller text but will
    // consume more power. Use the smallest buffer size necessary to keep
    // down battery usage.
    if device.supportsSessionPreset(.hd4K3840x2160) {
        session.sessionPreset = AVCaptureSession.Preset.hd4K3840x2160
        overlayView.bufferAspectRatio = 3_840.0 / 2_160.0
    } else {
        session.sessionPreset = AVCaptureSession.Preset.hd1920x1080
        overlayView.bufferAspectRatio = 1_920.0 / 1_080.0
    }
}

The configCaptureDeviceZoomAndFocus function allows setting the zoom and autofocus to help focus on smaller text.

// Set zoom and autofocus to help focus on very small text.
private func configCaptureDeviceZoomAndFocus() {

    guard let device = device else { return }

    do {
        try device.lockForConfiguration()
        device.videoZoomFactor = 2
        device.autoFocusRangeRestriction = .near
        device.unlockForConfiguration()
    } catch {
        print("Could not set zoom level due to error: \(error)")
        return
    }
}

The configCaptureDeviceInput function creates a capture device input and adds it to the session.

private func configCaptureDeviceInput() {
    guard
        let device,
        let input = try? AVCaptureDeviceInput(device: device)
    else {
        print("Could not create device input.")
        return
    }
    if session.canAddInput(input) {
        session.addInput(input)
    }
}

The configVideoDataOutput function creates a video data output, sets the VisionController as its sample buffer delegate, and adds the output to the session. The video data output uses the YUV420 (420YpCbCr8BiPlanarFullRange) pixel format.

private func configVideoDataOutput() {

    let output = videoDataOutput
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
    output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
    // [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]

    if session.canAddOutput(output) {
        session.addOutput(output)

        // NOTE:
        // There is a trade-off to be made here. Enabling stabilization will
        // give temporally more stable results and should help the recognizer
        // converge. But if it's enabled the VideoDataOutput buffers don't
        // match what's displayed on screen, which makes drawing bounding
        // boxes very hard. Disable it in this app to allow drawing detected
        // bounding boxes on screen.
        output.connection(with: AVMediaType.video)?.preferredVideoStabilizationMode = videoStabilizationMode
    } else {
        print("Could not add video data output")
        return
    }
}

There are also functions to set the zoom level (configCaptureDeviceZoom) and the preferred video stabilization mode.

func configCaptureDeviceZoom(_ factor: Double) {

    guard let device = device else { return }

    do {
        try device.lockForConfiguration()
        device.videoZoomFactor = CGFloat(factor)
        device.unlockForConfiguration()
    } catch {
        print("Could not set zoom level due to error: \(error)")
        return
    }
}

The stopLiveStream function stops the live stream. It performs the following steps:

a) Asynchronously turns off the torch on the main queue
b) Asynchronously stops the camera session on a separate session queue.

@objc open func stopLiveStream() {
    // Turn the torch off on the main queue, then stop the camera session
    // on the session queue so that no further buffers are received.
    DispatchQueue.main.async { [weak self] in
        self?.toggleTorch(on: false)
    }

    sessionQueue.async { [weak self] in
        self?.session.stopRunning()
    }
}

2.2 Text recognition with Vision framework

VisionController implements text recognition using the Vision framework.

The function setupTextRecognition sets up the text recognition request by creating a VNRecognizeTextRequest object and adding it to the requests array. The VNRecognizeTextRequest is initialized with a completion handler that invokes the function textRecognitionHandler.

private func setupTextRecognition() {

    let request = VNRecognizeTextRequest(completionHandler: { [weak self] request, error in
        self?.textRecognitionHandler(request: request, error: error)
    })

    request.recognitionLevel = recognitionLevel
    request.revision = VNRecognizeTextRequestRevision1
    request.usesLanguageCorrection = usesLanguageCorrection
    request.minimumTextHeight = minTextHeight
    request.regionOfInterest = overlayView.regionOfInterest

    requests.append(request)
}

The function textRecognitionHandler takes the results of the text recognition request and checks if they are of the type [VNRecognizedTextObservation]. If the results are valid, the function removes all sublayers of the preview view except the first two and invokes the function observationsHandler.

func textRecognitionHandler(request: VNRequest, error: Error?) {

    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        print("The observations are of an unexpected type.")
        return
    }

    DispatchQueue.main.async { [weak self] in
        self?.previewView.layer.sublayers?.removeSubrange(2...)
    }

    observationsHandler(observations: observations)
}

The observationsHandler function takes the observations of recognized text and iterates over them. For each observation, it gets the top candidate of recognized text and prints it to the console, then calls highlightBox to highlight the bounding box of the recognized text in white.

@objc open func observationsHandler(observations: [VNRecognizedTextObservation]) {

    for observation in observations {

        let candidates = observation.topCandidates(1)
        if let recognizedText = candidates.first {
            print("[Text recognition] ", recognizedText.string)

            /*
            let range = recognizedText.string.startIndex..<recognizedText.string.endIndex
            if let observation = try? recognizedText.boundingBox(for: range) {
                DispatchQueue.main.async {
                    self.highlightBox(observation.boundingBox, color: UIColor.green)
                }
            }
            */
        }

        DispatchQueue.main.async { [weak self] in
            self?.highlightBox(observation.boundingBox, color: UIColor.white)
        }
    }
}

2.3 Implementing AVCaptureVideoDataOutputSampleBufferDelegate

The VisionController class conforms to the AVCaptureVideoDataOutputSampleBufferDelegate protocol, so it acts as the delegate for the AVCaptureVideoDataOutput object that is responsible for capturing video data from the camera.

In the captureOutput method, the delegate receives a sample buffer that contains video data. This method retrieves the image buffer from the sample buffer, and saves it as a property of the VisionController class. It also calculates the brightness of the image using the getBrightness method.

func getBrightness(sampleBuffer: CMSampleBuffer) -> Double {
    let rawMetadata = CMCopyDictionaryOfAttachments(
        allocator: nil,
        target: sampleBuffer,
        attachmentMode: CMAttachmentMode(kCMAttachmentMode_ShouldPropagate)
    )
    let metadata = CFDictionaryCreateMutableCopy(nil, 0, rawMetadata) as NSMutableDictionary
    let exifData = metadata.value(forKey: "{Exif}") as? NSMutableDictionary
    let brightnessValue = exifData?[kCGImagePropertyExifBrightnessValue as String] as? Double
    return brightnessValue ?? 0.0
}

Next, the method creates a VNImageRequestHandler object using the image buffer, the text orientation, and a dictionary of options. The text orientation is determined by the overlayView object, an instance of a custom UIView subclass that is responsible for drawing the overlay on top of the camera preview. The options dictionary contains the camera intrinsic matrix, if one is attached to the sample buffer.

Finally, the method updates the region of interest for all requests in the requests array. A VNRecognizeTextRequest object is used to recognize text in the image, while a VNDetectRectanglesRequest object is used to detect rectangles in the image. The requests array holds all requests that will be performed by the VNImageRequestHandler object.

Once the region of interest has been updated, the method performs all requests in the requests array by calling the perform method on the VNImageRequestHandler object. If an error occurs during the process, it will be caught and printed to the console.

extension VisionController: AVCaptureVideoDataOutputSampleBufferDelegate {

    public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        cameraBrightness = getBrightness(sampleBuffer: sampleBuffer)
        cameraImageBuffer = pixelBuffer

        var requestOptions: [VNImageOption: Any] = [:]

        if let camData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) {
            requestOptions = [.cameraIntrinsics: camData]
        }

        print("Text orientation: \(overlayView.textOrientation.rawValue)")

        let imageRequestHandler = VNImageRequestHandler(
            cvPixelBuffer: pixelBuffer,
            orientation: overlayView.textOrientation,
            options: requestOptions
        )

        // Update region of interest
        self.requests.forEach { request in
            if let request = request as? VNRecognizeTextRequest {
                request.regionOfInterest = overlayView.regionOfInterest
            } else if let request = request as? VNDetectRectanglesRequest {
                // request.regionOfInterest = overlayView.regionOfInterest
            }
        }

        do {
            try imageRequestHandler.perform(requests)
        } catch {
            print(error)
        }
    }
}

3. PreviewView

The PreviewView is a custom UIView subclass that acts as a container for the AVCaptureVideoPreviewLayer. The main purpose of the PreviewView is to display the video output from the camera in real-time. The PreviewView has a property called previewLayer which is an instance of the AVCaptureVideoPreviewLayer. This property is used to access the layer and set the AVCaptureSession.

class PreviewView: UIView {

    var previewLayer: AVCaptureVideoPreviewLayer {
        guard let layer = layer as? AVCaptureVideoPreviewLayer else {
            fatalError("Expected `AVCaptureVideoPreviewLayer` type for layer. Check PreviewView.layerClass implementation.")
        }
        return layer
    }

    var session: AVCaptureSession? {
        get {
            previewLayer.session
        }
        set {
            previewLayer.session = newValue
        }
    }

    override class var layerClass: AnyClass {
        AVCaptureVideoPreviewLayer.self
    }
}

The PreviewView has a property called session that is used to get or set the AVCaptureSession. The implementation of the session property is done through the previewLayer property. The layerClass property is also overridden in the PreviewView class to return the AVCaptureVideoPreviewLayer class, which means that the layer of the PreviewView will always be of the AVCaptureVideoPreviewLayer type.

4. ScannerOverlayView

The ScannerOverlayView is a UIView class that displays the region of interest (ROI) over a video preview. The ROI is the portion of the video data output buffer that the recognition should be run on. It has several properties and functions that handle the calculation and display of the ROI, as well as handling changes in device orientation.

class ScannerOverlayView: UIView {

    private let configuration: CardScanner.Configuration

    // MARK: - Init
    required init(configuration: CardScanner.Configuration) {
        self.configuration = configuration

        super.init(frame: .zero)
        setup()
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    private func setup() {
        backgroundColor = UIColor.gray.withAlphaComponent(0.5)
        layer.mask = maskLayer
    }

    // MARK: - Mask Layer
    lazy var maskLayer: CAShapeLayer = {
        let layer = CAShapeLayer()
        layer.backgroundColor = UIColor.clear.cgColor
        layer.fillRule = .evenOdd
        return layer
    }()

    // MARK: - Region of interest (ROI) - Static!
    var desiredHeightRatio: Double { 0.5 }
    var desiredWidthRatio: Double { 0.6 }
    var maxPortraitWidth: Double { 0.8 }
    var minLandscapeHeightRatio: Double { 0.6 }

    // Region of video data output buffer that recognition should be run on.
    // Gets recalculated once the bounds of the preview layer are known.
    var regionOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
    // Orientation of text to search for in the region of interest.
    var textOrientation = CGImagePropertyOrientation.up

    // MARK: - Coordinate transforms
    var uiRotationTransform = CGAffineTransform.identity
    // Transform bottom-left coordinates to top-left.
    var bottomToTopTransform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
    // Transform coordinates in ROI to global coordinates (still normalized).
    var roiToGlobalTransform = CGAffineTransform.identity
    // Vision -> AVF coordinate transform.
    var visionToAVFTransform = CGAffineTransform.identity

    var bufferAspectRatio: Double = 1_920.0 / 1_080.0

    // MARK: - Device Orientation
    // Device orientation. Updated whenever the orientation changes
    // to a different supported orientation.
    var currentOrientation = UIDeviceOrientation.portrait {
        didSet {
            // Update ROI if orientation changes.
            updateRegionOfInterest()
        }
    }

    // MARK: - Preview View
    var previewView: PreviewView?
}

In the init method, the setup function is called to initialize the background color and set the mask layer. The mask layer is a CAShapeLayer that acts as a cutout for the ROI and is updated whenever the ROI changes.

// IMPORTANT!
// This function calculates a FIXED REGION OF INTEREST based on the
// desired width/height ratios and centers it on the screen, with a
// little adjustment for landscape and portrait.
// To specify a dynamic region of interest, override this function
// and calculate the region of interest based on another detected
// rectangle rather than on predefined width/height constant ratios.
@objc open func calculateRegionOfInterest() {

    // In landscape orientation the desired ROI is specified as the ratio of
    // buffer width to height. When the UI is rotated to landscape try to keep the
    // vertical size the same up to a minimum ratio. When the UI is rotated to
    // portrait try to keep the horizontal size the same up to a maximum ratio.

    // Figure out size of ROI.
    let size: CGSize
    if currentOrientation.isPortrait || currentOrientation == .unknown {
        size = CGSize(
            width: min(desiredWidthRatio * bufferAspectRatio, maxPortraitWidth),
            height: desiredHeightRatio / bufferAspectRatio
        )
    } else {
        size = CGSize(width: desiredWidthRatio, height: max(desiredHeightRatio, minLandscapeHeightRatio))
    }

    // Make it centered.
    regionOfInterest.origin = CGPoint(x: (1 - size.width) / 2, y: (1 - size.height) / 2)
    regionOfInterest.size = size

    print("Region of interest: \(regionOfInterest)")
}

The calculateRegionOfInterest function calculates the fixed ROI based on the desired width and height ratios and centers it on the screen with some adjustments for landscape and portrait orientations. To specify a dynamic ROI, you can override this function and calculate the ROI based on other detected rectangles instead of predefined constant ratios.
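For example, a hypothetical subclass could derive the ROI from a rectangle detected by VNDetectRectanglesRequest (DynamicOverlayView and detectedCardRect are illustrative names, not part of the repository):

class DynamicOverlayView: ScannerOverlayView {

    // Normalized rect of the last card-like rectangle detected, if any.
    var detectedCardRect: CGRect?

    override func calculateRegionOfInterest() {
        guard let rect = detectedCardRect else {
            // Fall back to the fixed, centered ROI.
            super.calculateRegionOfInterest()
            return
        }

        // Pad the detected rectangle slightly and clamp it to the
        // normalized buffer coordinates.
        regionOfInterest = rect.insetBy(dx: -0.05, dy: -0.05)
            .intersection(CGRect(x: 0, y: 0, width: 1, height: 1))
    }
}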

func updateRegionOfInterest() {

    // Calculate new ROI.
    calculateRegionOfInterest()

    // ROI changed, update transform.
    setupOrientationAndTransform()

    // Update the cutout to match the new ROI.
    DispatchQueue.main.async { [weak self] in
        // Wait for the next run cycle before updating the cutout. This
        // ensures that the preview layer already has its new orientation.
        self?.updateCutout()
    }
}

The updateRegionOfInterest function is called whenever the device orientation changes, which triggers a calculation of the new ROI, an update to the coordinate transforms, and an update to the cutout.

func setupOrientationAndTransform() {
    // Recalculate the affine transform between Vision coordinates and AVF coordinates.

    // Compensate for region of interest.
    let roi = regionOfInterest
    roiToGlobalTransform = CGAffineTransform(translationX: roi.origin.x, y: roi.origin.y).scaledBy(x: roi.width, y: roi.height)

    // Compensate for orientation (buffers always come in the same orientation).
    switch currentOrientation {
    case .landscapeLeft:
        textOrientation = CGImagePropertyOrientation.up
        uiRotationTransform = CGAffineTransform.identity
    case .landscapeRight:
        textOrientation = CGImagePropertyOrientation.down
        uiRotationTransform = CGAffineTransform(translationX: 1, y: 1).rotated(by: CGFloat.pi)
    case .portraitUpsideDown:
        textOrientation = CGImagePropertyOrientation.left
        uiRotationTransform = CGAffineTransform(translationX: 1, y: 0).rotated(by: CGFloat.pi / 2)
    default: // We default everything else to .portraitUp
        textOrientation = CGImagePropertyOrientation.right
        uiRotationTransform = CGAffineTransform(translationX: 0, y: 1).rotated(by: -CGFloat.pi / 2)
    }

    // Full Vision ROI to AVF transform.
    visionToAVFTransform = roiToGlobalTransform.concatenating(bottomToTopTransform).concatenating(uiRotationTransform)
}

The setupOrientationAndTransform function is responsible for recalculating the affine transform between Vision coordinates and AVFoundation coordinates.
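The resulting visionToAVFTransform is what makes drawing possible: a normalized, ROI-relative Vision rect can be mapped to AVF metadata coordinates and then into layer coordinates. As an illustration, highlightBox (whose implementation is not shown in this article) might look roughly like this:

// A sketch of highlightBox; the repository's version may differ.
func highlightBox(_ box: CGRect, color: UIColor) {
    // Vision (ROI-relative, bottom-left origin) -> AVF metadata coordinates.
    let avfRect = box.applying(overlayView.visionToAVFTransform)
    // AVF metadata coordinates -> preview layer coordinates.
    let layerRect = previewView.previewLayer.layerRectConverted(fromMetadataOutputRect: avfRect)

    let outline = CAShapeLayer()
    outline.path = UIBezierPath(rect: layerRect).cgPath
    outline.fillColor = nil
    outline.strokeColor = color.cgColor
    previewView.layer.addSublayer(outline)
}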

@objc func updateCutout() {

    // Figure out where the cutout ends up in layer coordinates.
    let roiRectTransform = bottomToTopTransform.concatenating(uiRotationTransform)
    let transformedRoi = regionOfInterest.applying(roiRectTransform)
    guard let cutout = previewView?.previewLayer.layerRectConverted(fromMetadataOutputRect: transformedRoi) else { return }

    // Create the mask.
    let path = UIBezierPath(rect: frame)
    path.append(UIBezierPath(roundedRect: cutout, cornerRadius: 10))
    maskLayer.path = path.cgPath

    layer.sublayers?.removeAll()
    addOverlays(cutout)
}

The updateCutout method updates the cutout, which is a region of interest (ROI) on the camera view, and masks the area outside of it. The method transforms the ROI from the bottom to the top of the view and rotates it according to the current orientation of the device. It then converts the transformed ROI to the coordinate system of the preview layer and creates a path for the mask layer that includes the cutout and the frame of the view.

@objc open func addOverlays(_ cutout: CGRect) {

    addRoundedRectangle(around: cutout)
    addWatermark()

    // Override to add additional layers on the overlay.
}

The addOverlays method adds overlays to the camera view. It calls two methods, addRoundedRectangle and addWatermark, which add a rounded rectangle around the cutout and a watermark to the camera view, respectively. The method can be overridden to add additional layers on the overlay.

Overall, the ScannerOverlayView is an important class that handles the display of the ROI and the transformation of coordinates between the video data output buffer and the preview layer.

5. String classification extension

Inside String+Classifications we define several String extensions that let us classify text recognized with the Vision framework.

5.1 Card number classification

extension String {

    func checkCardNumber() -> String? {
        let cardValidator = CardValidator()
        guard
            cardValidator.validationType(from: self) != nil,
            cardValidator.validateWithLuhnAlgorithm(cardNumber: self)
        else {
            return nil
        }
        return sanitizedNumericString
    }

    func extractCardNumber() -> String? {

        print("Extracting number from: \(self)")

        let pattern = "(\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4})|(\\d{4}\\s?\\d{6}\\s?\\d{5})|(\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{2})"

        guard let range = range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
            // No card number found.
            return nil
        }

        let potentialNumber = String(self[range])

        let cardValidator = CardValidator()
        guard
            cardValidator.validationType(from: potentialNumber) != nil,
            cardValidator.validateWithLuhnAlgorithm(cardNumber: potentialNumber)
        else {
            return nil
        }
        return potentialNumber.sanitizedNumericString
    }
}

The string extension has two functions: checkCardNumber() and extractCardNumber().

  • checkCardNumber(): This function checks if the given string is a valid card number by using a class called CardValidator. This function returns the sanitized numeric string if the string is a valid card number.
  • extractCardNumber(): This function is used to extract the card number from a string. It first uses a regular expression pattern to find the potential card number in the string. Then it uses the checkCardNumber() function to validate if the potential number is a valid card number. If it is a valid card number, the function returns the sanitized numeric string.
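A couple of expected results, assuming sanitizedNumericString strips separators and CardValidator tolerates space-separated groups (which the per-candidate checkCardNumber call above suggests):

// "4111 1111 1111 1111" is the standard Visa test number; it passes Luhn.
"4111 1111 1111 1111".checkCardNumber()           // "4111111111111111"
"CARD 4111 1111 1111 1111 OK".extractCardNumber() // "4111111111111111"
"1234 5678 9012 3456".checkCardNumber()           // nil, fails the Luhn check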

5.2 Expiration date

The extractExpDate() method is a utility function that takes a string as input, and attempts to extract the expiration date of a credit card from the string. The method starts by defining a pattern string that represents a regular expression. The pattern string specifies the expected format of the expiration date: "MM/YY" or "MM/YYYY".

extension String {

    func extractExpDate() -> String? {
        let pattern = "(0[1-9]|1[0-2])\\/([0-9]{4}|[0-9]{2})"

        guard let range = range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
            // No exp date found.
            return nil
        }

        let expDate = String(self[range])
        return expDate
    }
}

The range(of:options:range:locale:) method is then used to search for the first match of the pattern string within the input string. If a match is found, the corresponding range of the match is extracted and converted to a string, which is returned as the result of the method. If no match is found, the method returns nil.

The regular expression in the pattern string is crucial to making this function work. It uses backslashes to escape special characters and matches the forward slash literally. The pattern starts with the capture group (0[1-9]|1[0-2]), which matches a two-digit month from 01 to 12. It is followed by a literal forward slash and a second capture group, ([0-9]{4}|[0-9]{2}), which matches either a four-digit or a two-digit year.

In conclusion, the extractExpDate() method is a simple yet effective way to extract expiration dates from a string, as long as the input follows the expected format. By using regular expressions, this method can handle a wide range of inputs, making it easier to extract the expiration date from unstructured data.
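A few examples of the behavior:

"VALID THRU 08/25".extractExpDate() // "08/25"
"12/2027".extractExpDate()          // "12/2027"
"EXP 8/25".extractExpDate()         // nil, the month must be two digits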

5.3 Phone Number

The implementation of the extractPhoneNumber() method starts by using a regular expression pattern to search for a substring in the string that matches the format of a US phone number. The pattern is capable of matching various common phone number formats, including those with international prefixes, and those with or without separators between the digits.


extension String {

    // Extracts the first US-style phone number found in the string, returning
    // the range of the number and the number itself as a tuple.
    // Returns nil if no number is found.
    func extractPhoneNumber() -> (Range<String.Index>, String)? {
        // Do a first pass to find any substring that could be a US phone
        // number. This will match the following common patterns and more:
        // xxx-xxx-xxxx
        // xxx xxx xxxx
        // (xxx) xxx-xxxx
        // (xxx)xxx-xxxx
        // xxx.xxx.xxxx
        // xxx xxx-xxxx
        // xxx/xxx.xxxx
        // +1-xxx-xxx-xxxx
        // Note that this doesn't only look for digits since some digits look
        // very similar to letters. This is handled later.
        let pattern = #"""
        (?x)            # Verbose regex, allows comments
        (?:\+1-?)?      # Potential international prefix, may have -
        [(]?            # Potential opening (
        \b(\w{3})       # Capture xxx
        [)]?            # Potential closing )
        [\ -./]?        # Potential separator
        (\w{3})         # Capture xxx
        [\ -./]?        # Potential separator
        (\w{4})\b       # Capture xxxx
        """#

        guard let range = range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
            // No phone number found.
            return nil
        }

        // Potential number found. Strip out punctuation, whitespace and country
        // prefix.
        var phoneNumberDigits = ""
        let substring = String(self[range])
        let nsrange = NSRange(substring.startIndex..., in: substring)
        do {
            // Extract the characters from the substring.
            let regex = try NSRegularExpression(pattern: pattern, options: [])
            if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
                for rangeInd in 1 ..< match.numberOfRanges {
                    let range = match.range(at: rangeInd)
                    let matchString = (substring as NSString).substring(with: range)
                    phoneNumberDigits += matchString as String
                }
            }
        } catch {
            print("Error \(error) when creating pattern")
        }

        // Must be exactly 10 digits.
        guard phoneNumberDigits.count == 10 else {
            return nil
        }

        // Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
        var result = ""
        let allowedChars = "0123456789"
        for var char in phoneNumberDigits {
            char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
            guard allowedChars.contains(char) else {
                return nil
            }
            result.append(char)
        }
        return (range, result)
    }
}

If a potential phone number is found, the method then removes any non-numeric characters, such as punctuation, whitespace, and country prefixes, and ensures that the result is exactly 10 digits long. To account for misrecognized characters, the method substitutes commonly misrecognized characters with their numeric counterparts, such as ‘S’ with ‘5’ or ‘l’ with ‘1’.
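The getSimilarCharacterIfNotIn(allowedChars:) helper is not shown in the article; here is a plausible sketch of it, with a substitution table extended beyond the two examples above (the extra pairs are assumptions):

extension Character {
    // If the character is not in allowedChars, return a visually similar
    // character that is; OCR often confuses letters and digits.
    func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
        guard !allowedChars.contains(self) else { return self }
        let substitutions: [Character: Character] = [
            "S": "5", "s": "5",
            "l": "1", "I": "1", "i": "1",
            "O": "0", "o": "0",
            "B": "8", "Z": "2", "z": "2"
        ]
        return substitutions[self] ?? self
    }
}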

Finally, the method returns a tuple with the range of the phone number in the string and the phone number itself as a string, or nil if no phone number could be found.
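For example:

"Call (123) 456-7890 today".extractPhoneNumber()?.1 // "1234567890"
"+1-800-555-0199".extractPhoneNumber()?.1           // "8005550199"
"12345".extractPhoneNumber()                        // nil, not a 10-digit number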

6. Usage demo

[Animation: card scanner demo]

7. Code repository

The full code can be found in my GitHub repository, CardScanner.

Generated by OpenAI’s language model ChatGPT. (https://openai.com/)
