Tuesday 24 December 2013

Merge multiple PDF files using VBA, Excel and PDFtk

Merge PDF files based on their names

The problem statement

I have come across a situation where I needed to merge thousands of pairs of PDF files, based on their filenames. This blog outlines the method I used.

There are a set of PDF files with this naming convention: ABCDEFGHIJ_2013-14.pdf
Here "ABCDEFGHIJ" is the part of the file name which varies from file to file.
Let us call this set of files "SET A".

There is a second set of files with this naming convention: 00012345_ABCDEFGHIJ_2012.pdf
Here, "00012345_ABCDEFGHIJ" is the variable part.
This set of files will be called "SET B".

Files from each set, which have the common field "ABCDEFGHIJ" need to be merged.

The format of merged file will be "00012345_ABCDEFGHIJ_2013-14.pdf"

Tools you'll need

The method

First of all, let's get things organized.
Create a folder in your computer and name it "PDF Merge".
In it, create two folders "SET A" and "SET B". Next, create a folder "Merging Area" within "PDF Merge".
"Merging area" will the the folder where the actual merging will take place.

Put all files in their respective folders.

Install PDFtk.
This is a PDF manipulation tool which has a command-line interface.

Create an Excel file in the "PDF Merge" folder. Open it up, and save it it as a macro-enabled workbook (.xlsm).



In the workbook, rename the first sheet as "Main" and second sheet as "Paths".
Fill in the fields as shown, and edit the paths to the ones relevant to you, but make sure the order is same as shown.



Press Alt-F11 to enter VBA editor.
Go to Tools->References and enable 'Microsoft Scripting Runtime'. Press OK.

Add a new module and insert the following code:


Option Explicit

Sub MatchFiles()
'create a new file system object
Dim fso As New FileSystemObject
Dim fol As Folder
Dim fil As File
Dim i As Integer

'now get a pointer to SET A
Set fol = fso.GetFolder(Worksheets("Paths").Range("B3").Value & "\")

'now loop over all of the files in the folder
Worksheets("Main").Select

i = 1

For Each fil In fol.Files
    Range("A" & i).Value = fil.Name
    i = i + 1
   fil.Copy Worksheets("Paths").Range("B1").Value & "\" & fil.Name, True
Next fil

'now get a pointer to SET B
Set fol = fso.GetFolder(Worksheets("Paths").Range("B2").Value & "\")

Dim cellname As Range

Worksheets("Main").Select
Range("A1").Select

If Range("A2").Value <> "" Then
Range(ActiveCell, ActiveCell.End(xlDown)).Select
End If

'Match files based on file name
For Each cellname In Selection
    For Each fil In fol.Files
        If StrComp(Mid(fil.Name, 10, 10), Mid(cellname.Value, 1, 10)) = 0 Then
            cellname.Offset(0, 1).Value = fil.Name
            fil.Copy Worksheets("Paths").Range("B1").Value & "\" & fil.Name
        End If
    Next fil
Next cellname

Call Make_Text

End Sub


Insert a new Module and paste the following code:


Option Explicit

Sub Make_Text()
'Create a text file from the 3rd column values
    Dim i As Long, myDir As String, temp, x As Long
    myDir = Worksheets("Paths").Range("B1").Value & Application.PathSeparator
    With Range("a1").CurrentRegion
        For i = 1 To .Columns.Count
            x = Application.CountA(.Columns(3))
            temp = Application.Transpose(.Cells(1, 3).Resize(x).Value)
            Open myDir & .Cells(1, 3).Text & ".txt" For Output As #1
            Print #1, Join(temp, vbCrLf)
            Close #1
        Next
    End With
    
Call rename_file

End Sub

Insert a third Module and paste the following code:


Option Explicit

Sub rename_file()
'Rename the text file to a batch file

Dim fso As New FileSystemObject

Dim fol As Folder

Set fol = fso.GetFolder(Worksheets("Paths").Range("B1").Value)

Dim fil As File

For Each fil In fol.Files
    If Right(fil.Name, 3) = "txt" Then
        fil.Name = "Run to merge.bat"
        Exit For
    End If
Next fil


End Sub

Creating the command-line calls

In the cell "C1" of "Main" sheet, insert this formula:

=IF(B1 <> "","pdftk A=" & A1 & " B=" & B1 & " cat A B output "  & MID(B1,1,9) & MID(A1,1,11) & "2013-14.pdf","")

This is the format required to use PDFtk to merge two files. You can visit their website to see other advanced methods to suit your needs.
After inputting it, use Autofill to fill the formula in as many cells as required.

Aaaand... we're done!

After you run the macro called "MatchFiles", you'll find all the matched files and a batch file named "Run to Merge.bat" inside the "Merging Area" folder.
Double-click this file to merge the files in the same folder. The output files will be in the same folder. You can segregate them from the input files by sorting the files by Date Modified.


Conclusion

Here I have accounted for scenarios where the count of files in SET A and SET B might differ, which was the situation I faced.
Go through the code carefully and see what changes you need to make to adapt it to your scenario.
Do comment with any suggestions or queries.

3 comments:

  1. Hello and thank you for the tutorial. Everything works except for the Run to merge.bat, which is the most important part, namely merging the files. It says: Run to merge.bat is not a valid Win32 application. Do you know a solution?

    Sincerely,
    Richard

    ReplyDelete
  2. Hi again, that error is now also resolved and the bat. file runs, but it does not merge the two pdf files. I looked on the website of pdftk and the code should be good, but still it does not merge the files, any idea why?

    ReplyDelete
  3. Hi dear, As u do this in excel I want to merge pdfs report into ms access, would u like to suggest any tutorial or sample of code How to do in Ms access vba

    ReplyDelete