Tips & Tricks - Convert PDFs to image files
This workshop demonstrates how PDF files, which are saved as attachments in an application, can be converted into image files (e.g. .png or .jpg) via a process. The example application for this workshop can be downloaded here and imported into your portal as usual. Activate the included process afterwards.
Application
The example application contains an edit page where a title and description can be entered. It also has a file control for uploading PDFs in the browser. The images generated from the PDFs are stored in the subordinate data group "images". The primary key of this data group has the type GUID; this setting can be selected when creating a new data group. The "images" data group contains a file data field in which the process saves the link to each created image.
Process
The process responds whenever an entry is added or changed in the "PDFs" data group.
Next, any images already stored in the child table are deleted, so that no outdated images remain when an existing PDF is replaced.
Then, the PDF file is broken down into images by a Groovy script action. Each PDF page is rendered using the Apache project "PDFBox" (https://pdfbox.apache.org). All necessary classes are included in the Intrexx installation and only need to be imported into the Groovy script.
import java.awt.image.BufferedImage
import javax.imageio.ImageIO
import de.uplanet.lucy.server.businesslogic.util.FileUCHelper
import org.apache.pdfbox.pdmodel.*
import org.apache.pdfbox.rendering.ImageType
import org.apache.pdfbox.rendering.PDFRenderer
In the next step, the script reads the required fields from the application. The GUIDs and variable names shown here must be replaced with the corresponding values from your own application.
//values to be changed
def strInputFileGuid = "3FD2E12C9D737FDEA12956339BE664CFE57550BA" /*pdf file Guid in parent table*/
def strRecIdGuid = "49355853C4816EA50DE6AE4906C9EFD8DFC5B1C2" /* record id Guid in parent table*/
def strImageFileGuid = "3BAE89765D5E2C7C91C3128E0527815B022D4AEB" /* file Guid in child table for the images */
def strImageDatagroup = g_rtCache.dataGroups["C15295917F975DFF15E0959AA5BE02817A8CA511"].getName() /* table name in database */
def strVariableFKLID = g_rtCache.fields["A2EC7A0463038240B4D78F41988A7BB68DAE17D8"].getName() /* foreign key variable in child table*/
def strVariableImageRecId = g_rtCache.fields["1A7DC10A0A402847516CEBB7EBA0D441C1D3ECF2"].getName() /* record id variable in child table */
def strVariablePage = g_rtCache.fields["1CC4FEDE1289764F1DA1EAFB3178E4EF48A65554"].getName() /* page variable in child table */
Afterwards, the script checks whether a file was uploaded and whether this is a PDF file. If this is the case, the PDF pages are transformed and added to the child table.
//define variables
def conn = g_dbConnections.systemConnection
def strNewGuid
def l_pdfPage = 1
File tempFile = null
PDDocument document = null
def inputFile = g_record[strInputFileGuid] /* datafield file <file> */
int l_recId = g_record[strRecIdGuid].value /* datafield (PK) (S) ID <integer> */
if (inputFile.hasValue() && inputFile.getFirstFile().getContentType() == "application/pdf") {
    try {
        //get file from Intrexx
        def pdfFile = new File(inputFile.getFirstFile().getPath())
        // load PDF document
        document = PDDocument.load(pdfFile)
        PDFRenderer pdfRenderer = new PDFRenderer(document)
        // for each page (the renderer uses zero-based page indexes)
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // render the page to a BufferedImage at 300 DPI
            BufferedImage buffImage = pdfRenderer.renderImageWithDPI(i, 300, ImageType.RGB)
            /* for smaller images, lower the DPI value instead */
            //BufferedImage buffImage = pdfRenderer.renderImageWithDPI(i, 70, ImageType.RGB)
            // write image to temporary file
            tempFile = File.createTempFile("pdfImage", ".png")
            ImageIO.write(buffImage, "png", tempFile)
            //generate new Guid for insert query
            strNewGuid = newGuid()
            //insert into child table
            g_dbQuery.executeUpdate(conn, "INSERT INTO ${strImageDatagroup} (${strVariableFKLID}, ${strVariableImageRecId}, ${strVariablePage}) VALUES (?, ?, ?)") {
                setInt(1, l_recId)
                setString(2, strNewGuid)
                setInt(3, l_pdfPage)
            }
            //save image to Intrexx
            FileUCHelper.copyFileToIntrexx(g_context, tempFile.getAbsolutePath(), strImageFileGuid, strNewGuid, false)
            //next l_pdfPage as integer
            l_pdfPage = l_pdfPage + 1
            tempFile.delete()
        }
    }
    catch (IOException e) {
        g_log.error("Converting the PDF to images failed.", e)
    }
    finally {
        // always release the document, even if rendering failed
        try {
            document?.close()
        }
        catch (IOException e) {
            g_log.error("Closing the PDF document failed.", e)
        }
        // remove a temporary image left behind by a failed iteration
        tempFile?.delete()
        //delete pdfbox temp files
        File path = new File("internal/tmp/")
        // listFiles() returns null if the directory does not exist
        for (File file : (path.listFiles() ?: [])) {
            if (file.getName().endsWith(".tmp") && file.getName().startsWith("+~")) {
                file.delete()
            }
        }
    }
}
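The DPI value passed to the renderer determines the pixel size of each image, and the format name passed to ImageIO determines the file type. The following standalone sketch illustrates both choices using only JDK classes, so it can be tried outside of Intrexx. It assumes a US Letter page (612 x 792 points, with 72 points per inch) and uses a blank image as a stand-in for a rendered page. The helper pointsToPixels is hypothetical and not part of Intrexx or PDFBox, and PDFBox's own rounding may differ by a pixel.

import java.awt.image.BufferedImage
import javax.imageio.ImageIO

// Hypothetical helper: converts a page dimension from PDF points
// (1/72 inch) to pixels at the given DPI.
int pointsToPixels(double points, int dpi) {
    return (int) Math.floor(points * dpi / 72.0d)
}

// Assumed page size: US Letter, 612 x 792 points, rendered at 300 DPI.
int widthPx = pointsToPixels(612.0d, 300)   // 2550
int heightPx = pointsToPixels(792.0d, 300)  // 3300

// Blank stand-in for the BufferedImage that a renderer would return.
BufferedImage image = new BufferedImage(widthPx, heightPx, BufferedImage.TYPE_INT_RGB)

// Write the same image as PNG (lossless) and as JPEG (lossy).
File pngFile = File.createTempFile("pdfImage", ".png")
File jpgFile = File.createTempFile("pdfImage", ".jpg")
boolean pngWritten = ImageIO.write(image, "png", pngFile)
boolean jpgWritten = ImageIO.write(image, "jpg", jpgFile)

println "${widthPx} x ${heightPx} px, png=${pngFile.length()} bytes, jpg=${jpgFile.length()} bytes"

// Clean up the temporary files.
pngFile.delete()
jpgFile.delete()

To produce .jpg files in the process script instead of .png, the same two values would change there: the suffix passed to File.createTempFile and the format name passed to ImageIO.write. Halving the DPI halves both pixel dimensions, which reduces the pixel count to a quarter.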