In many professional setups, B2B as well as B2C, one of the results of a test case may very well be a PDF. With the baangt test automation tool you can go beyond asserting that a PDF file was created: baangt also lets you perform PDF comparison during your regular test automation! No extra costs, no premium fees. PDF comparison works out of the box and is simple, fast and easy to set up. You may thank me later!

What is a PDF and why is it so widely used?

PDF (Portable Document Format) is a file format that pretty much guarantees that the contents of a file show up on every device exactly as they should. Especially in older times, WYSIWYG (What You See Is What You Get) was a nice theory that rarely worked. Missing fonts on one device, a different color scheme on another and many more problems led to documents looking different from device to device. Adobe fixed this problem once and for all: PDF was, and still is, the solution. Moreover, PDF files cannot easily be changed, so an invoice stored in PDF format, for instance, can usually be assumed to still be the original document.

These two main reasons (the same layout on practically all devices, and being rather difficult to edit) still make PDF a great format for B2B and B2C communication whenever more or less official files are needed.

Why do we need PDF comparison in test automation?

Think about a scenario where a customer places a sales order. You automate this scenario. Everything works. You use the baangt test data generator to generate 1000 different permutations of sales orders: 1 to 10 products, different quantities, different shipping addresses. All great. In the test case definition you created an assertion to make sure the invoice PDF downloads after clicking the download button. You also tested the invoice PDF generation manually and saw a satisfying result. That was 2 weeks ago. Today we want to ship the latest software version to production.

Often the pure download is not enough – we need to compare PDF files!

The scenario above carries a major risk: a disaster waiting to happen. I’ve seen it countless times. Empty PDF files, files with no recipient or the wrong recipient. Files with wrong sender information, the wrong logo, the wrong address. Invoices with negative amounts. Invoices without any total. The list goes on.

Which ways of PDF analysis exist in test automation?

One way is to re-implement the logic of the print program in your test automation suite. Obviously that’s not an easy task. Also, whenever you change the logic in the original print program, you’ll have to change the logic in the test automation tool too. Tedious, expensive and error-prone, but if you can invest that kind of effort, time and money, certainly a very good approach!

Of course there is also an AI for that job. Cool, talented startups develop something that is able (is it?) to distinguish between a desired change in design or content and a real bug. We have heard the claims. We have seen breathtaking demos. Endless opportunities, TCO near zero! Let the AI handle it. I for one have never seen a working installation. Usually 2-3 months in, such projects get a bit quiet. A year later the promising new approach is demoted to “a proof of concept which didn’t turn out as expected”.

The baangt way is simply PDF comparison!

Another common way is PDF comparison. Take one (or more) reference documents and mark the areas that may legitimately differ between this version of the document and its later versions, so that they are ignored. A good example of such an area is a date field: dates will change between the reference document and a later test run.

Once these areas are out of the way, you compare the remaining content of the reference document with the PDF produced by a later test case. This method is not as accurate as re-implementing the overall logic. But it is faster, cheaper and, if you do it right, will give you early enough indications of a potential problem. You can also easily exchange the reference document for a new version, for instance when the sender’s address changes, or when marketing decides it is time for a fresh look: just replace the reference document.
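To make the idea concrete, here is a minimal sketch of mask-and-compare in Python. It assumes the text has already been extracted from both PDFs (extraction itself is out of scope here). The pattern list, the placeholder and the function names are hypothetical illustrations, not baangt's internal implementation. Every match of an ignore pattern is replaced by the same placeholder in both texts. As a result a changed date no longer counts as a difference, while a negative total still does.

```python
import re

# Hypothetical ignore patterns; in a real setup they belong to the reference document.
IGNORE_PATTERNS = [
    r"\b\d{2}\.\d{2}\.\d{4}\b",  # dates such as 01.02.2021
]
PLACEHOLDER = "<IGNORED>"


def mask(text: str, patterns: list) -> str:
    """Replace every match of every ignore pattern with the same placeholder."""
    for pattern in patterns:
        text = re.sub(pattern, PLACEHOLDER, text)
    return text


def pdf_texts_match(reference: str, candidate: str, patterns: list) -> bool:
    """Compare two already-extracted PDF texts after masking the ignored areas."""
    return mask(reference, patterns) == mask(candidate, patterns)


# Made-up texts standing in for the extracted content of three invoices.
reference_text = "Invoice 4711\nDate: 01.02.2021\nTotal: 119.00 EUR"
new_text = "Invoice 4711\nDate: 15.03.2021\nTotal: 119.00 EUR"
broken_text = "Invoice 4711\nDate: 15.03.2021\nTotal: -119.00 EUR"

print(pdf_texts_match(reference_text, new_text, IGNORE_PATTERNS))     # True: only the date differs
print(pdf_texts_match(reference_text, broken_text, IGNORE_PATTERNS))  # False: negative total
```

Using one fixed placeholder, instead of deleting the matches, keeps the surrounding layout of the text intact. A label such as "Date:" is therefore still checked even though its value is ignored.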

How does PDF comparison work in the test automation suite baangt?

It’s pretty straightforward. As a first step you need a so-called reference PDF. That’s the original file that later test cases are compared against. You upload this file to your baangt PDF Comparison file store, which can live on your computer, in your local network or in the cloud.

In the next step you provide regular expressions (REGEX) for the irrelevant areas. Regular expressions are an incredibly powerful tool for identifying strings based on patterns. Provide one regular expression for each area that should be ignored during PDF comparison.
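One practical tip: a pattern that never matches silently ignores nothing. The following sketch uses Python with hypothetical area names and a made-up invoice text. It lists one pattern per irrelevant area and reports what each pattern actually matches in the reference text. A typo then shows up before the first test run rather than as a false failure later. This only illustrates the idea; it is not how baangt validates patterns.

```python
import re
from typing import Dict, Optional

# Hypothetical ignore patterns, one per irrelevant area of an invoice.
IGNORE_AREAS = {
    "invoice date": r"\b\d{2}\.\d{2}\.\d{4}\b",                      # e.g. 01.02.2021
    "creation timestamp": r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\b",  # e.g. 2021-02-01T09:15:00
    "order number": r"Order No\.\s*\d+",                             # e.g. Order No. 100234
    "delivery date": r"Delivery date:\s*\d{2}\.\d{2}\.\d{4}",         # not present in the sample
}


def first_matches(areas: Dict[str, str], reference_text: str) -> Dict[str, Optional[str]]:
    """Return, per area, the first text its pattern matches in the reference (or None)."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in areas.items():
        match = re.search(pattern, reference_text)
        result[name] = match.group(0) if match else None
    return result


# Made-up text standing in for the extracted content of the reference PDF.
reference_text = "Order No. 100234\nCreated 2021-02-01T09:15:00\nInvoice date: 01.02.2021"

matches = first_matches(IGNORE_AREAS, reference_text)
for area, matched in matches.items():
    print(f"{area}: {matched}")
```

An area reported as `None` (here "delivery date") points to a pattern that either needs fixing or does not belong to this reference document.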

After you save the reference file, the application answers with an ID for this specific document. You’ll need this ID in the next step: in your test case definition, issue the activity ‘PDFCompare’ after a download. The latest download is recognized automatically. In the column “Value1” enter the ID of the reference document. That’s it! Now you can execute the test run. The differences show up in the column “PDF Comparison” of the result sheet, and baangt automatically sets the test case status to “failed” when the result of the PDF comparison is unsatisfying.

To make your life even easier, baangt produces two additional files for your manual analysis: one showing the differences between the reference document and the copy, and one vice versa.
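The idea behind the two files can be sketched at text level with Python's standard `difflib` module. The sketch splits a line diff into what exists only in the reference and what exists only in the copy. It is an illustration under that assumption only. The files baangt actually generates come from its own implementation, and the function name and sample lines below are hypothetical.

```python
import difflib
from typing import List, Tuple


def one_sided_differences(reference: List[str], copy: List[str]) -> Tuple[List[str], List[str]]:
    """Split a line diff into (lines only in the reference, lines only in the copy)."""
    only_in_reference: List[str] = []
    only_in_copy: List[str] = []
    for line in difflib.ndiff(reference, copy):
        if line.startswith("- "):    # present in the reference, missing in the copy
            only_in_reference.append(line[2:])
        elif line.startswith("+ "):  # present in the copy, missing in the reference
            only_in_copy.append(line[2:])
        # "  " (common) and "? " (intraline hint) lines are skipped
    return only_in_reference, only_in_copy


# Made-up lines standing in for the extracted text of both documents.
reference_lines = ["Invoice 4711", "ACME Corp, Main Street 1", "Total: 119.00 EUR"]
copy_lines = ["Invoice 4711", "ACME Corp, Main Street 7", "Total: -119.00 EUR"]

only_ref, only_copy = one_sided_differences(reference_lines, copy_lines)
print("Only in reference:", only_ref)
print("Only in copy:", only_copy)
```

In a real setup each of the two lists would end up in its own file, mirroring the "reference vs. copy" and "copy vs. reference" views described above.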

Of course, PDF comparison also works with the version concept, which deals with different software states in the different stages of your system landscape!

To get started, use one of the examples in the baangt example folder.