PDF Processing Software

Description

Software that processes papers in PDF format by calling Grobid’s web service. It generates a word cloud for each paper, a graph showing the number of figures per article, and a list of the links found across all the papers.
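As a rough illustration of the figure count and link list, the sketch below reads the TEI XML that Grobid returns for a paper. The function names `count_figures` and `extract_links` are hypothetical and are not taken from this repository, and the actual script may work differently. Two assumptions are worth checking against your own Grobid output: tables appear as `figure` elements with `type="table"` (excluded here), and links appear as `target` attributes that start with `http://` or `https://`.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents such as the ones Grobid produces.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_figures(tei_xml: str) -> int:
    """Count <figure> elements, skipping those marked as tables (assumption)."""
    root = ET.fromstring(tei_xml)
    figures = root.findall(".//tei:figure", TEI_NS)
    return sum(1 for fig in figures if fig.get("type") != "table")


def extract_links(tei_xml: str) -> list:
    """Collect external URLs found in 'target' attributes, without duplicates."""
    root = ET.fromstring(tei_xml)
    links = []
    for element in root.iter():
        target = element.get("target", "")
        # Internal references such as "#b0" are ignored.
        if target.startswith(("http://", "https://")) and target not in links:
            links.append(target)
    return links


# A tiny hand-written TEI fragment, used only to exercise the functions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>See <ptr target="https://example.org/data"/> and
       <ref type="bibr" target="#b0">[1]</ref>.</p>
    <figure xml:id="fig_0"><head>Figure 1</head></figure>
    <figure type="table" xml:id="tab_0"><head>Table 1</head></figure>
  </body></text>
</TEI>"""

print(count_figures(SAMPLE))
print(extract_links(SAMPLE))
```

Whether tables should count as figures is a design choice; remove the `type` check if you want them included.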

Requirements

Input papers must contain an abstract section; otherwise the software will fail.
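If you want to check a paper before processing it, a minimal sketch like the following looks for an abstract in the TEI XML returned by Grobid. The helper `get_abstract` is hypothetical and not part of this repository. It assumes the abstract is stored in an `abstract` element of the TEI header, and treats a missing or empty element the same way.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by TEI documents such as the ones Grobid produces.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def get_abstract(tei_xml: str) -> Optional[str]:
    """Return the abstract text, or None if it is missing or empty."""
    root = ET.fromstring(tei_xml)
    abstract = root.find(".//tei:abstract", TEI_NS)
    if abstract is None:
        return None
    # Join all nested text and collapse runs of whitespace.
    text = " ".join("".join(abstract.itertext()).split())
    return text or None


WITH_ABSTRACT = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>'
    "<abstract><div><p>We study  PDFs.</p></div></abstract>"
    "</profileDesc></teiHeader></TEI>"
)
WITHOUT_ABSTRACT = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/></TEI>'

for name, doc in [("with", WITH_ABSTRACT), ("without", WITHOUT_ABSTRACT)]:
    abstract = get_abstract(doc)
    if abstract is None:
        print(f"{name}: no abstract found, skip this paper")
    else:
        print(f"{name}: {abstract}")
```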

This build was developed with Python 3.10 and should work with later versions.

Instructions

  1. Clone or copy this repository into a directory of your choice

  2. Create a folder called “pdfs” and place the papers you want to process inside it

  3. Install Grobid’s Python Client

  4. Go to the src folder and run the script

You can check the results in the folders “wordclouds”, “figures” and “links”, which will be created in the directory after you run the script.
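The folder layout described above can be sketched as follows. This is only an illustration: `prepare_workspace` is a hypothetical helper, not the repository's script, and it assumes that “pdfs” and the three output folders all sit in the same base directory.

```python
from pathlib import Path

# Output folder names taken from this README.
OUTPUT_FOLDERS = ("wordclouds", "figures", "links")


def prepare_workspace(base_dir):
    """Return the PDFs found in <base_dir>/pdfs and create the output folders."""
    base = Path(base_dir)
    pdf_dir = base / "pdfs"
    if not pdf_dir.is_dir():
        raise FileNotFoundError(f"Expected a 'pdfs' folder at {pdf_dir}")

    # Only files with a .pdf extension (any letter case) are considered.
    pdfs = sorted(p for p in pdf_dir.iterdir() if p.suffix.lower() == ".pdf")

    for name in OUTPUT_FOLDERS:
        (base / name).mkdir(exist_ok=True)
    return pdfs
```

Calling `prepare_workspace(".")` from the base directory would list the papers to process and make sure the three result folders exist.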

Contact

Main author and contact: andres.montero.martin@alumnos.upm.es