Publishing a free in-browser tool to search PDFs for a precise keyword or expression
A new function has been added to Nocodefunctions.com in response to a need expressed by journalists
The need: helping journalists sift through PDFs
Early April, journalist Runa Sandvik posted:
Imagine having 1000 PDFs and needing to find those with specific keywords. Here’s a tool many, many journalists need that someone could easily write and share. 🔍🗞
— Runa Sandvik (@runasand) April 2, 2022
- Take PDFs as input
- Convert to text
- Search for keywords (UTF-8)
- Output result as CSV
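The four steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation behind the Nocodefunctions tool: it assumes the third-party `pypdf` package for text extraction, and the function names (`extract_text`, `count_matches`, `search_pdfs`, `write_csv`) and the CSV columns are made up for the example.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Iterable


def extract_text(pdf_path: Path) -> str:
    """Convert one PDF to text (assumes the third-party pypdf package)."""
    from pypdf import PdfReader  # lazy import: the other helpers work without pypdf

    reader = PdfReader(str(pdf_path))
    # Scanned (image-only) pages may yield no text at all; OCR is out of scope here.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_matches(text: str, keywords: Iterable[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each keyword or expression."""
    lowered = text.casefold()
    return {kw: lowered.count(kw.casefold()) for kw in keywords}


def search_pdfs(
    folder: str,
    keywords: Iterable[str],
    to_text: Callable[[Path], str] = extract_text,
) -> list[dict]:
    """Return one row per (file, keyword) pair with at least one match."""
    keywords = list(keywords)
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        counts = count_matches(to_text(pdf_path), keywords)
        for kw, n in counts.items():
            if n > 0:
                rows.append({"file": pdf_path.name, "keyword": kw, "count": n})
    return rows


def write_csv(rows: list[dict], out) -> None:
    """Write the result rows as CSV to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=["file", "keyword", "count"])
    writer.writeheader()
    writer.writerows(rows)
```

To run it on a folder, open the output file with `open("matches.csv", "w", newline="", encoding="utf-8")` and pass it to `write_csv(search_pdfs("my_pdfs", ["offshore", "tax haven"]), out)`. Plain substring matching keeps the sketch short; it will also match keywords inside longer words.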
It felt like a feature Nocodefunctions.com was well positioned to offer. So I developed it in a single Sunday afternoon, and you can now find it here:
https://nocodefunctions.com/pdfmatcher/pdf_matcher_tool.html.
I reported it to Runa:
Hi, please find the function here: https://t.co/N9oM7dBOPN
— Clement Levallois (@seinecle) April 3, 2022
The export to Excel stalls for some reason but otherwise the results can be seen on the page. Please submit bug reports etc. to analysis@exploreyourdata.com
(free to use, respectful of the data, of course!)
… and I was disappointed to get no reply. But looking back at the original post, I realized it had received a lot of replies (91 so far), including “less than helpful comments” but also pointers to very useful resources.
Here is a summary of the solutions mentioned in the replies, all reported to be able to search PDFs for expressions (free ✔️ or paid 💰):
In-browser solutions ☁️
Desktop solutions 💻
- Aleph ✔️
- Open Semantic Search ✔️
- Acrobat DC 💰
- dnGrep ✔️ (Windows only)
- Agent Ransack ✔️ (Windows only, freemium)
- 🤖 PDF Keywords Extractor 🤖 ✔️ (supported platforms unclear)
Command line 🔤
- ripgrep-all ✔️ (Linux, Mac, Windows)
- pdfind ✔️ (Linux and Mac)
- pdfgrep ✔️ (Linux)
Your feedback
Try the solution I developed and give some feedback!
- 🎯 pdf matcher tool ☁️ ✔️