pdf-text-extraction

Here are 16 public repositories matching this topic...

houking-can / PDFSDK

Python interface based on the Foxit Quick PDF Library.

pdf-merge pdf-split pdf-document-processor pdf-sdk pdf-text-extraction

Updated Apr 4, 2020
Python

mamiriqbal1 / rag_book_qa_prompt

A simple demonstration of how you can implement retrieval augmented generation (RAG) for a book.

question-answering rag pdf-text-extraction large-language-models llm chatgpt-web retrieval-augmented-generation

Updated Nov 29, 2023
Jupyter Notebook

hyeonsangjeon / PDF2LLM-Tuning-Studio

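A solution that automatically generates high-quality question-answering (QA) data from PDF documents using GPU-accelerated processing and efficiently fine-tunes LLMs. It generates domain-specific QA pairs with the Unstructured library and AWS Bedrock Claude, and trains lightweight models using LoRA.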

processing docker aws gpu cuda bedrock data-extraction pdf-generation claude unstructured distillation finetuning sagemaker pdf-text-extraction data-argumantation llm unsloth processing-job text-disti

Updated Jun 5, 2025
Jupyter Notebook

vijayengineer / PDFTextSpeechConverter

Converts scanned and ordinary documents into MP3 speech using Amazon Polly.

pdf text images speech aws-polly audiobook synthesis scanned-documents pdf-text-extraction

Updated Dec 30, 2020
Python

PrathameshDhande22 / PdfTxtBot

A Telegram bot that extracts text from PDFs and also extracts images of PDF pages. Made with Python.

python telegram telegram-bot python3 python-telegram-bot image-extractor python-telegram pdf-text pdf-text-extraction pdf-image

Updated Feb 27, 2023
Python

eli64s / pdflex

CLI for merging PDF contexts.

pdf-converter pdf-document pdf-generator pdf-manipulation pdf-extractor pdf-library pdf-parser pdf-data-extraction pdf-processor pdf-tools pdf-document-processor python-pdf pdf-search pdf-text-extraction pdf-python pdf-automation python-pdf-tools pdf-document-parser pdf-regex

Updated Mar 20, 2025
Python

Zeeshanahmad4 / NLP-Pdf-Minning-Extracting-text-from-pdf

NLP PDF mining: extracting text from PDFs.

python pdf pdf-converter text-extraction pdfkit pdf-files extract-text pdftotext pdf-format pdf-document-processor pdftoimage pdftools pdftohtml pdf-text-extraction pdfcon

Updated Apr 2, 2020
Python

VirajMadhu / pdf_key_matcher

Highlights the key matches between your given PDF and the description text.

python open-source pdf cv python-script python3 text-extraction terminal-based ats text-compression pdf-text-extraction virajmadhu

Updated Dec 4, 2024
Python

rithulkamesh / docproc

Opinionated and Sophisticated Document Region Analyzer.

python machine-learning ocr text-classification text-extraction data-extraction region-detection content-extraction document-analysis layout-analysis pdf-processing pdf-text-extraction document-parsing equation-detection mathematical-symbols

Updated Apr 13, 2025
Python

rmottanet / unchainedtext

UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.

extractor text-extraction data-extraction text-processing pdf-text-extraction text-extraction-tool

Updated Apr 2, 2024
Python

holasoymas / text-finder

Console app that finds text in PDFs and reports the page number of each match.

csharp console-app pdf-text-extraction pdf-text-processing

Updated Mar 20, 2025
C#

towfique-elahe / pdf-to-structured-csv

A Python-based tool for extracting structured data from PDFs using OCR and regex, and exporting it to CSV. Ideal for processing invoices, logs, or scanned documents into organized, usable datasets.

ocr data-extraction pdf-to-csv document-processing pytesseract pdf2image python-automation pdf-text-extraction structured-data-extraction regex-parsing

Updated Oct 30, 2024
Jupyter Notebook

kushalpatel0265 / Resume-Parser

A resume parser that extracts key details from PDF files using Groq's LLM

python nlp api google-colab pdf-text-extraction streamlit-webapp llm

Updated Apr 14, 2025
Jupyter Notebook

RealBlueSwan / BSPDFDataExtractor

Extracts data from a provided PDF, using keywords to identify relevant data points. Uses UglyToad PdfPig (great lib btw).

pdf-text-extraction

Updated Jul 20, 2024
C#

simonpierreboucher / Crawler

A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.

rate-limiting http-requests error-handling html-parsing data-collection text-processing web-crawling content-extraction yaml-configuration data-scraping python-crawler modular-design metadata-storage url-normalization pdf-text-extraction structured-data-storage concurrent-crawling data-extraction-pipeline data-preservation-and-recovery

Updated Nov 18, 2024
Python

Spikes2012 / DjangoBusPriority

This is for the Technology Application Project at Swinburne University of Technology.

django file-upload text-extraction image-to-text webapplication pdf-text-extraction

Updated Jun 6, 2023
Python
