ENHANCE AND DEBUG PDF DATA EXTRACTION PROGRAM (PYTHON)
Job description
The program is designed to extract specific information (e.g., ID Constructie, Suprafata constructie, Nr. CF, Nume Proprietar, and Intravilan status) from Romanian land registry PDFs.
While it’s functional in some areas, certain parts do not work as expected, and the code overall needs optimization.
What the Program Does.
Extracts data from PDF files using pdfplumber and regex.
Extracts key fields such as:
- ID Constructie (Construction IDs)
- Suprafata constructie (Construction Area)
- Nr. CF (Land Registry Number)
- Nume Proprietar (Owner Name)
- Intravilan status (whether the land is "DA" or "NU")
- Categorie de folosinta (Land Usage Category)
The extracted data is saved into an Excel file using pandas and openpyxl.
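For orientation, the flow described above could look roughly like the sketch below. It is not the program's actual code: the helper names (read_pdf_text, extract_nr_cf, save_rows) and the regular expression for Nr. CF are assumptions made for illustration, and the real PDFs may word the registry number differently.

```python
import re

# Hypothetical pattern: assumes the number appears as "Nr. CF 12345"
# or "Carte Funciară Nr. 12345" somewhere in the extracted text.
NR_CF_RE = re.compile(
    r"(?:Nr\.?\s*CF|Carte\s+Funciar[ăa]\s+Nr\.?)\s*:?\s*(\d+)",
    re.IGNORECASE,
)


def read_pdf_text(path):
    """Return the text of every page, joined by newlines."""
    import pdfplumber  # imported lazily so the regex helpers work without it

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_nr_cf(text):
    """Return the registry number as a string, or None if it is not found."""
    match = NR_CF_RE.search(text)
    return match.group(1) if match else None


def save_rows(rows, out_path):
    """Write a list of dicts (one per PDF) to an Excel file."""
    import pandas as pd  # .xlsx output also needs openpyxl installed

    pd.DataFrame(rows).to_excel(out_path, index=False)
```

Keeping the text extraction separate from the field parsing means each regex can be tested against plain strings, without needing a PDF for every case.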
Issues Faced.
ID Construcție & Suprafață Construcție.
These fields are not extracted accurately.
The correct logic should be based on the "A1." format for IDs, with values following specific patterns.
Currently, the function doesn't meet expectations.
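As a starting point for discussion, a row-based match could be sketched as follows. The row layout is an assumption, not something taken from the actual PDFs: the sketch supposes each construction row starts with a label such as A1.1 and ends with an area followed by the unit "mp", with no thousands separators. The name extract_constructions is hypothetical.

```python
import re

# Assumed row layout (hypothetical): "A1.1 100234-C1 ... 85 mp"
ROW_RE = re.compile(
    r"^\s*(A\d+(?:\.\d+)+)\b"      # label at line start, e.g. A1.1 or A1.2
    r".*?"                          # rest of the row, matched lazily
    r"(\d+(?:[.,]\d+)?)\s*mp\b",    # first number that is followed by "mp"
    re.IGNORECASE | re.MULTILINE,
)


def extract_constructions(text):
    """Return a list of (label, area) tuples, one per matching row."""
    results = []
    for label, area in ROW_RE.findall(text):
        # Romanian documents often use a decimal comma
        results.append((label, float(area.replace(",", "."))))
    return results
```

Anchoring on the line start and on the unit keeps other numbers in the same row, such as a cadastral number, from being picked up as the area.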
Inconsistent PDF Formats.
PDFs often vary in structure, especially for key phrases like "Date referitoare la teren" or "Lungime Segmente".
Some PDFs lack these sections altogether, causing failures.
Fallback Mechanisms.
When sections like "" are missing, the program should search alternate ranges, but this logic needs fine-tuning.
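One way such a fallback could work is sketched below: try each candidate range in order, then fall back to the whole document. The two marker phrases come from this post, but pairing them as the start and end of one range is an assumption, as is the "Intravilan: DA" wording the regex expects. The function names are illustrative only.

```python
import re

# Assumed wording (hypothetical): "Intravilan: DA" or "Intravilan NU"
INTRAVILAN_RE = re.compile(r"\bintravilan\b\s*:?\s*(DA|NU)\b", re.IGNORECASE)

# Candidate (start, end) markers, tried in order; extend as new layouts appear.
DEFAULT_RANGES = [("Date referitoare la teren", "Lungime Segmente")]


def slice_between(text, start, end=None):
    """Return the text after `start` up to `end`; None if `start` is absent."""
    begin = re.search(re.escape(start), text, re.IGNORECASE)
    if begin is None:
        return None
    rest = text[begin.end():]
    stop = re.search(re.escape(end), rest, re.IGNORECASE) if end else None
    # A missing end marker is tolerated: search to the end of the text
    return rest[:stop.start()] if stop else rest


def find_intravilan(text, ranges=DEFAULT_RANGES):
    """Return "DA", "NU", or None, trying each range before the full text."""
    for start, end in ranges:
        chunk = slice_between(text, start, end)
        if chunk is None:
            continue  # section missing in this PDF, try the next range
        match = INTRAVILAN_RE.search(chunk)
        if match:
            return match.group(1).upper()
    match = INTRAVILAN_RE.search(text)  # last resort: whole document
    return match.group(1).upper() if match else None
```

Returning None instead of raising lets the caller record an empty cell for that PDF and keep processing the rest.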
Other Enhancements.
General code improvements.
Robust error handling, optimized regular expressions, and flexibility to adapt to varied PDF layouts.
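For the error-handling part, one possible shape is a batch loop that isolates failures per file, sketched below. The names process_files and extract_one are hypothetical; extract_one stands for whatever function turns one PDF path into a dict of fields.

```python
import logging

logger = logging.getLogger("cf_extract")  # logger name is arbitrary


def process_files(paths, extract_one):
    """Run extract_one on each path; collect rows and failures separately."""
    rows, failures = [], []
    for path in paths:
        try:
            rows.append({"file": str(path), **extract_one(path)})
        except Exception as exc:  # broad on purpose: one bad PDF must not stop the batch
            logger.warning("Skipping %s: %s", path, exc)
            failures.append((str(path), str(exc)))
    return rows, failures
```

The failures list can be written to a second sheet or a log file, so PDFs with unexpected layouts are easy to find and review afterwards.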
Project Title: "Debug and Enhance PDF Data Extraction Program (Python)"
Description.
I have a Python program designed to extract specific data fields from PDFs, such as property documents ("Cărți Funciare").
The program uses libraries like pdfplumber, re, and pandas to process the PDFs and output results into an Excel file.
While the core functionality is implemented, there are issues and areas for improvement that need an expert to resolve.
What the Program Does.
The current program extracts:
- Nr. CF - Land registry number.
- Nume Proprietar - Owner's name(s).
- Suprafață Teren - Land area.
- ID Construcție & Suprafață Construcție - IDs of constructions and their respective areas.
- Intravilan - Status ("DA" or "NU") indicating land classification.
- Categorie de Folosință - Category of land usage (e.g., Arabil, Padure).
The extracted data is then saved into an Excel file using pandas and openpyxl.
What I Need.
Debug and fix the extraction of "ID Construcție" and "Suprafață Construcție".
IDs should be accurately matched (e.g., "A1." format).
Improve extract_intravilan_status and ensure it searches multiple ranges if one fails.
Enhance program flexibility to handle PDFs with inconsistent or missing sections.
Clean and optimize regular expressions and search logic for better accuracy.
Implement fallback mechanisms for edge cases when specific sections are not found.
Review other functions (like extract_categorie_folosinta and extract_nume_proprietar) and improve efficiency and reliability.
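One small source of inconsistency is visible in this post itself: field names appear both with and without diacritics ("Constructie" and "Construcție"). A possible way to make label matching tolerant of that is sketched below, using only the standard library. The helper names are hypothetical, and whether the PDFs really mix spellings is an assumption to verify against samples.

```python
import unicodedata


def strip_diacritics(text):
    """Replace accented letters with their base letters (ț -> t, ă -> a)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def contains_label(text, label):
    """Case- and diacritic-insensitive check that `label` occurs in `text`."""
    return strip_diacritics(label).casefold() in strip_diacritics(text).casefold()
```

Normalizing the extracted text once, before any regex runs, would let every pattern be written in plain ASCII and match both comma-below and cedilla variants of ș and ț.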
Ideal Skills.
- Proficient in Python
- Experience with PDF data extraction
- Strong debugging skills
- Ability to enhance program functionality
- Familiarity with handling varied data formats
Deliverables.
A working Python script with improved functionality.
Debugged and accurate extraction for all required fields (IDs, areas, intravilan status, etc.).
Clear documentation on updates made, especially new logic added.
Program capable of handling varied PDF formats and edge cases.
Job details
- Unspecified
- Anywhere in Portugal
- Unspecified - Unspecified
- 16/12/2024
- 16/03/2025