hpr3596 :: Extracting text, tables and images from docx files using Python

b-yeezi
Language: English

Summary: In this episode, I describe how I used two Python libraries to extract important data from docx files.

Series: A Little Bit of Python

Source: [http://hackerpublicradio.org/eps.php?id=3596](http://hackerpublicradio.org/eps.php?id=3596)

Original audio: [http://archive.org/download/hpr3596/hpr3596\_source.flac](http://archive.org/download/hpr3596/hpr3596\_source.flac)

Tools to extract data from docx files:


  1. docx2txt

  2. python-docx2txt

  3. python-docx

Code Snippets


    import docx2txt

    # src is the path to the .docx file; img_dest is a directory for extracted images
    text = docx2txt.process(src, img_dest)
    with open("data.txt", "wt") as f:
        f.write(text)
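For a rough idea of what image extraction involves: a .docx file is a ZIP archive, and embedded images are normally stored under `word/media/` inside it. The sketch below uses only the standard library to copy those files out. The names `extract_images` and `make_demo_docx` are hypothetical helpers for illustration, not part of docx2txt or python-docx, and the demo archive only mimics the folder layout (it is not a valid Word document).

```python
import tempfile
import zipfile
from pathlib import Path


def extract_images(docx_path, img_dest):
    """Copy every file under word/media/ in the .docx archive into img_dest."""
    img_dest = Path(img_dest)
    img_dest.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Embedded images are stored under word/media/ inside the archive
            if name.startswith("word/media/") and not name.endswith("/"):
                target = img_dest / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target.name)
    return sorted(saved)


def make_demo_docx(path):
    """Build a tiny stand-in archive with a docx-like layout (assumption: demo only)."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/media/image1.png", b"fake-png-bytes")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.docx"
        make_demo_docx(demo)
        print(extract_images(demo, Path(tmp) / "images"))
```

With a real document, you would pass its path in place of the demo file. For everyday use, `docx2txt.process` with an image directory is the simpler route.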

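    import csv
    import docx

    # src is the path to the .docx file
    document = docx.Document(src)
    tables = document.tables

    # Collect the cell text of every table as a list of rows
    data = []
    for table in tables:
        table_data = []
        for row in table.rows:
            row_data = []
            for cell in row.cells:
                row_data.append(cell.text)
            table_data.append(row_data)
        data.append(table_data)

    # Write each table to its own CSV file: 0.csv, 1.csv, ...
    for i, table_data in enumerate(data):
        with open(f"{i}.csv", "wt", newline="") as f:
            writer = csv.writer(f)
            writer.writerows(table_data)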
