Recursive files update by extension in Python

We are going to implement a Python script that recursively updates many HTML files at once.

A more generic codebase has been built to handle other similar cases. We will walk through the code on this page. At any time you can retrieve the Python classes from the Github Python snippets repository.

I’ve tried to follow the Python Enhancement Proposals (PEP) as much as I could, and I might have missed a few, so do not hesitate to review my code and post comments so I can improve my style.

Please read the PEP 8 guide to learn more about the proposals.

And remember the Zen of Python, which is valid for other languages too!

Script requirements

  1. List all html files of a specific folder (absolute/relative path)
  2. Get into the head tags (or other tags/content)
  3. Append a new line at the end of the tags

Notes :

  • Instead of html, we should be able to alter other types of files.
  • Instead of the head tags, we could also append into other tags or content.
  • We should be able to pass the content to insert from the command line.

System Requirements

I think the codebase is fully compatible with Python 3. I’m running it in a virtual environment set up by my IDE, which embeds Python 3.6.9. Let me know if you run into any trouble, as I won’t be covering the setup here.

Run the script

The main.py script uses the argparse module, so simply display the help of the program:

python main.py --help

Now you know how to run the script and, obviously, which script launches it (main.py). If you forget an argument, it will tell you!

python main.py --path '/home/user/yourpath/' --ext 'html' '<script>'

Main script – Program entry

The code speaks for itself :

import argparse

from BulkFileModification import BulkFileModification


def main(path, extension, content):
    bulk = BulkFileModification(path, extension)
    print('Operating on the following files : ', bulk.file_paths)
    bulk.bulk_head_insert(content)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', help='Indicate the absolute path of the files to browse.', default='.')
    parser.add_argument('--ext', help='Provide the extension to filter with.', default='html')
    parser.add_argument('content', help='Content to include (ex : "<script>...</script>").')
    args = parser.parse_args()

    # execute only if run as a script
    main(args.path, args.ext, args.content)

You will likely notice the imports, including argparse but also BulkFileModification (I wasn’t inspired; if you find a better name, hit me up :)). The latter is our own script, implementing a way to list all files of the given extension in the provided path.

Except for the mandatory content argument (notice the missing -- prefix), all others are optional and provide a default value.
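To see the difference between the optional and the positional arguments without touching the real command line, here is a minimal sketch. The build_parser helper is a name made up for this illustration (it is not in main.py), and the help texts are left out for brevity.

```python
import argparse


def build_parser():
    # Hypothetical helper: same three arguments as main.py, help texts omitted.
    parser = argparse.ArgumentParser()
    # Optional arguments: prefixed with "--" and given a default value.
    parser.add_argument('--path', default='.')
    parser.add_argument('--ext', default='html')
    # Positional argument: no "--" prefix, so it is required.
    parser.add_argument('content')
    return parser


# parse_args accepts an explicit list of strings, handy for experiments.
args = build_parser().parse_args(['<script></script>'])
print(args.path, args.ext, args.content)

# Optional arguments can come in any order before the positional one.
args = build_parser().parse_args(['--ext', 'htm', '--path', '/tmp/site/', '<meta>'])
print(args.path, args.ext, args.content)
```

Calling parse_args with an empty list makes argparse print a usage message and exit, which is the "it will tell you" behaviour mentioned earlier.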

Bulk File Modification script

Following the course of the whole program, there is this intermediary script, which takes the arguments as inputs and calls the heart of the program to append the content to the wanted files.

There are 2 functionalities implemented here :

  1. Search for all files using the Python glob module
  2. Apply the insertions by calling the FileModification module.

import glob
import os

from FileModification import FileModification


class BulkFileModification:

    def __init__(self, path, extension):
        self._path = path
        self._extension = extension
        self.file_paths = self.get_file_paths()

    def bulk_head_insert(self, content):
        # Append the content to the head of each file, then save it.
        for file in self.file_paths:
            modifier = FileModification(file)
            modifier.head_append(content)
            modifier.write()

    def get_file_paths(self):
        # os.path.join inserts the separator when the path has no trailing
        # slash (the default '.' would otherwise give the pattern '.**/*.html').
        pattern = os.path.join(self._path, '**', '*.' + self._extension)
        return glob.glob(pattern, recursive=True)

No big deal here: we’ve split the class into the 2 expected methods. A glob pattern searches the files present under a specific path, recursively, and filters them by file extension at the same time.
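If you want to check what the recursive pattern really matches, the following sketch builds a small throwaway folder tree and lists it. The find_files function and the file names are invented for this example. Building the pattern with os.path.join is a choice of this sketch, so that the path works with or without a trailing slash.

```python
import glob
import os
import tempfile


def find_files(path, extension):
    # os.path.join adds the separator only when it is missing, so
    # 'site' and 'site/' lead to the same pattern.
    pattern = os.path.join(path, '**', '*.' + extension)
    # With recursive=True, '**' matches zero or more directories.
    return sorted(glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub', 'deeper'))
    names = ('index.html', 'sub/about.html', 'sub/deeper/faq.html', 'sub/notes.txt')
    for relative in names:
        with open(os.path.join(root, relative), 'w') as handle:
            handle.write('<head></head>')

    # Only the three .html files are listed, at any depth.
    for found in find_files(root, 'html'):
        print(os.path.relpath(found, root))
```

Note that without recursive=True, glob treats '**' like a single '*', so only one folder level would be searched.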

The insert part manages the insertion by calling the proper script and writing to the files one by one.

File Modification script

The last script takes the content to append and the path of the file to modify as inputs, and does its job once called.

To follow good coding practices, a unit test is covering the main action and you will find a test.html which serves it. 

# -*- coding: utf-8 -*-


class FileModification:

    def __init__(self, file_path):
        self._file_path = file_path
        # The with statement closes the file even if reading fails.
        with open(self._file_path, "r") as file:
            self.content = file.read()

    def head_append(self, newline):
        try:
            # Insert right before the closing head tag.
            index = self.content.index('</head>')
            self.content = self.content[0:index] + newline + self.content[index:]
        except ValueError:
            # str.index raises ValueError when '</head>' is not in the content.
            print("Could not find index at file " + self._file_path)

    def write(self):
        with open(self._file_path, "w") as file:
            file.write(self.content)

# Unit test, stored in the fileIO package next to test.html
import os
import unittest

from pathlib import Path

from fileIO.FileModification import FileModification


class FileModificationTest(unittest.TestCase):

    def test_head_append(self):
        fm = FileModification(os.path.join(Path(__file__).parent, "test.html"))
        fm.head_append('<script>')
        # FileModification closes its file handle itself, so there is nothing to close here.
        self.assertEqual('<head><script></head>', fm.content)

We simply use Python’s built-in open() function to open the file, first for reading and then for writing the content.
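The read-then-write round trip can be tried safely on a throwaway file. This is only a sketch: read_text and write_text are names invented here, and the with statement is used so the file gets closed even when something goes wrong.

```python
import os
import tempfile


def read_text(file_path):
    # The with statement closes the file even if read() raises.
    with open(file_path, 'r') as handle:
        return handle.read()


def write_text(file_path, content):
    # Mode 'w' truncates the file before writing the new content.
    with open(file_path, 'w') as handle:
        handle.write(content)


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, 'sample.html')
    write_text(sample, '<head></head>')
    updated = read_text(sample).replace('</head>', '<script></script></head>')
    write_text(sample, updated)
    print(read_text(sample))
```

Because mode 'w' empties the file first, the whole content has to be read into memory before the file is reopened for writing, which is the order the script follows.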

The index function of the string helps us find the spot at which to insert the content. It raises a ValueError when the tag is not found, hence the try block.
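The slicing around that index is easier to picture in isolation. Here is a small sketch with a hypothetical insert_before function (not part of the repository) that shows both the normal case and the missing-tag case.

```python
def insert_before(content, marker, new_text):
    """Return content with new_text placed right before the first marker."""
    # str.index raises ValueError when the marker is absent
    # (str.find would return -1 instead).
    index = content.index(marker)
    return content[:index] + new_text + content[index:]


print(insert_before('<head><title>Hi</title></head>', '</head>', '<script></script>'))

try:
    insert_before('<body></body>', '</head>', '<script></script>')
except ValueError:
    print('No </head> tag found')
```

Only the first occurrence of the marker is used, which is fine for a head tag that should appear once per page.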

Now you can run the unit test using the following command :

python -m unittest discover -s fileIO -p "*Test.py"

Thanks to the os module and the Path class of pathlib, the test retrieves the proper path for the test.html file.

Well, we are done with the program !

I’ll let you investigate further to implement the feature of targeting another tag or type of content, which you could pass as an input of the program.
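As a possible starting point, here is one way it could look. Everything in this sketch is an assumption: the --tag option and the append_in_tag function do not exist in the original scripts, and a missing tag simply leaves the content unchanged.

```python
import argparse


def append_in_tag(content, tag, new_text):
    """Insert new_text just before the first closing </tag> of content."""
    closing = '</' + tag + '>'
    index = content.find(closing)
    if index == -1:
        # Leave the content untouched when the tag is missing.
        return content
    return content[:index] + new_text + content[index:]


parser = argparse.ArgumentParser()
# Hypothetical extra option, not part of the original main.py.
parser.add_argument('--tag', default='head')
parser.add_argument('content')

args = parser.parse_args(['--tag', 'body', '<footer></footer>'])
page = '<html><head></head><body><p>Hello</p></body></html>'
print(append_in_tag(page, args.tag, args.content))
```

Keeping head as the default value means the current command line would keep working unchanged.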

Let me know if anything bugs you 🙂

More python resources :

Glob module

ArgParse library

Open function
