We are going to implement a Python script that recursively updates many HTML files at once.
The codebase has been kept generic so that it can handle other similar cases. We will walk through the code on this page. You can retrieve the Python classes at any time from the Python snippets repository on GitHub.
I’ve tried to follow the Python Enhancement Proposals (PEP) as much as I could, but I might have missed a few, so do not hesitate to review my code and post comments so I can improve my style.
Please read the PEP 8 guide to learn more about the proposals.
And remember the Zen of Python, which is valid for other languages too!
- List all html files of a specific folder (absolute/relative path)
- Get into the head tags (or other tags/content)
- Append a new line at the end of the tags
- Instead of HTML, we should be able to alter other types of files.
- Instead of the head tags, we could also append into other tags or content.
- We should be able to pass the content to insert from the command line
I think the codebase is fully compatible with Python 3; I’m running it in a virtual environment set up by my IDE, which embeds Python 3.6.9. Let me know if you run into any trouble, as I won’t be covering the setup here.
Run the script
The main.py script uses the argparse module, so simply display the program's help:
python main.py --help
Now you know how to run the program and which script launches it (main.py). If you forget an argument, it will tell you! For example:
python main.py --path '/home/user/yourpath/' --ext 'html' '<script>'
Main script – Program entry
The code speaks for itself:
import argparse

from BulkFileModification import BulkFileModification


def main(path, extension, content):
    bulk = BulkFileModification(path, extension)
    print('Operating on the following files : ', bulk.file_paths)
    bulk.bulk_head_insert(content)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', help='Indicate the absolute path of the files to browse.', default='.')
    parser.add_argument('--ext', help='Provide the extension to filter with.', default='html')
    parser.add_argument('content', help='Content to include (ex : "<script>...</script>").')
    args = parser.parse_args()
    # execute only if run as a script
    main(args.path, args.ext, args.content)
You will likely notice the imports: argparse, but also BulkFileModification (I wasn’t inspired; if you find a better name, hit me :)). The latter is our own module, which lists all files with the given extension under the provided path.
Except for the mandatory content argument (notice the missing -- prefix, which makes it positional), all arguments are optional and come with a default value.
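To see how the positional argument and the optional ones behave together, here is a minimal sketch that rebuilds the same parser and feeds it an explicit list instead of the real command line. The build_parser helper and the sample values are mine for illustration; they are not part of the repository.

```python
import argparse


def build_parser():
    # Hypothetical helper: declares the same three arguments as main.py.
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', default='.',
                        help='Indicate the absolute path of the files to browse.')
    parser.add_argument('--ext', default='html',
                        help='Provide the extension to filter with.')
    parser.add_argument('content',
                        help='Content to include (ex : "<script>...</script>").')
    return parser


# Only the positional argument is given: both options use their defaults.
args = build_parser().parse_args(['<script>'])
print(args.path, args.ext, args.content)

# Every argument given explicitly (sample values, not real paths).
args_full = build_parser().parse_args(
    ['--path', '/tmp/site/', '--ext', 'htm', '<meta>'])
print(args_full.path, args_full.ext, args_full.content)
```

Passing a list to parse_args() is also a handy way to exercise the parser from a unit test without touching sys.argv.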
Bulk File Modification script
Following the flow of the program, this intermediary script takes the arguments as input and calls the heart of the program to append the content to the wanted files.
Two features are implemented here:
- Search for all files using the Python glob module
- Apply the insertions by calling the FileModification module.
import glob
import os

from FileModification import FileModification


class BulkFileModification:
    def __init__(self, path, extension):
        self._path = path
        self._extension = extension
        self.file_paths = self.get_file_paths()

    def bulk_head_insert(self, content):
        # Append the content to the head of every matching file.
        for file in self.file_paths:
            modifier = FileModification(file)
            modifier.head_append(content)
            modifier.write()

    def get_file_paths(self):
        # os.path.join keeps the pattern valid whether or not the path
        # ends with a separator (e.g. the default '.').
        pattern = os.path.join(self._path, '**', '*.' + self._extension)
        return glob.glob(pattern, recursive=True)
No big deal here: the class is split into the two expected methods. The search method gives glob a pattern that walks the given path recursively and filters on the file extension at the same time.
The insert method handles the insertion by calling the FileModification class and writing to the files one by one.
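The recursive pattern is easier to picture on a real tree. The sketch below builds a throwaway folder with tempfile, then searches it with the same kind of `**` pattern; the file names are made up, and the pattern is assembled with os.path.join so that it does not depend on the path ending with a slash.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Made-up tree: three HTML files at different depths and one text file.
    os.makedirs(os.path.join(root, 'sub', 'deeper'))
    for rel in ('index.html', 'sub/about.html',
                'sub/deeper/faq.html', 'sub/notes.txt'):
        with open(os.path.join(root, *rel.split('/')), 'w') as handle:
            handle.write('<head></head>')

    # With recursive=True, '**' matches zero or more nested directories.
    pattern = os.path.join(root, '**', '*.html')
    found = sorted(
        os.path.relpath(path, root).replace(os.sep, '/')
        for path in glob.glob(pattern, recursive=True)
    )

print(found)
```

Without recursive=True, `**` behaves like a single `*`, so only files exactly one folder below the root would be matched.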
File Modification script
The last script takes the content to append and the path of the file to modify as inputs, and does its job once called.
To follow good coding practices, a unit test covers the main action, and you will find a test.html file that serves as its fixture.
# -*- coding: utf-8 -*-


class FileModification:
    def __init__(self, file_path):
        self._file_path = file_path
        # The with statement closes the file even if reading fails.
        with open(self._file_path, "r") as file:
            self.content = file.read()

    def head_append(self, newline):
        try:
            # str.index raises ValueError when '</head>' is absent.
            index = self.content.index('</head>')
            self.content = self.content[0:index] + newline + self.content[index:]
        except ValueError:
            print("Could not find index at file " + self._file_path)

    def write(self):
        with open(self._file_path, "w") as file:
            file.write(self.content)

import unittest
import os
from pathlib import Path

from fileIO.FileModification import FileModification


class FileModificationTest(unittest.TestCase):
    def test_head_append(self):
        fm = FileModification(os.path.join(Path(__file__).parent, "test.html"))
        fm.head_append('<script>')
        # FileModification has no close() method: the constructor already
        # closes the file after reading it.
        self.assertEqual('<head><script></head>', fm.content)
We simply use Python's built-in open() function to read the file content and to write it back.
The index method of the string finds the position of the closing head tag, which is the spot where the content is inserted.
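The slicing around that position can be tried on a plain string, without any file. This is a small sketch of the idea, not the repository code: the insert_before function and the sample page are invented for the example.

```python
def insert_before(content, marker, addition):
    """Return content with addition placed right before the first marker."""
    index = content.index(marker)  # raises ValueError if marker is absent
    return content[:index] + addition + content[index:]


page = '<html><head><title>Demo</title></head><body></body></html>'
updated = insert_before(page, '</head>', '<script></script>')
print(updated)

# A page without a head section makes str.index raise ValueError.
try:
    insert_before('<html></html>', '</head>', '<script></script>')
    missing = False
except ValueError:
    missing = True
print('marker missing:', missing)
```

str.find would return -1 instead of raising, which is an alternative if you prefer an explicit check over a try block.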
Now you can run the unit test using the following command:
python -m unittest discover -s fileIO -p "*Test.py"
The os module and the Path class from pathlib are used to build the proper path to the test.html file, whatever directory the test is launched from.
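If you would rather not ship a fixture file, one option is to let the test create its own page in a temporary folder and also check what write() leaves on disk. The sketch below is an assumption about how that could look: it embeds a compact stand-in for FileModification so that it runs on its own, and the test class name is invented.

```python
import os
import tempfile
import unittest


class FileModification:
    # Compact stand-in for the class described in this post,
    # included only so the sketch is self-contained.
    def __init__(self, file_path):
        self._file_path = file_path
        with open(file_path, 'r') as handle:
            self.content = handle.read()

    def head_append(self, newline):
        index = self.content.index('</head>')
        self.content = self.content[:index] + newline + self.content[index:]

    def write(self):
        with open(self._file_path, 'w') as handle:
            handle.write(self.content)


class FileModificationRoundTripTest(unittest.TestCase):
    def test_write_persists_insert(self):
        with tempfile.TemporaryDirectory() as folder:
            # Create the fixture on the fly instead of relying on test.html.
            path = os.path.join(folder, 'page.html')
            with open(path, 'w') as handle:
                handle.write('<head></head>')

            fm = FileModification(path)
            fm.head_append('<script>')
            fm.write()

            # Read the file again to check what was actually written.
            with open(path, 'r') as handle:
                self.assertEqual('<head><script></head>', handle.read())


# Run the single test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    FileModificationRoundTripTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The temporary folder is removed when the with block ends, so the test leaves nothing behind.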
Well, we are done with the program !
I'll let you investigate further and implement the option to target another tag or another type of content, passed as an input of the program.
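As a starting point for that exercise, here is one possible sketch. It assumes a new --tag option (hypothetical, it does not exist in main.py) and a tag_append function (also hypothetical) that builds the closing tag from the tag name before inserting the content.

```python
import argparse


def tag_append(content, tag, addition):
    """Insert addition right before the first closing tag named tag."""
    closing = '</' + tag + '>'
    index = content.find(closing)
    if index == -1:
        # Tag not found: return the content unchanged.
        return content
    return content[:index] + addition + content[index:]


parser = argparse.ArgumentParser()
# Hypothetical option: defaults to 'head' to keep the current behaviour.
parser.add_argument('--tag', default='head',
                    help='Name of the tag to append into.')
parser.add_argument('content', help='Content to include.')

# Explicit argument list used here instead of the real command line.
args = parser.parse_args(['--tag', 'body', '<footer></footer>'])

page = '<head></head><body><p>Hi</p></body>'
result = tag_append(page, args.tag, args.content)
print(result)
```

This only targets the first closing tag it finds; handling repeated or nested tags would call for a real HTML parser rather than string search.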
Let me know if anything bugs you 🙂
More python resources :