{"id":5442,"date":"2020-11-01T12:09:17","date_gmt":"2020-11-01T11:09:17","guid":{"rendered":"https:\/\/nil.uniza.sk\/?p=5442"},"modified":"2021-02-07T15:36:35","modified_gmt":"2021-02-07T14:36:35","slug":"network-traffic-dataset-pcap-anonymization","status":"publish","type":"post","link":"https:\/\/nil.uniza.sk\/en\/network-traffic-dataset-pcap-anonymization\/","title":{"rendered":"Network traffic dataset PCAP anonymization"},"content":{"rendered":"<h1>Network traffic dataset PCAP anonymization<\/h1>\n<ul>\n<li>Author: Miroslav Koh\u00fatik<\/li>\n<\/ul>\n<p>Sometimes you may need to provide PCAP files to third-party organizations or, as in our case, publish a network traffic dataset. In order not to reveal your network infrastructure and\/or other sensitive data, you must anonymize these files before sharing them with anyone outside of your organization.<\/p>\n<h2>TraceWrangler<\/h2>\n<p>We use TraceWrangler for network data anonymization on OSI Layers 2 through 4. TraceWrangler is very easy to use and has an intuitive GUI:<br \/>\n<img decoding=\"async\" title=\"TraceWrangler\" src=\"https:\/\/imgur.com\/lOTmeuG.png\" alt=\"TraceWrangler\" \/><\/p>\n<p>TraceWrangler, however, isn&#8217;t perfect. First of all, the maximum size of a file that TraceWrangler can open is 2 GB. Since a network traffic dataset usually consists of PCAP\/pcapng files that are several gigabytes in size, you will need to split the files in question into smaller, more digestible chunks.<br \/>\nTo split up PCAP files we use editcap, a command-line utility that ships with Wireshark. 
Since editcap lacks a GUI, we need to use the Windows Command Prompt.<br \/>\nFirst, we need to change to Wireshark&#8217;s installation directory, where editcap is located. By default, this is <em>C:\\Program Files\\Wireshark<\/em>:<\/p>\n<pre><code>cd \"C:\\Program Files\\Wireshark\"<\/code><\/pre>\n<p>A typical Windows command to split a file using editcap looks something like this:<\/p>\n<pre><code>editcap -c 300000 \"C:\\datasets\\dataset.pcap\" \"C:\\datasets\\anon\\dataset-split-.pcap\"<\/code><\/pre>\n<p>The option <em>-c 300000<\/em> defines the maximum number of packets in a single output file. <em>\u201cC:\\datasets\\dataset.pcap\u201d<\/em> is the path to the input file and <em>\u201cC:\\datasets\\anon\\dataset-split-.pcap\u201d<\/em> contains the path and the name template of the output files.<br \/>\nSince TraceWrangler is still in beta and therefore has some bugs, such as random errors that occur during anonymization of files larger than 50 MB, we recommend setting the maximum number of packets for editcap output files to a value that produces files well under 2 GB, possibly even under 50 MB.<\/p>\n<p>After you open the files you are about to anonymize in TraceWrangler, click \u201canonymize files\u201d to open the anonymization options menu. Before you begin, make sure to clear all default anonymization settings; otherwise you will end up with heavily truncated files:<br \/>\n<img decoding=\"async\" title=\"Anonymization options\" src=\"https:\/\/imgur.com\/sBaGbf7.png\" alt=\"Anonymization options\" \/><\/p>\n<p>If you want to anonymize a large number of IP addresses, it would be impractical to replace each one with a manually entered address. For this purpose you can check \u201cReplace IP addresses by subnet\u201d and pick \u201ckeep host part\u201d from the list of options. 
Check \u201cRecalculate CRC\u201d and pick \u201cKeep bad checksums bad\u201d if needed.<\/p>\n<p><img decoding=\"async\" title=\"IPv4 anonymization using TraceWrangler\" src=\"https:\/\/i.imgur.com\/UMX6ngg.png\" alt=\"IPv4 anonymization using TraceWrangler\" \/><\/p>\n<p>Finally, in the Output settings you can pick the directory in which you want to save the files. If you set the filename to <em>&lt;filename&gt;_anonymized<\/em>, the resulting file\u2019s name will be the original file\u2019s name with the string <em>_anonymized<\/em> appended. Confirm the setting by clicking \u201cOkay\u201d and click \u201cRun\u201d to start anonymization.<\/p>\n<p>To merge the PCAP files into one, we use another utility that ships with Wireshark: mergecap. Wireshark also provides file merging through its GUI, but only for two files at a time. In our case, this would have been very time-consuming, so we used the command line interface instead:<\/p>\n<pre><code>mergecap.exe -w \"C:\\datasets\\dataset.pcap\" \"C:\\datasets\\dataset-split01-anonymized.pcap\" \"C:\\datasets\\dataset-split02-anonymized.pcap\" \"C:\\datasets\\dataset-split03-anonymized.pcap\" \"C:\\datasets\\dataset-split04-anonymized.pcap\" \"C:\\datasets\\dataset-split05-anonymized.pcap\" \"C:\\datasets\\dataset-split06-anonymized.pcap\" \"C:\\datasets\\dataset-split07-anonymized.pcap\" \"C:\\datasets\\dataset-split08-anonymized.pcap\" \"C:\\datasets\\dataset-split09-anonymized.pcap\" \"C:\\datasets\\dataset-split10-anonymized.pcap\" \"C:\\datasets\\dataset-split11-anonymized.pcap\"<\/code><\/pre>\n<p>The -w option specifies the output file, and all of the other paths specify the files to be merged. mergecap merges the packets chronologically, according to their timestamps.<\/p>\n<h2>HxD<\/h2>\n<p>TraceWrangler is only capable of anonymizing OSI layers 2 through 4 and thus cannot sanitize URIs, e.g. <a href=\"http:\/\/192.168.4.2\/index.php\">http:\/\/192.168.4.2\/index.php<\/a>. To sanitize URIs, we use the hex editor <a href=\"https:\/\/mh-nexus.de\/en\/hxd\/\">HxD<\/a>. 
Unlike TraceWrangler, HxD can modify files of any size, whether they are located on disk or in RAM.<br \/>\n<img decoding=\"async\" title=\"HxD\" src=\"https:\/\/imgur.com\/cpvg0Fi.png\" alt=\"HxD\" \/><\/p>\n<p>Theoretically, you could use HxD to anonymize all layers without the need to use TraceWrangler. This would, however, result in <strong>incorrect checksums<\/strong> in all of the headers.<br \/>\nTo anonymize L2 through L4 data, you can use search and replace with hex values:<br \/>\n<img decoding=\"async\" src=\"https:\/\/imgur.com\/RTwzOqF.png\" alt=\"Search and replace using Hex\" \/><br \/>\nBe careful, though: the above example will replace the first two octets of addresses in the network 192.168.0.0\/16 with 172.16, but it will also replace any other two consecutive bytes with the values 192 and 168, e.g. 10.0.192.168 becomes 10.0.172.16. The more specific you are, the lower the risk of unwanted replacement: if you want to replace 192.168.1.1 with 192.0.0.1, be sure to replace 192.168.1. with 192.0.0., not just the two octets that change.<\/p>\n<p>Things are much easier on L7. Here you can be much more specific with your replacements by searching for and replacing text strings:<br \/>\n<img decoding=\"async\" src=\"https:\/\/imgur.com\/Vqth6yF.png\" alt=\"Search and replace using text string\" \/><\/p>\n<p>Depending on whether you are editing the file in RAM or on disk, changes to the file may not be permanent. Always save your work when you are done:<br \/>\n<img decoding=\"async\" src=\"https:\/\/imgur.com\/XEueFpE.png\" alt=\"Save file\" \/><\/p>","protected":false},"excerpt":{"rendered":"<p>Network traffic dataset PCAP anonymization Author: Miroslav Koh\u00fatik Sometimes you may need to provide PCAP files to third-party organizations or, as in our case, publish a network traffic dataset. 
In order not to reveal your network infrastructure and\/or other sensitive data, you must anonymize these files before sharing them with anyone outside of your organization&#8230;.<\/p>","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"","_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","_kad_blocks_custom_css":"","_kad_blocks_head_custom_js":"","_kad_blocks_body_custom_js":"","_kad_blocks_footer_custom_js":"","_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"footnotes":""},"categories":[707],"tags":[1452,1450,1453,1451],"class_list":["post-5442","post","type-post","status-publish","format-standard","hentry","category-network-security-en","tag-anonymization","tag-dataset","tag-dataset-anonymization","tag-network-traffic-dataset"],"taxonomy_info":{"category":[{"value":707,"label":"Network security"}],"post_tag":[{"value":1452,"label":"anonymization"},{"value":1450,"label":"dataset"},{"value":1453,"label":"dataset anonymization"},{"value":1451,"label":"network traffic dataset"}]},"featured_image_src_large":false,"author_info":{"display_name":"Miroslav Koh\u00fatik","author_link":"https:\/\/nil.uniza.sk\/en\/author\/miroslav-kohutik\/"},"comment_info":16,"category_info":[{"term_id":707,"name":"Network security","slug":"network-security-en","term_group":0,"term_taxonomy_id":705,"taxonomy":"category","description":"","parent":0,"count":4,"filter":"raw","cat_ID":707,"category_count":4,"category_description":"","cat_name":"Network 
security","category_nicename":"network-security-en","category_parent":0}],"tag_info":[{"term_id":1452,"name":"anonymization","slug":"anonymization","term_group":0,"term_taxonomy_id":1450,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"},{"term_id":1450,"name":"dataset","slug":"dataset","term_group":0,"term_taxonomy_id":1448,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"},{"term_id":1453,"name":"dataset anonymization","slug":"dataset-anonymization","term_group":0,"term_taxonomy_id":1451,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"},{"term_id":1451,"name":"network traffic dataset","slug":"network-traffic-dataset","term_group":0,"term_taxonomy_id":1449,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"}],"_links":{"self":[{"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/posts\/5442","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/comments?post=5442"}],"version-history":[{"count":0,"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/posts\/5442\/revisions"}],"wp:attachment":[{"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/media?parent=5442"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/categories?post=5442"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nil.uniza.sk\/en\/wp-json\/wp\/v2\/tags?post=5442"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}