with the Stata program cfout
Two possible scenarios:
Two basic prerequisites must be met:
persnr)
soep_v1.dta & soep_v2.dta
cfout.do, data, results, datagen.do
Folder structure:
.
├── cfout.do
├── data
│   ├── results
│   │   ├── diffs.dta
│   │   ├── diffs1.dta
│   │   ├── ...
│   │   └── diffs6.dta
│   ├── soep_master.dta
│   ├── soep_v1.dta
│   ├── soep_v2.dta
│   └── soep_v3.dta
└── datagen.do
Via the Stata package manager:
. ssc install cfout
. cd "/pfad/zum/arbeitsverzeichnis/data"
. use soep_master.dta, clear
. cfout state-mar using ///
soep_v1, id(persnr)
--------------------------------
Number of differences: 17599
Number of values compared: 21644
Percent differences: 81.311%
--------------------------------
with missing variables and observations
. use soep_master.dta, clear
. cfout hhnr2009-xweights using ///
soep_v2, id(persnr)
note: the following variables are not in the using data: yedu eqpter
note: the following observations are only in the master data:
+---------+
| persnr |
|---------|
| 409601 |
…
| 1310001 |
+---------+
---------------------------------
Number of differences: 5319
Number of values compared: 333932
Percent differences: 1.593%
---------------------------------
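The notes above can be cross-checked by hand. The following is a minimal sketch, assuming that persnr uniquely identifies observations in both files and that the working directory is the data folder; it is not part of cfout itself.

```stata
* Sketch: check the ID and see which observations exist in only one file
use soep_master.dta, clear

* isid exits with an error if persnr does not uniquely identify observations
isid persnr

* One-to-one match on the ID; _merge records the origin of each observation
merge 1:1 persnr using soep_v2.dta

* _merge == 1: only in the master data, _merge == 2: only in the using data
tabulate _merge
list persnr if _merge == 1
```

Note that merge changes the data in memory, so soep_master.dta should be reloaded before the next cfout call.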
. use soep_master.dta, clear
. cfout wor01-wor12 using soep_v1, ///
id(persnr) ///
saving(results/diffs, replace)
--------------------------------
Number of differences: 52522
Number of values compared: 64932
Percent differences: 80.888%
--------------------------------
. use results/diffs.dta
. browse
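Instead of browsing, the saved differences can also be summarized. This sketch assumes that cfout was run without custom names, so that the differences dataset uses the default variable names Question, Master and Using next to the ID variable.

```stata
* Sketch: summarize the differences dataset saved above
use results/diffs.dta, clear

* Number of differences per compared variable, most frequent first
tabulate Question, sort

* Look at the first few discrepancies side by side
list persnr Question Master Using in 1/10
```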
with custom variable names
. use soep_master.dta, clear
. cfout hhnr2009-xweights using soep_v2, ///
id(persnr) ///
saving( ///
results/diffs1, ///
variable(varname) ///
masterval(soep_master) ///
usingval(soep_v2) ///
replace ///
)
---------------------------------
Number of differences: 5319
Number of values compared: 333932
Percent differences: 1.593%
---------------------------------
. use results/diffs1.dta
. browse
with all compared values, not only the differences
. use soep_master.dta, clear
. cfout hhnr2009-xweights using soep_v2, ///
id(persnr) ///
saving( ///
results/diffs2, ///
all replace ///
)
---------------------------------
Number of differences: 5319
Number of values compared: 333932
Percent differences: 1.593%
---------------------------------
. use results/diffs2.dta
. count if diff
5,319
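Because the all option keeps every comparison and marks the differing ones in diff, the share of differences can be recomputed from the saved file. A minimal sketch, assuming diff is a 0/1 indicator as the count above suggests:

```stata
* Sketch: recompute the share of differences from the full comparison file
use results/diffs2.dta, clear

* The mean of a 0/1 indicator is the proportion of differing values
summarize diff

* Show the proportion as a percentage
display %6.3f 100 * r(mean) "%"
```

If the assumption holds, the result should agree with the "Percent differences" line that cfout printed.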
from the master or using dataset
. use soep_master.dta, clear
. cfout hhnr2009-xweights using soep_v2, ///
id(persnr) ///
saving( ///
results/diffs3, ///
keepmaster(yedu) ///
replace ///
)
note: the following variables are not in the using data: yedu eqpter
note: the following observations are only in the master data:
---------------------------------
Number of differences: 5319
Number of values compared: 333932
Percent differences: 1.593%
---------------------------------
note: not all observations were compared; there are observations only in the master data.
. use soep_master.dta, clear
. cfout hhnr2009-xweights using soep_v1, id(persnr) ///
saving( ///
results/diffs5, ///
properties(type) ///
replace ///
)
. use results/diffs5.dta
. browse
…in which it could make sense to record the storage type and process it further
Create isInteger for integer variables
// We work in the results dataset that contains the type variable
. use results/diffs5.dta, clear
// Dummy variable for integer variables
. gen isInteger = strmatch(type, "int")
// Working variables for the data operation:
// subtract 1909 years
. gen new_master = Master-1909
. gen new_using = Using-1909
// Replace the original variables
. replace Master = new_master if isInteger
. replace Using = new_using if isInteger
// Drop the helper variables, including the dummy variable
. drop new_master new_using isInteger
// Change the data type of Using and Master to byte
. recast byte Master Using
// And just like that, the file is only 1.07 MB instead of a "monstrous" 1.20 MB
// Congratulations, we saved 130 kB.
// That is 0.91 times the capacity of a 5.25" floppy disk
// (an Apple Disk II disk under DOS 3.3 held 140 KiB)
// ((in 1983 that was the most modern floppy disk operating system on the market))
https://slides.hutt.io/cfout.html